fix(database): disambiguate same-name media with comprehensive tag mapping#1036
Conversation
…pping Same-name media (e.g. three "Jackal" or "Finalizer" arcade entries) rendered identically because their distinguishing filename tokens were unparsed or not eligible for disambiguation. - Map every distinguishing filename token to a correctly-typed known tag: Sega/Capcom/Irem arcade boards, input devices, memory/compatibility, dump flags, protection chips, cabinet orientation, build dates, and more. - Add cabinet (upright/cocktail/cabaret/sitdown) and protection (fd1094/fd1089/8751/mc-8123/encrypted/decrypted) tag types. - Recompute DisambiguationTypes over the full allowlist of eligible tag types so presence/absence and value differences across siblings both qualify. - Rewrite the disambiguation recompute as a single-pass set-based update (reset then set-qualifying via CTEs), parity-verified against the prior correlated-subquery form; add benchmark coverage. - Resolve companion gamelist slug conflicts by picking the name-consistent parent, fixing Phantasy Star IV showing Phantasy Star III metadata. - Map spelled-out "(System 16C version)" board ids that were dropped as noise.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR updates disambiguation ordering and recomputation, adds companion child slug conflict handling, and expands filename tag parsing with new arcade, music, and format tag types and mappings. ChangesDisambiguation, scraper, and tag parsing
Estimated code review effort: 4 (Complex) | ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
The scanner status callback set indexing=true on every progress update, so a callback firing after cancel() cleared the flag (but before the scanner observed the cancelled context) briefly resurrected the running state. Skip status updates once the context is cancelled. Fixes flaky TestMediaIndexingCancellation_Integration.
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (1)
pkg/database/scraper/gamelistxml/scraper_test.go (1)
2611-2636: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse afero for the gamelist fixture write.
This test writes directly with
os.WriteFile; route the fixture through the scraper’s filesystem seam instead.♻️ Proposed fix
root := t.TempDir() + fs := afero.NewOsFs() // The same "phantasystar4" slug is listed under two parents. The WRONG parent // (Phantasy Star 3) is listed first to prove file order does not decide the winner; // name consistency must route the slug to the Phantasy Star 4 parent (id 30). - require.NoError(t, os.WriteFile(filepath.Join(root, "gamelist.xml"), []byte(`<gameList> + require.NoError(t, afero.WriteFile(fs, filepath.Join(root, "gamelist.xml"), []byte(`<gameList> ... </gameList>`), 0o600)) @@ - s := &GamelistXMLScraper{db: mockDB} + s := &GamelistXMLScraper{db: mockDB, fs: fs}As per coding guidelines,
**/*.go: “Use afero for filesystem operations in testable code.”🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/scraper/gamelistxml/scraper_test.go` around lines 2611 - 2636, The test in GamelistXMLScraper is writing the gamelist fixture directly with os.WriteFile instead of using the filesystem abstraction. Update this fixture setup to use the scraper’s afero-backed filesystem seam (the same test FS used by GamelistXMLScraper) so the test stays consistent with the rest of the filesystem-dependent code and guidelines.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/database/mediadb/disambiguation_bench_test.go`:
- Around line 85-96: The benchmark setup in disambiguation_bench_test.go is
interpolating titles into the SQL passed to conn.ExecContext, which triggers
SQL-injection checks even though the value is local. Update the recursive seed
inserts in the ExecContext call to use parameter placeholders instead of
fmt.Sprintf formatting, and pass titles as query arguments while keeping the
MediaTitles and Media insert behavior unchanged. Use the conn.ExecContext call
in the benchmark helper to locate and adjust the query string and arguments.
In `@pkg/database/mediadb/sql_media_titles.go`:
- Around line 377-382: The recompute logic in sql_media_titles.go currently
relies on db.sqlMu but still performs a reset and a set as separate statements,
so readers like attachZapScriptTags and GetZapScriptTagsBySystemAndPath can
observe cleared DisambiguationTypes between them. Update the recompute path to
use a transaction around each chunk, or refactor it into a single statement, and
keep the change localized around the recompute helper that runs the reset/set
sequence.
In `@pkg/database/mediascanner/indexing_pipeline.go`:
- Around line 344-345: Add a focused test for the `track:` dynamic tag path in
`AddMediaPathWithPrefixPolicy` or the nearest indexing test helper. Create a
case that feeds a filename tag starting with `track:` and assert the resulting
`tags.TagTypeTrack` tag is persisted with the expected value and linked to the
media item. Use the existing indexing pipeline symbols around `tagStr`,
`dynType`, and `tags.TagTypeTrack` so the test covers this branch directly and
guards against regressions in dynamic tag creation.
In `@pkg/database/scraper/gamelistxml/scraper.go`:
- Around line 1848-1857: The tie handling in the winner selection logic should
not treat every best-score tie as a valid unmanaged equal-name case. In the
resolution block that computes winner/winnerCount from scores, update the
ambiguity check so ties at score 1 between different parent slugs are discarded
instead of being left unresolved, and only assign conflicts[stem] when there is
a single unambiguous winner. Add a test covering the scraper resolution path
with two distinct parent names that both produce score 1 for the same child slug
to confirm the ambiguous tie is dropped.
In `@pkg/database/tags/filename_parser_test.go`:
- Around line 1467-1509: Add assertions in
TestParseFilenameToCanonicalTagsForMedia_CabinetAndProtection for the newly
introduced mappings so those branches are covered too: exercise
ParseFilenameToCanonicalTagsForMedia with representative filenames that should
produce cabaret, encrypted, decrypted, mc-8123, and lnx tags, and verify the
resulting CanonicalTag.String values contain each expected tag. Keep the
existing test structure and use the same helper patterns in
filename_parser_test.go so the new cases are easy to locate and maintain.
---
Nitpick comments:
In `@pkg/database/scraper/gamelistxml/scraper_test.go`:
- Around line 2611-2636: The test in GamelistXMLScraper is writing the gamelist
fixture directly with os.WriteFile instead of using the filesystem abstraction.
Update this fixture setup to use the scraper’s afero-backed filesystem seam (the
same test FS used by GamelistXMLScraper) so the test stays consistent with the
rest of the filesystem-dependent code and guidelines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 6d8b8b12-ed88-488d-a28d-2247702afbe6
📒 Files selected for processing (14)
pkg/database/database.gopkg/database/mediadb/disambiguation_bench_test.gopkg/database/mediadb/disambiguation_test.gopkg/database/mediadb/sql_media_titles.gopkg/database/mediascanner/indexing_pipeline.gopkg/database/scraper/gamelistxml/scraper.gopkg/database/scraper/gamelistxml/scraper_test.gopkg/database/tags/filename_parser.gopkg/database/tags/filename_parser_test.gopkg/database/tags/string_parsers.gopkg/database/tags/tag_mappings.gopkg/database/tags/tag_values.gopkg/database/tags/tags.gopkg/database/tags/tagtypes.go
| _, err = conn.ExecContext(ctx, fmt.Sprintf(` | ||
| WITH RECURSIVE seq(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM seq WHERE i < %d) | ||
| INSERT INTO MediaTitles (DBID, SystemDBID, Slug, Name) | ||
| SELECT i, 1, 'game-' || i, 'Game ' || i FROM seq; | ||
| WITH RECURSIVE seq(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM seq WHERE i < %d) | ||
| INSERT INTO Media (DBID, MediaTitleDBID, SystemDBID, Path, IsMissing) | ||
| SELECT (i-1)*3 + j, i, 1, 'game-' || i || '-' || j, 0 | ||
| FROM seq, (SELECT 1 AS j UNION SELECT 2 UNION SELECT 3); | ||
| INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 1 FROM Media; | ||
| INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 2 FROM Media WHERE (DBID %% 3) = 2; | ||
| INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 3 FROM Media WHERE (DBID %% 3) = 0; | ||
| `, titles, titles)) |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Parameterize the recursive seed query.
titles is local here, but interpolating it into ExecContext is already tripping SQL-injection rules on this hunk. Swapping the two %d slots for ? placeholders keeps the benchmark behavior the same and avoids normalizing a pattern the repo's security checks are supposed to catch. As per coding guidelines, never disable security checks (gosec, govulncheck) or change the forbidigo rules for sync.Mutex/RWMutex.
Suggested change
- _, err = conn.ExecContext(ctx, fmt.Sprintf(`
- WITH RECURSIVE seq(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM seq WHERE i < %d)
+ _, err = conn.ExecContext(ctx, `
+ WITH RECURSIVE seq(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM seq WHERE i < ?)
INSERT INTO MediaTitles (DBID, SystemDBID, Slug, Name)
SELECT i, 1, 'game-' || i, 'Game ' || i FROM seq;
- WITH RECURSIVE seq(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM seq WHERE i < %d)
+ WITH RECURSIVE seq(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM seq WHERE i < ?)
INSERT INTO Media (DBID, MediaTitleDBID, SystemDBID, Path, IsMissing)
SELECT (i-1)*3 + j, i, 1, 'game-' || i || '-' || j, 0
FROM seq, (SELECT 1 AS j UNION SELECT 2 UNION SELECT 3);
INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 1 FROM Media;
INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 2 FROM Media WHERE (DBID %% 3) = 2;
INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 3 FROM Media WHERE (DBID %% 3) = 0;
- `, titles, titles))
+ `, titles, titles)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| _, err = conn.ExecContext(ctx, fmt.Sprintf(` | |
| WITH RECURSIVE seq(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM seq WHERE i < %d) | |
| INSERT INTO MediaTitles (DBID, SystemDBID, Slug, Name) | |
| SELECT i, 1, 'game-' || i, 'Game ' || i FROM seq; | |
| WITH RECURSIVE seq(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM seq WHERE i < %d) | |
| INSERT INTO Media (DBID, MediaTitleDBID, SystemDBID, Path, IsMissing) | |
| SELECT (i-1)*3 + j, i, 1, 'game-' || i || '-' || j, 0 | |
| FROM seq, (SELECT 1 AS j UNION SELECT 2 UNION SELECT 3); | |
| INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 1 FROM Media; | |
| INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 2 FROM Media WHERE (DBID %% 3) = 2; | |
| INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 3 FROM Media WHERE (DBID %% 3) = 0; | |
| `, titles, titles)) | |
| _, err = conn.ExecContext(ctx, ` | |
| WITH RECURSIVE seq(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM seq WHERE i < ?) | |
| INSERT INTO MediaTitles (DBID, SystemDBID, Slug, Name) | |
| SELECT i, 1, 'game-' || i, 'Game ' || i FROM seq; | |
| WITH RECURSIVE seq(i) AS (SELECT 1 UNION ALL SELECT i+1 FROM seq WHERE i < ?) | |
| INSERT INTO Media (DBID, MediaTitleDBID, SystemDBID, Path, IsMissing) | |
| SELECT (i-1)*3 + j, i, 1, 'game-' || i || '-' || j, 0 | |
| FROM seq, (SELECT 1 AS j UNION SELECT 2 UNION SELECT 3); | |
| INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 1 FROM Media; | |
| INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 2 FROM Media WHERE (DBID %% 3) = 2; | |
| INSERT INTO MediaTags (MediaDBID, TagDBID) SELECT DBID, 3 FROM Media WHERE (DBID %% 3) = 0; | |
| `, titles, titles) |
🧰 Tools
🪛 OpenGrep (1.23.0)
[ERROR] 85-96: SQL query built via fmt.Sprintf or string concatenation passed to a database method. Use parameterized queries with placeholder arguments.
(coderabbit.sql-injection.go-query-format)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/database/mediadb/disambiguation_bench_test.go` around lines 85 - 96, The
benchmark setup in disambiguation_bench_test.go is interpolating titles into the
SQL passed to conn.ExecContext, which triggers SQL-injection checks even though
the value is local. Update the recursive seed inserts in the ExecContext call to
use parameter placeholders instead of fmt.Sprintf formatting, and pass titles as
query arguments while keeping the MediaTitles and Media insert behavior
unchanged. Use the conn.ExecContext call in the benchmark helper to locate and
adjust the query string and arguments.
Sources: Coding guidelines, Linters/SAST tools
| case strings.HasPrefix(tagStr, string(tags.TagTypeTrack)+":"): | ||
| dynType = string(tags.TagTypeTrack) |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Add a focused test for track: indexing.
This branch adds new persisted behavior, but the new tests in this cohort only exercise recompute logic. A small AddMediaPathWithPrefixPolicy case that feeds a track:* filename tag and asserts the track tag type/value is created and linked would catch regressions in the dynamic tag path. As per coding guidelines, **/*.go: Write tests for all new code — see TESTING.md and pkg/testing/README.md.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/database/mediascanner/indexing_pipeline.go` around lines 344 - 345, Add a
focused test for the `track:` dynamic tag path in `AddMediaPathWithPrefixPolicy`
or the nearest indexing test helper. Create a case that feeds a filename tag
starting with `track:` and assert the resulting `tags.TagTypeTrack` tag is
persisted with the expected value and linked to the media item. Use the existing
indexing pipeline symbols around `tagStr`, `dynType`, and `tags.TagTypeTrack` so
the test covers this branch directly and guards against regressions in dynamic
tag creation.
Source: Coding guidelines
Codecov Report❌ Patch coverage is 📢 Thoughts on this report? Let us know! |
… handling - Collapse the disambiguation recompute reset+set into one UPDATE...FROM so GetZapScriptTagsBySystemAndPath, which reads DisambiguationTypes without sqlMu, cannot observe the transient blank state between statements under the single-connection pool. - Drop companion slug conflicts that tie on prefix-only (score 1) name matches: differently-named parents do not share metadata. Exact-name (score 2) ties stay first-wins. - Add tests for the track dynamic tag type and the cabaret/encrypted/decrypted/ mc-8123/lnx mappings.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/database/mediadb/sql_media_titles.go (1)
423-470: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick winMake the
group_concatorder explicit
group_concat(tag)andgroup_concat(typ, ',')still depend on subquery order here; use SQLite’s ordered-aggregate syntax sovsandDisambiguationTypesstay stable.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/mediadb/sql_media_titles.go` around lines 423 - 470, The `setQuery` CTEs still rely on subquery ordering for `group_concat`, so the values in `mvs` (`group_concat(tag) AS vs`) and `grp` (`group_concat(typ, ',') AS types`) can vary across runs. Update the SQL in `sql_media_titles.go` to use SQLite’s ordered-aggregate form inside the `mvs` and `grp` queries so the concatenation order is explicit and `DisambiguationTypes` remains deterministic.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@pkg/database/mediadb/sql_media_titles.go`:
- Around line 423-470: The `setQuery` CTEs still rely on subquery ordering for
`group_concat`, so the values in `mvs` (`group_concat(tag) AS vs`) and `grp`
(`group_concat(typ, ',') AS types`) can vary across runs. Update the SQL in
`sql_media_titles.go` to use SQLite’s ordered-aggregate form inside the `mvs`
and `grp` queries so the concatenation order is explicit and
`DisambiguationTypes` remains deterministic.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: c5ec5ac9-59fc-4474-a55f-719211cbee5e
📒 Files selected for processing (6)
pkg/api/methods/media.gopkg/database/mediadb/sql_media_titles.gopkg/database/mediascanner/indexing_pipeline_test.gopkg/database/scraper/gamelistxml/scraper.gopkg/database/scraper/gamelistxml/scraper_test.gopkg/database/tags/filename_parser_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- pkg/database/scraper/gamelistxml/scraper.go
- pkg/database/scraper/gamelistxml/scraper_test.go
- pkg/database/tags/filename_parser_test.go
Use SQLite ordered-aggregate group_concat in the recompute CTEs so the concatenation order no longer depends on subquery ORDER BY propagating into the aggregate (which SQLite does not guarantee). mvs.vs orders by tag, keeping COUNT(DISTINCT vs) from treating an identical tag set as distinct; grp.types orders by type so the stored DisambiguationTypes string stays deterministic.
Summary by CodeRabbit
YYYY-MMbuild-date shape, and additional compatibility/distribution aliases.track:*tags (includingtrack:-prefixed filename tags) are correctly persisted after indexing.