fix(database): auto-recover browse cache from stale or wedged state #1040
wizzomafizzo wants to merge 7 commits into
Large libraries (~1M+ media rows) could hit permanent media.browse timeouts: a reindex invalidates the browse cache to BrowseIndexVersion "0" immediately but only rebuilds it at the end of a multi-hour index, and an interrupted index leaves IndexingStatus="running" that auto-resumes every boot and never completes. With no valid cache, every system-root browse fell back to a full media scan that exceeds the API request timeout on cold SD storage.

- Serve a present-but-version-stale browse cache instead of falling back to the slow media scan; distinguish fresh / stale-present / absent cache states.
- Add a standalone background browse-cache self-heal on startup (checkAndHealBrowseCache) that rebuilds the cache without requiring a full media reindex, surfaced to clients via an optimizing notification.
- Wrap the self-heal media scan in context.WithoutCancel to avoid the mattn/go-sqlite3 per-row-goroutine path on the 1M-row read; serialize it after the startup slug-cache rebuild so they don't saturate the 2-connection pool.
- Bound the wedged-reindex auto-resume loop (maxIndexResumeAttempts) and degrade the absent-cache route-count fallback with per-route sub-timeouts and a presence probe instead of erroring the whole browse.
- Recompute disambiguation only for titles touched this run and rebuild the tags cache only for dirty systems, instead of doing both for every system every index.
- Sample (LIMIT) the browse prefix-policy read instead of loading every path under a directory.
- Reset IndexResumeAttempts when an index completes so earlier interruptions don't trip the resume cap after a later successful run.
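As a rough illustration of the fresh / stale-present / absent distinction described in the commit message, here is a minimal sketch. The state names match the `browseCacheState` consts mentioned later in the review; the `classifyBrowseCache` and `canServeFromCache` helpers, their parameters, and the exact rule are assumptions for illustration, not the PR's code.

```go
package main

import "fmt"

// browseCacheState mirrors the three states named in the commit message.
type browseCacheState int

const (
	browseCacheAbsent browseCacheState = iota // no cache rows at all
	browseCacheStale                          // rows exist, version mismatch
	browseCacheFresh                          // rows exist, version matches
)

// classifyBrowseCache is a hypothetical helper. hasRows reports whether the
// cache table holds any rows, stored is the persisted BrowseIndexVersion, and
// current is the version the running build expects.
func classifyBrowseCache(hasRows bool, stored, current string) browseCacheState {
	switch {
	case !hasRows:
		return browseCacheAbsent
	case stored != current:
		return browseCacheStale
	default:
		return browseCacheFresh
	}
}

// canServeFromCache reports whether a browse may be answered from the cache.
// Fresh and stale-present caches are both served; only an absent cache takes
// the slower fallback path.
func canServeFromCache(s browseCacheState) bool {
	return s != browseCacheAbsent
}

func main() {
	fmt.Println(canServeFromCache(classifyBrowseCache(true, "0", "2")))
	fmt.Println(canServeFromCache(classifyBrowseCache(false, "0", "2")))
}
```

The point of the three-way split is that a version mismatch alone no longer forces the full media scan; only a cache with no rows does.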
📝 Walkthrough
This PR updates browse-route counting to return degraded counts, adds browse-cache freshness tracking and self-healing, limits automatic index resume attempts, and scopes indexing recomputation using touched-title and missing-media tracking.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/database/mediadb/concurrent_operations_test.go (1)
432-446: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Update the optimization mock sequence in `pkg/database/mediadb/concurrent_operations_test.go:417`
Add `expectOptimizationResumeRead(mock)`, `expectTemporaryParentDirRepairStepNoop(mock)`, and `expectBrowseCacheStep(mock)` before `expectAnalyzeStep(mock)` so `TestRaceConditionBetweenStatusAndOptimization` matches the current background-optimization flow.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/mediadb/concurrent_operations_test.go` around lines 432 - 446, The optimization mock sequence in TestRaceConditionBetweenStatusAndOptimization is missing steps from the current background-optimization flow. Update the expectations around the existing expectAnalyzeStep and expectPostAnalyzeSteps calls to include expectOptimizationResumeRead(mock), expectTemporaryParentDirRepairStepNoop(mock), and expectBrowseCacheStep(mock) before analyze. Keep the DBConfig status expectations aligned with the revised sequence so the mocked exec order matches the actual optimization path.
🧹 Nitpick comments (4)
pkg/database/mediadb/sql_test.go (1)
1646-1654: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Add coverage for `IsMissing = true`.
Every updated mock row across these tests hardcodes `IsMissing` as `false`, and no assertion checks `results[i].IsMissing`. A regression in the new scan target (e.g., wrong scan-arg order) wouldn't be caught by this suite.
✅ Example addition

     rows := sqlmock.NewRows(cols).
     	AddRow(int64(1), marioPath, gamesDir, int64(10), "Super Mario Bros.", false).
    -	AddRow(int64(2), zeldaPath, gamesDir, int64(11), "The Legend of Zelda", false).
    +	AddRow(int64(2), zeldaPath, gamesDir, int64(11), "The Legend of Zelda", true).
     	AddRow(int64(3), metroidPath, gamesDir, int64(12), "Metroid", false)
     ...
    +assert.True(t, results[1].IsMissing)
    +assert.False(t, results[0].IsMissing)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/mediadb/sql_test.go` around lines 1646 - 1654, The sql_test coverage for the media DB scan path only uses rows with IsMissing set to false, so add a case in the relevant test setup around the rows.NewRows/AddRow fixture and the results assertions to exercise IsMissing = true as well. Use the existing sqlmock.NewRows construction in sql_test.go and assert results[i].IsMissing explicitly so the scan target/order is validated for both false and true values.
pkg/database/mediascanner/mediascanner.go (1)
823-824: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win
Release the new scan-state maps before cache builds.
The cleanup block later drops old scan-state maps to reclaim large backing arrays, but these newly added maps are not included. Large systems can retain `TouchedTitles`/`PreviouslyMissing` buckets through secondary-index and cache rebuild phases.
♻️ Proposed fix

     scanState.TagIDs = nil
     scanState.MissingMedia = nil
    +scanState.PreviouslyMissing = nil
    +scanState.TouchedTitles = nil
     status.Phase = PhaseCreatingIndexes

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/mediascanner/mediascanner.go` around lines 823 - 824, The scan-state cleanup is missing the newly added PreviouslyMissing and TouchedTitles maps, so their backing arrays can survive through rebuild phases. Update the cleanup/release block in mediascanner.go to drop these maps alongside the existing scan-state maps, using the same pattern already applied for other temporary state in the mediascanner logic. Make sure the change is made near the scan-state setup and cleanup paths that manage secondary-index and cache rebuilds.
pkg/database/mediadb/sql_config.go (1)
139-167: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Inconsistent parameter type: `sqlQueryable` vs. `*sql.DB`.
`sqlSetIndexResumeAttempts` (and other new helpers like `sqlBrowseCacheStatus` in `sql_browse.go`) take `db sqlQueryable`, but `sqlGetIndexResumeAttempts` takes the concrete `db *sql.DB`. This is inconsistent with the established pattern in this package and prevents testing this getter with the lightweight `fakeSQLQueryable` used elsewhere, forcing heavier `sqlmock` setups instead.
♻️ Suggested fix

    -func sqlGetIndexResumeAttempts(ctx context.Context, db *sql.DB) (int, error) {
    +func sqlGetIndexResumeAttempts(ctx context.Context, db sqlQueryable) (int, error) {

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/mediadb/sql_config.go` around lines 139 - 167, `sqlGetIndexResumeAttempts` is inconsistent with the other database helpers because it takes `*sql.DB` instead of the shared `sqlQueryable` interface. Update this getter to accept `db sqlQueryable` like `sqlSetIndexResumeAttempts` and the other helpers (for example `sqlBrowseCacheStatus`), then keep its `QueryRowContext` logic the same so it can be tested with `fakeSQLQueryable` without requiring `sqlmock`.
pkg/database/mediadb/sql_browse.go (1)
312-330: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win
Move `browseCacheState` type/consts near the top of the file.
This new type and its consts are introduced ~300 lines into the file rather than near the top before functions/methods.
♻️ Suggested placement

    -// (leave definitions in place at line 312)
    +// Relocate the `browseCacheState` type, its consts, and any other new
    +// package-level declarations introduced by this change to the top of the
    +// file, grouped with existing types/consts.

As per coding guidelines, "Define Go types and consts near the top of the file, before functions and methods."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/mediadb/sql_browse.go` around lines 312 - 330, Move the browse cache state declarations closer to the top of sql_browse.go, before the function and method implementations. Relocate the browseCacheState type and its const block (browseCacheAbsent, browseCacheStale, browseCacheFresh) near the file’s other top-level declarations so they are defined alongside the rest of the types/constants used by browse-related code.
Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/database/mediadb/sql_browse.go`:
- Around line 1660-1720: `browseRouteCounts` currently adds degraded
`BrowseRouteCount` entries with `CountUnknown` but leaves `SystemIDs` empty,
which prevents `media.browse` from resolving the root’s system. Update the
degraded-route path in `sqlBrowseRouteHasMedia`/the surrounding browse-count
logic to carry a route-specific system signal into the returned count, and
populate `SystemIDs` from that route’s matched system rather than from
`opts.Systems`; keep the handling localized to the route-processing code so
shared paths are not misattributed.
In `@pkg/database/mediascanner/mediascanner.go`:
- Around line 1477-1479: The cache rebuild path in mediascanner’s selective
re-index flow is using only dirty systems, but UpdateLastGenerated clears
SystemTagsCache for all affected systems first, so preserved systems can be left
without cache rows. Update the PopulateSystemTagsCacheForSystems call in the
re-index logic to repopulate the full relevant system scope rather than
dirtySystemDefs only, so unchanged selective re-indexes still restore preserved
SystemTagsCache entries before RebuildTagCache runs.
In `@pkg/service/index_resume.go`:
- Around line 289-355: The background browse-cache self-heal in
checkAndHealBrowseCache can resume after pauser.Wait and call
PopulateBrowseCache even if a new reindex or optimization started while it was
paused. Add a second state check immediately after pauser.Wait returns, reusing
methods.IsIndexing() and db.MediaDB.GetOptimizationStatus(), and bail out if
indexing is now active or the optimization status is running before proceeding
to notifications.MediaIndexing or PopulateBrowseCache.
- Around line 86-114: The resume-attempt budget in index_resume.go currently
fails open when GetIndexResumeAttempts() or IncrementIndexResumeAttempts()
returns an error, which can let auto-resume continue indefinitely. Update the
logic around db.MediaDB.GetIndexResumeAttempts and
db.MediaDB.IncrementIndexResumeAttempts so that any counter read/write failure
is treated as a closed failure path: log the error and skip resuming indexing
instead of defaulting attempts to zero or proceeding without a successful
increment. Keep the existing maxIndexResumeAttempts and cancellation handling in
place, but ensure the safety cap cannot be bypassed by persistent storage
errors.
📒 Files selected for processing (29)
- docs/api/methods.md
- pkg/api/methods/media.go
- pkg/api/methods/media_browse.go
- pkg/api/methods/media_browse_test.go
- pkg/database/browseprefix/prefix.go
- pkg/database/database.go
- pkg/database/mediadb/concurrent_operations_test.go
- pkg/database/mediadb/mediadb.go
- pkg/database/mediadb/mediadb_integration_test.go
- pkg/database/mediadb/optimization_test.go
- pkg/database/mediadb/sql_browse.go
- pkg/database/mediadb/sql_browse_cache.go
- pkg/database/mediadb/sql_browse_prefix_sample_test.go
- pkg/database/mediadb/sql_config.go
- pkg/database/mediadb/sql_media.go
- pkg/database/mediadb/sql_media_tags.go
- pkg/database/mediadb/sql_test.go
- pkg/database/mediascanner/indexing_pipeline.go
- pkg/database/mediascanner/indexing_pipeline_test.go
- pkg/database/mediascanner/mediascanner.go
- pkg/database/mediascanner/mediascanner_bench_test.go
- pkg/database/mediascanner/touched_titles_test.go
- pkg/database/scraper/gamelistxml/scraper.go
- pkg/service/browse_cache_heal_test.go
- pkg/service/inbox/inbox.go
- pkg/service/index_resume.go
- pkg/service/service.go
- pkg/service/service_test.go
- pkg/testing/helpers/db_mocks.go
… path

- Add an in-package MediaDB round-trip test for the resume-attempt counter (default-zero, increment, reset, and non-integer parse error), which the cross-package service tests exercise but cannot instrument for coverage.
- Assert the resume-limit inbox message is surfaced when auto-resume is bounded, wiring a mock inbox into the existing max-attempts test.
- Add a browse-cache self-heal test asserting the optimizing indicator is still cleared when PopulateBrowseCache fails, so clients don't show a permanent "preparing library" spinner.
🧹 Nitpick comments (1)
pkg/service/browse_cache_heal_test.go (1)
80-121: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Consider extracting shared setup/poll logic.
This new failure test duplicates nearly all of `TestCheckAndHealBrowseCache_RebuildsWhenStale`'s mock setup and the optimizing-notification polling loop, differing only in the `PopulateBrowseCache` return value and the channel used to signal completion. A small helper (e.g., taking the desired `PopulateBrowseCache` error and returning `sawOptimizing`/`sawCleared`) would reduce duplication across both tests.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/service/browse_cache_heal_test.go` around lines 80 - 121, The new failure test duplicates most of the setup and notification polling from TestCheckAndHealBrowseCache_RebuildsWhenStale, so extract the shared mock initialization and optimizing-notification wait logic into a small helper. Make the helper reusable by TestCheckAndHealBrowseCache_ClearsOptimizingOnRebuildFailure and the rebuild-success test, parameterizing the PopulateBrowseCache result and the completion signal handling while preserving the checks for optimizing:true and optimizing:false.
📒 Files selected for processing (3)
- pkg/database/mediadb/index_resume_attempts_test.go
- pkg/service/browse_cache_heal_test.go
- pkg/service/service_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/service/service_test.go
UpdateLastGenerated invalidates SystemTagsCache for all indexed systems, but the selective reindex path only repopulated the systems that changed. RebuildTagCache then built the in-memory cache from an incomplete table, and GetSystemTagsCached returns the in-memory hit without self-healing, so preserved systems returned empty tags until the next invalidation. Repopulate all indexed systems to match the invalidation scope. Also harden auto-recovery: fail closed when the resume-attempt counter cannot be read or incremented, re-check indexing/optimization state after the browse cache self-heal unpauses, and add missing test coverage.
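The fail-closed behaviour described in this commit can be sketched roughly as follows. The method names `GetIndexResumeAttempts` and `IncrementIndexResumeAttempts` and the constant `maxIndexResumeAttempts` appear in the review; their signatures, the cap value of 3, the `attemptStore` interface, and the `memStore` fake are assumptions made only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// maxIndexResumeAttempts is named in the PR; the value 3 is an assumption.
const maxIndexResumeAttempts = 3

// attemptStore is a hypothetical stand-in for the MediaDB counter methods.
// The method names come from the review comment; the signatures are assumed.
type attemptStore interface {
	GetIndexResumeAttempts() (int, error)
	IncrementIndexResumeAttempts() error
}

// shouldResumeIndex fails closed: any counter read or write error skips the
// resume, so a broken store cannot bypass the cap.
func shouldResumeIndex(s attemptStore) bool {
	attempts, err := s.GetIndexResumeAttempts()
	if err != nil {
		return false
	}
	if attempts >= maxIndexResumeAttempts {
		return false
	}
	if err := s.IncrementIndexResumeAttempts(); err != nil {
		return false
	}
	return true
}

// memStore is an in-memory fake used only for this example.
type memStore struct {
	attempts  int
	failRead  bool
	failWrite bool
}

func (m *memStore) GetIndexResumeAttempts() (int, error) {
	if m.failRead {
		return 0, errors.New("read failed")
	}
	return m.attempts, nil
}

func (m *memStore) IncrementIndexResumeAttempts() error {
	if m.failWrite {
		return errors.New("write failed")
	}
	m.attempts++
	return nil
}

func main() {
	fmt.Println(shouldResumeIndex(&memStore{}))
	fmt.Println(shouldResumeIndex(&memStore{attempts: maxIndexResumeAttempts}))
	fmt.Println(shouldResumeIndex(&memStore{failRead: true}))
	fmt.Println(shouldResumeIndex(&memStore{failWrite: true}))
}
```

Only a successful read below the cap followed by a successful increment allows the resume, so the budget is always charged before the work starts.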
…learing
- Degraded browse routes (COUNT sub-timeout) keep their system IDs when
the systems filter names exactly one system, so filtered root entries
still carry systemId in the API response.
- Sub-timeout detection also checks the route/probe context deadline
instead of relying on errors.Is(err, context.DeadlineExceeded): the
sqlite driver can report an interrupted query as its own error, which
previously failed the whole browse instead of degrading one route.
- The browse cache invalidation sentinel embeds the schema version it
invalidates ("2-stale") so a cache invalidated under an old schema
reads as absent after an upgrade rather than serving old-schema rows.
- Prefix policy detection samples both ends of a directory partition
(forward and backward DBID index scans) when the forward sample
truncates at the limit; numbered files cluster at the front of large
directories and skewed the fraction estimate.
- Optimization failure clears the persisted step before writing the
failed status, so a crash between the two writes cannot produce a
failure-resume that skips earlier steps.
- Move the browseCacheState type and consts to the top of sql_browse.go.
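The sub-timeout detection change in this commit can be illustrated with a small sketch. It assumes a driver error that does not wrap `context.DeadlineExceeded` (represented here by the made-up `errInterrupted`); the `routeTimedOut` helper and `demo` function are hypothetical names, not the PR's code.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errInterrupted stands in for a driver-specific "interrupted" error that
// does not wrap context.DeadlineExceeded. It is invented for this example.
var errInterrupted = errors.New("interrupted")

// routeTimedOut reports whether a failed per-route query should be treated
// as a sub-timeout (degrade this route) rather than a hard failure. It checks
// the error chain first, then falls back to the route context's own state.
func routeTimedOut(routeCtx context.Context, err error) bool {
	if err == nil {
		return false
	}
	if errors.Is(err, context.DeadlineExceeded) {
		return true
	}
	return errors.Is(routeCtx.Err(), context.DeadlineExceeded)
}

// demo runs the check against an expired route context and a live one.
func demo() (degraded, hardFailure bool) {
	expired, cancel := context.WithTimeout(context.Background(), time.Millisecond)
	defer cancel()
	<-expired.Done() // wait until the sub-timeout has fired

	degraded = routeTimedOut(expired, errInterrupted)
	hardFailure = !routeTimedOut(context.Background(), errInterrupted)
	return degraded, hardFailure
}

func main() {
	degraded, hardFailure := demo()
	fmt.Println(degraded, hardFailure)
}
```

Checking the context as well as the error means one slow route degrades to a count-unknown entry even when the driver reports the interruption in its own terms.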
Indexing recomputes DisambiguationTypes only for titles whose media or tags changed, so values stored by an older algorithm or tag mapping were never revisited: a reindex over an unchanged library recomputes nothing.

- Add a DisambiguationVersion stamp and a disambiguation_backfill optimization step that recomputes every system once when the stamp is outdated, then stamps the current version. The recompute walks systems one at a time under the pauser, the same granularity indexing used when it recomputed per system on every run.
- TemporaryRepairJobsPending also reports an outdated stamp, so a boot without a reindex triggers the backfill via background optimization (pauser-aware, retried, resumable via the persisted step).
- Empty databases are stamped immediately: the first index computes disambiguation with the current algorithm as it inserts titles.

Bump disambiguationAlgoVersion whenever the recompute query or the ZapScriptTagTypes mapping changes.
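A minimal sketch of the version-stamp decision described above, under stated assumptions: `disambiguationAlgoVersion` is named in the commit, but its value, the string encoding of the stamp, and the `planBackfill` helper with its three outcomes are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
)

// disambiguationAlgoVersion is named in the commit; the value is assumed.
const disambiguationAlgoVersion = 2

type backfillAction int

const (
	actionNone      backfillAction = iota // stamp is current, nothing to do
	actionStampOnly                       // empty database: stamp immediately
	actionBackfill                        // recompute every system, then stamp
)

// planBackfill is a hypothetical helper. stored is the persisted stamp (empty
// or unparsable is treated as outdated, which is an assumption), and
// titleCount is the number of titles currently in the database.
func planBackfill(stored string, titleCount int) backfillAction {
	v, err := strconv.Atoi(stored)
	if err == nil && v == disambiguationAlgoVersion {
		return actionNone
	}
	if titleCount == 0 {
		return actionStampOnly
	}
	return actionBackfill
}

func main() {
	fmt.Println(planBackfill("2", 100) == actionNone)
	fmt.Println(planBackfill("", 0) == actionStampOnly)
	fmt.Println(planBackfill("1", 100) == actionBackfill)
}
```

Bumping the constant is then enough to make every existing database run the backfill once on its next background optimization.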
…abase

media.generate accepts rebuild:true, which discards the media database file and indexes from scratch: a supported fresh start instead of manually deleting the file. Scraped metadata is lost and must be re-scraped; favourites and launcher overrides live in UserDB and are re-applied after indexing. rebuild cannot be combined with a systems filter, since a fresh database indexed selectively would silently drop every other system's media.

- Rename MediaDB.RecreateAfterCorruption to Recreate (corruption recovery is now one of two callers) and serialize recreates with a CAS guard so a user rebuild can never interleave with recovery.
- The indexing goroutine steps out of the tracked background-operation set around the recreate: Recreate closes the database and Close waits on that set, so running it tracked would self-deadlock.
- A recreated database is stamped with the current disambiguation version so the post-rebuild optimization does not re-run the backfill over freshly computed titles.
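A compare-and-swap guard of the kind mentioned above might look roughly like this. The `recreateGuard` type and `errRecreateInProgress` error are hypothetical, and rejecting the second caller (rather than making it wait) is an assumption; the commit only states that recreates are serialized.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// errRecreateInProgress is a made-up error for this example.
var errRecreateInProgress = errors.New("recreate already in progress")

// recreateGuard lets at most one recreate run at a time.
type recreateGuard struct {
	busy atomic.Bool
}

// run executes fn only if no other recreate is active. The CompareAndSwap
// flips busy from false to true atomically, so two callers cannot both win.
func (g *recreateGuard) run(fn func() error) error {
	if !g.busy.CompareAndSwap(false, true) {
		return errRecreateInProgress
	}
	defer g.busy.Store(false)
	return fn()
}

func main() {
	var g recreateGuard

	// A second recreate attempted while the first is still running is rejected.
	err := g.run(func() error {
		return g.run(func() error { return nil })
	})
	fmt.Println(errors.Is(err, errRecreateInProgress))

	// Once the first has finished, the guard is free again.
	fmt.Println(g.run(func() error { return nil }) == nil)
}
```

`atomic.Bool` requires Go 1.19 or later; on older toolchains the same pattern works with `atomic.CompareAndSwapInt32`.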
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/api/methods/media.go (1)
440-465: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Keep `Recreate()` covered during shutdown
`HandleGenerateMedia` drops out of `TrackBackgroundOperation()` before `Recreate(false)`, but the API shutdown path only waits for `apiDone`; it can still reach `closeDatabase()` while this goroutine is in that untracked window. Keep the rebuild tracked through `Recreate()` or add a shutdown barrier around the recreate/open sequence.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/api/methods/media.go` around lines 440 - 465, HandleGenerateMedia currently drops out of MediaDB.TrackBackgroundOperation() before calling MediaDB.Recreate(false), leaving the rebuild window untracked so shutdown can race into closeDatabase(). Keep the rebuild path covered by the background-operation tracker through Recreate(), or add an equivalent shutdown barrier around the recreate/open sequence, and ensure the re-registration/cleanup around BackgroundOperationDone and TrackBackgroundOperation remains balanced in the goroutine.
🧹 Nitpick comments (1)
pkg/database/database.go (1)
815-815: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value
Add a doc comment for `Recreate` now that it's general-purpose.
The method was renamed from `RecreateAfterCorruption` to `Recreate`, generalizing it beyond corruption recovery (it's also used for the new `media.generate` rebuild path per the PR objectives). Every other exported method in this interface has an explanatory comment; a short one here clarifying both call sites (corruption recovery and explicit rebuild) and the `keepBackup` semantics would help future implementers/consumers of `MediaDBI`.
📝 Suggested doc comment

    +	// Recreate closes, deletes (or backs up when keepBackup is true), and
    +	// reopens the media database with a fresh schema. Used both for
    +	// corruption recovery and for an explicit user-triggered rebuild.
     	Recreate(keepBackup bool) error

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/database/database.go` at line 815, Add a doc comment for the exported Recreate method on MediaDBI to match the rest of the interface. Update the comment near Recreate(keepBackup bool) error to briefly describe that it recreates the database for both corruption recovery and explicit media.generate rebuilds, and clarify what keepBackup controls for callers and implementers.
📒 Files selected for processing (19)
- docs/api/methods.md
- pkg/api/methods/media.go
- pkg/api/methods/media_rebuild_test.go
- pkg/api/models/params.go
- pkg/database/database.go
- pkg/database/mediadb/concurrent_operations_test.go
- pkg/database/mediadb/corruption_recovery_test.go
- pkg/database/mediadb/disambiguation_test.go
- pkg/database/mediadb/mediadb.go
- pkg/database/mediadb/mediadb_integration_test.go
- pkg/database/mediadb/optimization_test.go
- pkg/database/mediadb/sql_browse.go
- pkg/database/mediadb/sql_browse_cache.go
- pkg/database/mediadb/sql_browse_prefix_sample_test.go
- pkg/database/mediadb/sql_config.go
- pkg/database/mediadb/sql_disambiguation_backfill.go
- pkg/service/index_resume.go
- pkg/service/media_recovery_test.go
- pkg/testing/helpers/db_mocks.go
🚧 Files skipped from review as they are similar to previous changes (5)
- pkg/database/mediadb/sql_config.go
- pkg/database/mediadb/concurrent_operations_test.go
- pkg/service/index_resume.go
- pkg/database/mediadb/mediadb_integration_test.go
- pkg/database/mediadb/sql_browse.go
Problem
Large libraries (~1M+ media rows) could hit permanent `media.browse` timeouts. A reindex invalidates the browse cache (`BrowseIndexVersion="0"`) immediately but only rebuilds it at the very end of a multi-hour index. An interrupted index leaves `IndexingStatus="running"`, which auto-resumes every boot and re-walks everything, so the index never completes and the cache is never revalidated. With no valid cache, every system-root browse falls back to a full media scan (`SELECT COUNT(*) ... WHERE Path LIKE ?`) that exceeds the API request timeout on cold SD storage. Reproduced on a real 1.45 GB / 1,027,136-row `media.db` on MiSTer.

Changes
- Serve a present-but-stale browse cache when `BrowseIndexVersion` mismatches, distinguishing fresh / stale-present / absent states instead of falling into the slow media scan.
- Add a standalone background browse-cache self-heal on startup (`checkAndHealBrowseCache`) without requiring a full media reindex. Surfaced to clients via the existing optimizing notification.
- Wrap the self-heal media scan in `context.WithoutCancel` (mattn/go-sqlite3 spawns a goroutine per `rows.Next()` under a cancellable context), and serialize the self-heal after the startup slug-cache rebuild so the two don't saturate the 2-connection pool.
- Bound the wedged-reindex auto-resume loop with `maxIndexResumeAttempts`; when the cache is genuinely absent, the route-count fallback uses per-route sub-timeouts and a presence probe, returning a degraded (count-unknown) result instead of erroring the whole browse. Reset `IndexResumeAttempts` on successful completion so earlier interruptions don't trip the cap after a later good run.
- Recompute disambiguation only for titles touched this run (`RecomputeTitleDisambiguation`) and rebuild the tags cache only for dirty systems, instead of doing both for every system every index. First index touches every title, so output is unchanged there.
- `detectBrowsePrefixPolicy` samples with a `LIMIT` (the heuristic is sampling-safe) instead of loading every path under a directory.

Verification
Recovery confirmed live on the real 1,027,136-row `media.db` on MiSTer 10.0.0.107: the previously wedged DB browsed fast with no reindex and no manual DB editing. On-device `logBrowseTiming` confirmed the self-heal scan dropped ~455s → ~237s with `WithoutCancel` + serialized rebuilds, and no `context deadline exceeded` browse timeouts occurred during the walk. Unit tests added for the stale/absent cache paths, degraded route counts, touched-title disambiguation scenarios, prefix sampling, and the self-heal skip branches.

Follow-up filed: #1039 (absent-cache probe sub-timeout can drop a route).
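For readers unfamiliar with `context.WithoutCancel` (standard library, Go 1.21+), the sketch below shows the property the self-heal scan relies on: the derived context keeps the parent's values but has no `Done` channel, so cancelling the parent does not reach it. The `ctxKey` type, the `"op"` key, and the `demo` function are invented for this example and are not part of the PR.

```go
package main

import (
	"context"
	"fmt"
)

// ctxKey is a private key type for the example's context value.
type ctxKey string

// demo cancels a parent context and reports what the detached child sees.
func demo() (parentErr, detachedErr error, doneIsNil bool, val any) {
	base := context.WithValue(context.Background(), ctxKey("op"), "self-heal")
	parent, cancel := context.WithCancel(base)

	// detached inherits values from parent but not its cancellation.
	detached := context.WithoutCancel(parent)

	cancel()

	return parent.Err(), detached.Err(), detached.Done() == nil, detached.Value(ctxKey("op"))
}

func main() {
	parentErr, detachedErr, doneIsNil, val := demo()
	fmt.Println(parentErr, detachedErr, doneIsNil, val)
}
```

Because the detached context can never be cancelled, anything run under it needs its own bound on runtime; the PR description pairs it with serialized rebuilds rather than relying on request cancellation.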
Summary by CodeRabbit
`rebuild` support for media generation (full “fresh start”), including validation that it can’t be used with system filters.