fix(database): auto-recover browse cache from stale or wedged state by wizzomafizzo · Pull Request #1040 · ZaparooProject/zaparoo-core

wizzomafizzo · 2026-07-02T09:23:09Z

Problem

Large libraries (~1M+ media rows) could hit permanent media.browse timeouts. A reindex invalidates the browse cache (BrowseIndexVersion="0") immediately but only rebuilds it at the very end of a multi-hour index. An interrupted index leaves IndexingStatus="running", which auto-resumes every boot and re-walks everything, so the index never completes and the cache is never revalidated. With no valid cache, every system-root browse falls back to a full media scan (SELECT COUNT(*) ... WHERE Path LIKE ?) that exceeds the API request timeout on cold SD storage. Reproduced on a real 1.45 GB / 1,027,136-row media.db on MiSTer.

Changes

Serve stale-but-present cache. Decouple browse readiness from "a reindex stamped the current version": a populated cache is served even when BrowseIndexVersion mismatches, distinguishing fresh / stale-present / absent states instead of falling into the slow media scan.
Standalone background self-heal. On startup, if media exists but the browse cache is stale/absent, rebuild it in the background (checkAndHealBrowseCache) without requiring a full media reindex. Surfaced to clients via the existing optimizing notification.
Avoid the per-row-goroutine path. Wrap the self-heal 1M-row scan in context.WithoutCancel (mattn/go-sqlite3 spawns a goroutine per rows.Next() under a cancellable context), and serialize the self-heal after the startup slug-cache rebuild so the two don't saturate the 2-connection pool.
Break the wedged-resume loop / degrade gracefully. Bound auto-resume with maxIndexResumeAttempts; when the cache is genuinely absent, the route-count fallback uses per-route sub-timeouts and a presence probe, returning a degraded (count-unknown) result instead of erroring the whole browse. Reset IndexResumeAttempts on successful completion so earlier interruptions don't trip the cap after a later good run.
Recompute only what changed. Recompute disambiguation for titles touched this run (via existing RecomputeTitleDisambiguation) and rebuild the tags cache only for dirty systems, instead of doing both for every system every index. First index touches every title, so output is unchanged there.
Bound the prefix-policy read. detectBrowsePrefixPolicy samples with a LIMIT (the heuristic is sampling-safe) instead of loading every path under a directory.

Verification

Recovery confirmed live on the real 1,027,136-row media.db on MiSTer 10.0.0.107: the previously wedged DB browsed fast with no reindex and no manual DB editing. On-device logBrowseTiming confirmed the self-heal scan dropped ~455s → ~237s with WithoutCancel + serialized rebuilds, and no context deadline exceeded browse timeouts occurred during the walk. Unit tests added for the stale/absent cache paths, degraded route counts, touched-title disambiguation scenarios, prefix sampling, and the self-heal skip branches.

Follow-up filed: #1039 (absent-cache probe sub-timeout can drop a route).

Summary by CodeRabbit

New Features
- Added rebuild support for media generation (full “fresh start”), including validation that it can’t be used with system filters.
- Media browse can now include system-root routes even when exact file counts are temporarily unknown (count details may be unset).
Bug Fixes
- Improved handling of missing media rows and more resilient, timeout-aware degraded route counting.
- Reduced unnecessary disambiguation work by tracking changes across indexing runs.
Documentation / Tests
- Updated API documentation and extended browse-cache healing, degraded-count, and indexing resume test coverage.
- Added coverage for stopping repeated interrupted indexing via a persisted resume-attempt limit.

Large libraries (~1M+ media rows) could hit permanent media.browse timeouts: a reindex invalidates the browse cache to BrowseIndexVersion "0" immediately but only rebuilds it at the end of a multi-hour index, and an interrupted index leaves IndexingStatus="running" that auto- resumes every boot and never completes. With no valid cache, every system-root browse fell back to a full media scan that exceeds the API request timeout on cold SD storage. - Serve a present-but-version-stale browse cache instead of falling back to the slow media scan; distinguish fresh / stale-present / absent cache states. - Add a standalone background browse-cache self-heal on startup (checkAndHealBrowseCache) that rebuilds the cache without requiring a full media reindex, surfaced to clients via an optimizing notification. - Wrap the self-heal media scan in context.WithoutCancel to avoid the mattn/go-sqlite3 per-row-goroutine path on the 1M-row read; serialize it after the startup slug-cache rebuild so they don't saturate the 2-connection pool. - Bound the wedged-reindex auto-resume loop (maxIndexResumeAttempts) and degrade the absent-cache route-count fallback with per-route sub- timeouts and a presence probe instead of erroring the whole browse. - Recompute disambiguation only for titles touched this run and rebuild the tags cache only for dirty systems, instead of doing both for every system every index. - Sample (LIMIT) the browse prefix-policy read instead of loading every path under a directory. - Reset IndexResumeAttempts when an index completes so earlier interruptions don't trip the resume cap after a later successful run.

coderabbitai · 2026-07-02T09:23:37Z

📝 Walkthrough

Walkthrough

This PR updates browse-route counting to return degraded counts, adds browse-cache freshness tracking and self-healing, limits automatic index resume attempts, and scopes indexing recomputation using touched-title and missing-media tracking.

Changes

Media browse, cache, resume, and indexing state

Layer / File(s)	Summary
Shared contracts and persistence `docs/api/methods.md`, `pkg/database/database.go`, `pkg/database/browseprefix/prefix.go`, `pkg/database/mediadb/sql_config.go`, `pkg/database/mediadb/sql_browse_cache.go`, `pkg/service/inbox/inbox.go`, `pkg/api/models/params.go`, `pkg/testing/helpers/db_mocks.go`, `pkg/database/mediadb/index_resume_attempts_test.go`	Adds browse/indexing fields and methods, persistence for resume attempts, the browse-cache invalidation sentinel, the inbox category constant, API params/docs, and supporting mocks/tests.
Media generation rebuild flow `pkg/api/methods/media.go`, `pkg/api/methods/media_rebuild_test.go`, `pkg/database/database.go`, `pkg/database/mediadb/mediadb.go`, `pkg/database/mediadb/corruption_recovery_test.go`, `pkg/service/media_recovery_test.go`, `pkg/service/index_resume.go`	Adds rebuild mode to media generation, recreates the media database in the background, resets resume-attempt tracking after success, and renames the database recreate API through service and tests.
Browse route fallback and response shape `pkg/database/mediadb/sql_browse.go`, `pkg/api/methods/media_browse.go`, `pkg/api/methods/media_browse_test.go`, `pkg/database/mediadb/mediadb_integration_test.go`, `pkg/database/mediadb/sql_test.go`	Per-route count timeouts now degrade to presence probes and return `CountUnknown`; browse results keep unknown-count routes with an unset `fileCount`, and the SQL/tests reflect the new route-count shape.
Browse cache state and optimization resume `pkg/database/mediadb/sql_browse.go`, `pkg/database/mediadb/mediadb.go`, `pkg/database/mediadb/sql_browse_prefix_sample_test.go`, `pkg/database/mediadb/mediadb_integration_test.go`, `pkg/database/mediadb/optimization_test.go`, `pkg/database/mediadb/concurrent_operations_test.go`, `pkg/database/mediadb/disambiguation_test.go`	Browse-cache readiness now distinguishes absent/stale/fresh states, prefix detection samples paths with a fixed limit, and background optimization resumes from a persisted step with updated step ordering and tests.
Index resume limits and browse-cache healing `pkg/service/index_resume.go`, `pkg/service/service.go`, `pkg/service/service_test.go`, `pkg/service/browse_cache_heal_test.go`	Automatic resume attempts are capped and reset on clean state, and startup now triggers a browse-cache self-heal path that rebuilds stale/absent cache data with pause and notification handling.
Touched-title and missing-media tracking `pkg/database/database.go`, `pkg/database/mediascanner/indexing_pipeline.go`, `pkg/database/mediascanner/mediascanner.go`, `pkg/database/mediascanner/indexing_pipeline_test.go`, `pkg/database/mediascanner/mediascanner_bench_test.go`, `pkg/database/mediascanner/touched_titles_test.go`, `pkg/database/scraper/gamelistxml/scraper.go`	Indexing now tracks touched titles and previously missing media, canonicalizes filename tags, recomputes disambiguation only for affected titles, and restricts downstream cache rebuilds to dirty systems.
Media missing state and scanner tag filtering `pkg/database/mediadb/sql_media.go`, `pkg/database/mediadb/sql_media_tags.go`, `pkg/database/mediadb/sql_test.go`	Media queries now expose `IsMissing`, and scanner-owned tag retrieval excludes a broader non-scanner tag set instead of only user tags.

Estimated code review effort: 5 (Critical) | ~120 minutes

Possibly related issues

media.browse: populated system root can vanish when COUNT and presence probe both time out on cold large DBs #1039: Modifies the same browse-route timeout fallback and presence-probe path in pkg/database/mediadb/sql_browse.go.

Possibly related PRs

ZaparooProject/zaparoo-core#894: Both PRs touch browse-prefix detection behavior in pkg/database/mediadb/sql_browse.go.
ZaparooProject/zaparoo-core#937: Both PRs modify browse sort/prefix-policy handling in pkg/database/mediadb/sql_browse.go.
ZaparooProject/zaparoo-core#1003: Both PRs update the MediaDBI recovery API by replacing RecreateAfterCorruption with Recreate.

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: automatically recovering stale or wedged browse-cache state.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/browse-cache-auto-recovery

_{Comment @coderabbitai help to get the list of available commands.}

codecov · 2026-07-02T09:31:11Z

Codecov Report

❌ Patch coverage is 73.23340% with 125 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
...kg/database/mediadb/sql_disambiguation_backfill.go	63.85%	15 Missing and 15 partials ⚠️
pkg/service/index_resume.go	64.00%	17 Missing and 10 partials ⚠️
pkg/database/mediadb/sql_browse.go	72.82%	16 Missing and 9 partials ⚠️
pkg/database/mediadb/mediadb.go	62.50%	12 Missing and 12 partials ⚠️
pkg/database/mediascanner/indexing_pipeline.go	89.28%	3 Missing and 3 partials ⚠️
pkg/api/methods/media.go	86.20%	3 Missing and 1 partial ⚠️
pkg/database/mediadb/sql_config.go	82.60%	2 Missing and 2 partials ⚠️
pkg/database/mediadb/sql_media_tags.go	88.88%	2 Missing ⚠️
pkg/database/mediascanner/mediascanner.go	77.77%	1 Missing and 1 partial ⚠️
pkg/service/service.go	0.00%	1 Missing ⚠️

📢 Thoughts on this report? Let us know!

coderabbitai

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/database/mediadb/concurrent_operations_test.go (1)
432-446: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Update the optimization mock sequence in pkg/database/mediadb/concurrent_operations_test.go:417

Add expectOptimizationResumeRead(mock), expectTemporaryParentDirRepairStepNoop(mock), and expectBrowseCacheStep(mock) before expectAnalyzeStep(mock) so TestRaceConditionBetweenStatusAndOptimization matches the current background-optimization flow.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/database/mediadb/concurrent_operations_test.go` around lines 432 - 446,
The optimization mock sequence in TestRaceConditionBetweenStatusAndOptimization
is missing steps from the current background-optimization flow. Update the
expectations around the existing expectAnalyzeStep and expectPostAnalyzeSteps
calls to include expectOptimizationResumeRead(mock),
expectTemporaryParentDirRepairStepNoop(mock), and expectBrowseCacheStep(mock)
before analyze. Keep the DBConfig status expectations aligned with the revised
sequence so the mocked exec order matches the actual optimization path.

🧹 Nitpick comments (4)

pkg/database/mediadb/sql_test.go (1)
1646-1654: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Add coverage for IsMissing = true.

Every updated mock row across these tests hardcodes IsMissing as false, and no assertion checks results[i].IsMissing. A regression in the new scan target (e.g., wrong scan-arg order) wouldn't be caught by this suite.
✅ Example addition
 	rows := sqlmock.NewRows(cols).
 		AddRow(int64(1), marioPath, gamesDir, int64(10), "Super Mario Bros.", false).
-		AddRow(int64(2), zeldaPath, gamesDir, int64(11), "The Legend of Zelda", false).
+		AddRow(int64(2), zeldaPath, gamesDir, int64(11), "The Legend of Zelda", true).
 		AddRow(int64(3), metroidPath, gamesDir, int64(12), "Metroid", false)
...
+	assert.True(t, results[1].IsMissing)
+	assert.False(t, results[0].IsMissing)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/database/mediadb/sql_test.go` around lines 1646 - 1654, The sql_test
coverage for the media DB scan path only uses rows with IsMissing set to false,
so add a case in the relevant test setup around the rows.NewRows/AddRow fixture
and the results assertions to exercise IsMissing = true as well. Use the
existing sqlmock.NewRows construction in sql_test.go and assert
results[i].IsMissing explicitly so the scan target/order is validated for both
false and true values.
pkg/database/mediascanner/mediascanner.go (1)
823-824: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Release the new scan-state maps before cache builds.

The cleanup block later drops old scan-state maps to reclaim large backing arrays, but these newly added maps are not included. Large systems can retain TouchedTitles/PreviouslyMissing buckets through secondary-index and cache rebuild phases.
♻️ Proposed fix
 scanState.TagIDs = nil
 
 scanState.MissingMedia = nil
+scanState.PreviouslyMissing = nil
+scanState.TouchedTitles = nil
 status.Phase = PhaseCreatingIndexes
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/database/mediascanner/mediascanner.go` around lines 823 - 824, The
scan-state cleanup is missing the newly added PreviouslyMissing and
TouchedTitles maps, so their backing arrays can survive through rebuild phases.
Update the cleanup/release block in mediascanner.go to drop these maps alongside
the existing scan-state maps, using the same pattern already applied for other
temporary state in the mediascanner logic. Make sure the change is made near the
scan-state setup and cleanup paths that manage secondary-index and cache
rebuilds.
pkg/database/mediadb/sql_config.go (1)
139-167: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Inconsistent parameter type: sqlQueryable vs. *sql.DB.

sqlSetIndexResumeAttempts (and other new helpers like sqlBrowseCacheStatus in sql_browse.go) take db sqlQueryable, but sqlGetIndexResumeAttempts takes the concrete db *sql.DB. This is inconsistent with the established pattern in this package and prevents testing this getter with the lightweight fakeSQLQueryable used elsewhere, forcing heavier sqlmock setups instead.
♻️ Suggested fix
-func sqlGetIndexResumeAttempts(ctx context.Context, db *sql.DB) (int, error) {
+func sqlGetIndexResumeAttempts(ctx context.Context, db sqlQueryable) (int, error) {
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/database/mediadb/sql_config.go` around lines 139 - 167,
`sqlGetIndexResumeAttempts` is inconsistent with the other database helpers
because it takes `*sql.DB` instead of the shared `sqlQueryable` interface.
Update this getter to accept `db sqlQueryable` like `sqlSetIndexResumeAttempts`
and the other helpers (for example `sqlBrowseCacheStatus`), then keep its
`QueryRowContext` logic the same so it can be tested with `fakeSQLQueryable`
without requiring `sqlmock`.
pkg/database/mediadb/sql_browse.go (1)
312-330: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Move browseCacheState type/consts near the top of the file.

This new type and its consts are introduced ~300 lines into the file rather than near the top before functions/methods.
♻️ Suggested placement
-// (leave definitions in place at line 312)
+// Relocate the `browseCacheState` type, its consts, and any other new
+// package-level declarations introduced by this change to the top of the
+// file, grouped with existing types/consts.
As per coding guidelines, "Define Go types and consts near the top of the file, before functions and methods."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/database/mediadb/sql_browse.go` around lines 312 - 330, Move the browse
cache state declarations closer to the top of sql_browse.go, before the function
and method implementations. Relocate the browseCacheState type and its const
block (browseCacheAbsent, browseCacheStale, browseCacheFresh) near the file’s
other top-level declarations so they are defined alongside the rest of the
types/constants used by browse-related code.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/database/mediadb/sql_browse.go`:
- Around line 1660-1720: `browseRouteCounts` currently adds degraded
`BrowseRouteCount` entries with `CountUnknown` but leaves `SystemIDs` empty,
which prevents `media.browse` from resolving the root’s system. Update the
degraded-route path in `sqlBrowseRouteHasMedia`/the surrounding browse-count
logic to carry a route-specific system signal into the returned count, and
populate `SystemIDs` from that route’s matched system rather than from
`opts.Systems`; keep the handling localized to the route-processing code so
shared paths are not misattributed.

In `@pkg/database/mediascanner/mediascanner.go`:
- Around line 1477-1479: The cache rebuild path in mediascanner’s selective
re-index flow is using only dirty systems, but UpdateLastGenerated clears
SystemTagsCache for all affected systems first, so preserved systems can be left
without cache rows. Update the PopulateSystemTagsCacheForSystems call in the
re-index logic to repopulate the full relevant system scope rather than
dirtySystemDefs only, so unchanged selective re-indexes still restore preserved
SystemTagsCache entries before RebuildTagCache runs.

In `@pkg/service/index_resume.go`:
- Around line 289-355: The background browse-cache self-heal in
checkAndHealBrowseCache can resume after pauser.Wait and call
PopulateBrowseCache even if a new reindex or optimization started while it was
paused. Add a second state check immediately after pauser.Wait returns, reusing
methods.IsIndexing() and db.MediaDB.GetOptimizationStatus(), and bail out if
indexing is now active or the optimization status is running before proceeding
to notifications.MediaIndexing or PopulateBrowseCache.
- Around line 86-114: The resume-attempt budget in index_resume.go currently
fails open when GetIndexResumeAttempts() or IncrementIndexResumeAttempts()
returns an error, which can let auto-resume continue indefinitely. Update the
logic around db.MediaDB.GetIndexResumeAttempts and
db.MediaDB.IncrementIndexResumeAttempts so that any counter read/write failure
is treated as a closed failure path: log the error and skip resuming indexing
instead of defaulting attempts to zero or proceeding without a successful
increment. Keep the existing maxIndexResumeAttempts and cancellation handling in
place, but ensure the safety cap cannot be bypassed by persistent storage
errors.

---

Outside diff comments:
In `@pkg/database/mediadb/concurrent_operations_test.go`:
- Around line 432-446: The optimization mock sequence in
TestRaceConditionBetweenStatusAndOptimization is missing steps from the current
background-optimization flow. Update the expectations around the existing
expectAnalyzeStep and expectPostAnalyzeSteps calls to include
expectOptimizationResumeRead(mock),
expectTemporaryParentDirRepairStepNoop(mock), and expectBrowseCacheStep(mock)
before analyze. Keep the DBConfig status expectations aligned with the revised
sequence so the mocked exec order matches the actual optimization path.

---

Nitpick comments:
In `@pkg/database/mediadb/sql_browse.go`:
- Around line 312-330: Move the browse cache state declarations closer to the
top of sql_browse.go, before the function and method implementations. Relocate
the browseCacheState type and its const block (browseCacheAbsent,
browseCacheStale, browseCacheFresh) near the file’s other top-level declarations
so they are defined alongside the rest of the types/constants used by
browse-related code.

In `@pkg/database/mediadb/sql_config.go`:
- Around line 139-167: `sqlGetIndexResumeAttempts` is inconsistent with the
other database helpers because it takes `*sql.DB` instead of the shared
`sqlQueryable` interface. Update this getter to accept `db sqlQueryable` like
`sqlSetIndexResumeAttempts` and the other helpers (for example
`sqlBrowseCacheStatus`), then keep its `QueryRowContext` logic the same so it
can be tested with `fakeSQLQueryable` without requiring `sqlmock`.

In `@pkg/database/mediadb/sql_test.go`:
- Around line 1646-1654: The sql_test coverage for the media DB scan path only
uses rows with IsMissing set to false, so add a case in the relevant test setup
around the rows.NewRows/AddRow fixture and the results assertions to exercise
IsMissing = true as well. Use the existing sqlmock.NewRows construction in
sql_test.go and assert results[i].IsMissing explicitly so the scan target/order
is validated for both false and true values.

In `@pkg/database/mediascanner/mediascanner.go`:
- Around line 823-824: The scan-state cleanup is missing the newly added
PreviouslyMissing and TouchedTitles maps, so their backing arrays can survive
through rebuild phases. Update the cleanup/release block in mediascanner.go to
drop these maps alongside the existing scan-state maps, using the same pattern
already applied for other temporary state in the mediascanner logic. Make sure
the change is made near the scan-state setup and cleanup paths that manage
secondary-index and cache rebuilds.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: aef047c6-2765-4309-83d8-bad7b949ec2d

📥 Commits

Reviewing files that changed from the base of the PR and between 71fa917 and 6044960.

📒 Files selected for processing (29)

docs/api/methods.md
pkg/api/methods/media.go
pkg/api/methods/media_browse.go
pkg/api/methods/media_browse_test.go
pkg/database/browseprefix/prefix.go
pkg/database/database.go
pkg/database/mediadb/concurrent_operations_test.go
pkg/database/mediadb/mediadb.go
pkg/database/mediadb/mediadb_integration_test.go
pkg/database/mediadb/optimization_test.go
pkg/database/mediadb/sql_browse.go
pkg/database/mediadb/sql_browse_cache.go
pkg/database/mediadb/sql_browse_prefix_sample_test.go
pkg/database/mediadb/sql_config.go
pkg/database/mediadb/sql_media.go
pkg/database/mediadb/sql_media_tags.go
pkg/database/mediadb/sql_test.go
pkg/database/mediascanner/indexing_pipeline.go
pkg/database/mediascanner/indexing_pipeline_test.go
pkg/database/mediascanner/mediascanner.go
pkg/database/mediascanner/mediascanner_bench_test.go
pkg/database/mediascanner/touched_titles_test.go
pkg/database/scraper/gamelistxml/scraper.go
pkg/service/browse_cache_heal_test.go
pkg/service/inbox/inbox.go
pkg/service/index_resume.go
pkg/service/service.go
pkg/service/service_test.go
pkg/testing/helpers/db_mocks.go

… path - Add an in-package MediaDB round-trip test for the resume-attempt counter (default-zero, increment, reset, and non-integer parse error), which the cross-package service tests exercise but cannot instrument for coverage. - Assert the resume-limit inbox message is surfaced when auto-resume is bounded, wiring a mock inbox into the existing max-attempts test. - Add a browse-cache self-heal test asserting the optimizing indicator is still cleared when PopulateBrowseCache fails, so clients don't show a permanent "preparing library" spinner.

coderabbitai

🧹 Nitpick comments (1)

pkg/service/browse_cache_heal_test.go (1)
80-121: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Consider extracting shared setup/poll logic.

This new failure test duplicates nearly all of TestCheckAndHealBrowseCache_RebuildsWhenStale's mock setup and the optimizing-notification polling loop, differing only in the PopulateBrowseCache return value and the channel used to signal completion. A small helper (e.g., taking the desired PopulateBrowseCache error and returning sawOptimizing/sawCleared) would reduce duplication across both tests.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/service/browse_cache_heal_test.go` around lines 80 - 121, The new failure
test duplicates most of the setup and notification polling from
TestCheckAndHealBrowseCache_RebuildsWhenStale, so extract the shared mock
initialization and optimizing-notification wait logic into a small helper. Make
the helper reusable by
TestCheckAndHealBrowseCache_ClearsOptimizingOnRebuildFailure and the
rebuild-success test, parameterizing the PopulateBrowseCache result and the
completion signal handling while preserving the checks for optimizing:true and
optimizing:false.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@pkg/service/browse_cache_heal_test.go`:
- Around line 80-121: The new failure test duplicates most of the setup and
notification polling from TestCheckAndHealBrowseCache_RebuildsWhenStale, so
extract the shared mock initialization and optimizing-notification wait logic
into a small helper. Make the helper reusable by
TestCheckAndHealBrowseCache_ClearsOptimizingOnRebuildFailure and the
rebuild-success test, parameterizing the PopulateBrowseCache result and the
completion signal handling while preserving the checks for optimizing:true and
optimizing:false.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ec3d873d-5eb2-4b79-a92a-65e25d9e4f40

📥 Commits

Reviewing files that changed from the base of the PR and between 6044960 and 1123978.

📒 Files selected for processing (3)

pkg/database/mediadb/index_resume_attempts_test.go
pkg/service/browse_cache_heal_test.go
pkg/service/service_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

pkg/service/service_test.go

UpdateLastGenerated invalidates SystemTagsCache for all indexed systems, but the selective reindex path only repopulated the systems that changed. RebuildTagCache then built the in-memory cache from an incomplete table, and GetSystemTagsCached returns the in-memory hit without self-healing, so preserved systems returned empty tags until the next invalidation. Repopulate all indexed systems to match the invalidation scope. Also harden auto-recovery: fail closed when the resume-attempt counter cannot be read or incremented, re-check indexing/optimization state after the browse cache self-heal unpauses, and add missing test coverage.

…learing - Degraded browse routes (COUNT sub-timeout) keep their system IDs when the systems filter names exactly one system, so filtered root entries still carry systemId in the API response. - Sub-timeout detection also checks the route/probe context deadline instead of relying on errors.Is(err, context.DeadlineExceeded): the sqlite driver can report an interrupted query as its own error, which previously failed the whole browse instead of degrading one route. - The browse cache invalidation sentinel embeds the schema version it invalidates ("2-stale") so a cache invalidated under an old schema reads as absent after an upgrade rather than serving old-schema rows. - Prefix policy detection samples both ends of a directory partition (forward and backward DBID index scans) when the forward sample truncates at the limit; numbered files cluster at the front of large directories and skewed the fraction estimate. - Optimization failure clears the persisted step before writing the failed status, so a crash between the two writes cannot produce a failure-resume that skips earlier steps. - Move the browseCacheState type and consts to the top of sql_browse.go.

Indexing recomputes DisambiguationTypes only for titles whose media or tags changed, so values stored by an older algorithm or tag mapping were never revisited — a reindex over an unchanged library recomputes nothing. - Add a DisambiguationVersion stamp and a disambiguation_backfill optimization step that recomputes every system once when the stamp is outdated, then stamps the current version. The recompute walks systems one at a time under the pauser, the same granularity indexing used when it recomputed per system on every run. - TemporaryRepairJobsPending also reports an outdated stamp, so a boot without a reindex triggers the backfill via background optimization (pauser-aware, retried, resumable via the persisted step). - Empty databases are stamped immediately: the first index computes disambiguation with the current algorithm as it inserts titles. Bump disambiguationAlgoVersion whenever the recompute query or the ZapScriptTagTypes mapping changes.

…abase media.generate accepts rebuild:true, which discards the media database file and indexes from scratch — a supported fresh start instead of manually deleting the file. Scraped metadata is lost and must be re-scraped; favourites and launcher overrides live in UserDB and are re-applied after indexing. rebuild cannot be combined with a systems filter, since a fresh database indexed selectively would silently drop every other system's media. - Rename MediaDB.RecreateAfterCorruption to Recreate (corruption recovery is now one of two callers) and serialize recreates with a CAS guard so a user rebuild can never interleave with recovery. - The indexing goroutine steps out of the tracked background-operation set around the recreate: Recreate closes the database and Close waits on that set, so running it tracked would self-deadlock. - A recreated database is stamped with the current disambiguation version so the post-rebuild optimization does not re-run the backfill over freshly computed titles.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

pkg/api/methods/media.go (1)
440-465: 🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Keep Recreate() covered during shutdown

HandleGenerateMedia drops out of TrackBackgroundOperation() before Recreate(false), but the API shutdown path only waits for apiDone; it can still reach closeDatabase() while this goroutine is in that untracked window. Keep the rebuild tracked through Recreate() or add a shutdown barrier around the recreate/open sequence.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/api/methods/media.go` around lines 440 - 465, HandleGenerateMedia
currently drops out of MediaDB.TrackBackgroundOperation() before calling
MediaDB.Recreate(false), leaving the rebuild window untracked so shutdown can
race into closeDatabase(). Keep the rebuild path covered by the
background-operation tracker through Recreate(), or add an equivalent shutdown
barrier around the recreate/open sequence, and ensure the
re-registration/cleanup around BackgroundOperationDone and
TrackBackgroundOperation remains balanced in the goroutine.

🧹 Nitpick comments (1)

pkg/database/database.go (1)
815-815: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Add a doc comment for Recreate now that it's general-purpose.

The method was renamed from RecreateAfterCorruption to Recreate, generalizing it beyond corruption recovery (it's also used for the new media.generate rebuild path per the PR objectives). Every other exported method in this interface has an explanatory comment; a short one here clarifying both call sites (corruption recovery and explicit rebuild) and the keepBackup semantics would help future implementers/consumers of MediaDBI.
📝 Suggested doc comment
+	// Recreate closes, deletes (or backs up when keepBackup is true), and
+	// reopens the media database with a fresh schema. Used both for
+	// corruption recovery and for an explicit user-triggered rebuild.
 	Recreate(keepBackup bool) error
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/database/database.go` at line 815, Add a doc comment for the exported
Recreate method on MediaDBI to match the rest of the interface. Update the
comment near Recreate(keepBackup bool) error to briefly describe that it
recreates the database for both corruption recovery and explicit media.generate
rebuilds, and clarify what keepBackup controls for callers and implementers.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@pkg/api/methods/media.go`:
- Around line 440-465: HandleGenerateMedia currently drops out of
MediaDB.TrackBackgroundOperation() before calling MediaDB.Recreate(false),
leaving the rebuild window untracked so shutdown can race into closeDatabase().
Keep the rebuild path covered by the background-operation tracker through
Recreate(), or add an equivalent shutdown barrier around the recreate/open
sequence, and ensure the re-registration/cleanup around BackgroundOperationDone
and TrackBackgroundOperation remains balanced in the goroutine.

---

Nitpick comments:
In `@pkg/database/database.go`:
- Line 815: Add a doc comment for the exported Recreate method on MediaDBI to
match the rest of the interface. Update the comment near Recreate(keepBackup
bool) error to briefly describe that it recreates the database for both
corruption recovery and explicit media.generate rebuilds, and clarify what
keepBackup controls for callers and implementers.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 24fbcb3f-52af-4c63-a29d-29904355e89b

📥 Commits

Reviewing files that changed from the base of the PR and between b5c6b2e and fb8865a.

📒 Files selected for processing (19)

docs/api/methods.md
pkg/api/methods/media.go
pkg/api/methods/media_rebuild_test.go
pkg/api/models/params.go
pkg/database/database.go
pkg/database/mediadb/concurrent_operations_test.go
pkg/database/mediadb/corruption_recovery_test.go
pkg/database/mediadb/disambiguation_test.go
pkg/database/mediadb/mediadb.go
pkg/database/mediadb/mediadb_integration_test.go
pkg/database/mediadb/optimization_test.go
pkg/database/mediadb/sql_browse.go
pkg/database/mediadb/sql_browse_cache.go
pkg/database/mediadb/sql_browse_prefix_sample_test.go
pkg/database/mediadb/sql_config.go
pkg/database/mediadb/sql_disambiguation_backfill.go
pkg/service/index_resume.go
pkg/service/media_recovery_test.go
pkg/testing/helpers/db_mocks.go

🚧 Files skipped from review as they are similar to previous changes (5)

pkg/database/mediadb/sql_config.go
pkg/database/mediadb/concurrent_operations_test.go
pkg/service/index_resume.go
pkg/database/mediadb/mediadb_integration_test.go
pkg/database/mediadb/sql_browse.go

coderabbitai Bot reviewed Jul 2, 2026

View reviewed changes

Comment thread pkg/database/mediadb/sql_browse.go

Comment thread pkg/database/mediascanner/mediascanner.go Outdated

Comment thread pkg/service/index_resume.go

Comment thread pkg/service/index_resume.go

coderabbitai Bot reviewed Jul 2, 2026

View reviewed changes

wizzomafizzo added 5 commits July 2, 2026 18:05

Merge branch 'main' into fix/browse-cache-auto-recovery

fb8865a

coderabbitai Bot reviewed Jul 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

fix(database): auto-recover browse cache from stale or wedged state#1040

fix(database): auto-recover browse cache from stale or wedged state#1040
wizzomafizzo wants to merge 7 commits into
mainfrom
fix/browse-cache-auto-recovery

wizzomafizzo commented Jul 2, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

Walkthrough

Changes

Uh oh!

codecov Bot commented Jul 2, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Uh oh!

Conversation

wizzomafizzo commented Jul 2, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Changes

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Uh oh!

codecov Bot commented Jul 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

wizzomafizzo commented Jul 2, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jul 2, 2026 •

edited

Loading

codecov Bot commented Jul 2, 2026 •

edited

Loading