Follow-up to multi-source Phase C (#40), split out from the C2 wizard work (#108).
Background
With imports running asynchronously on the daemon (C3, #103), the wizard no longer asks the user to run dedup/index/normalise by hand — #108 makes the Summary screen explain that the database is "prepared automatically in the background." This issue is the backend + UX that actually delivers that, efficiently and transparently.
Current behaviour (the gap)
- Dedup already runs inline during every import (sources_sync → importer::import(.., skip_dedup=false, ..), chess-db/src/jobs.rs).
- Index + normalise are not run by sources_sync; they only run as part of the daily update job (chess-db/src/jobs.rs, the "update" arm runs one global index_positions + normalise_players). The C3 auto-sync (chess-db/src/scheduler.rs, auto_sync_candidates → submits sources_sync) doesn't trigger them.
- ⇒ After a fresh setup, position indexing (which powers the move explorer) and FIDE name normalisation don't run until the next scheduled daily update, potentially many hours later.
Desired behaviour
Efficient first-run maintenance pipeline. Once the enabled sources have imported for the first time, run a single maintenance pass in this order:
- one deduplication over the whole DB (after all first-time imports finish; this is more efficient than the current inline per-import dedup for a bulk first load; consider skip_dedup on first-run source imports and one global pass instead),
- position indexing,
- player-name normalisation.
It must be idempotent and de-duplicated against both the daily update and itself (don't stack redundant passes), and it must serialise correctly on the single writer thread (so it runs after imports, not before).
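One way the ordering and de-duplication requirements could fit together is sketched below. It assumes the writer thread drains a simple FIFO queue. Every name here (Job, WriterQueue, request_maintenance, MAINTENANCE_STEPS) is hypothetical and not taken from chess-db; only sources_sync, skip_dedup, index_positions and normalise_players come from the issue text.

```rust
use std::collections::VecDeque;

/// Hypothetical job kinds; the real ones live in chess-db/src/jobs.rs.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Job {
    /// A sources_sync import for one source.
    SourcesSync { source: String, skip_dedup: bool },
    /// One combined pass running MAINTENANCE_STEPS in order.
    Maintenance,
}

/// Order of the steps inside a single maintenance pass.
const MAINTENANCE_STEPS: [&str; 3] = ["dedup", "index_positions", "normalise_players"];

/// Assumed model: a FIFO queue drained by the single writer thread.
#[derive(Default)]
struct WriterQueue {
    pending: VecDeque<Job>,
}

impl WriterQueue {
    /// Queue an import; first-run imports skip inline dedup because a
    /// global dedup will follow in the maintenance pass.
    fn submit_import(&mut self, source: &str, first_run: bool) {
        self.pending.push_back(Job::SourcesSync {
            source: source.to_string(),
            skip_dedup: first_run,
        });
    }

    /// Ensure exactly one maintenance pass is pending, placed after every
    /// import queued so far. Returns true if the queue changed.
    fn request_maintenance(&mut self) -> bool {
        // A pass at the tail already covers all queued imports.
        if self.pending.back() == Some(&Job::Maintenance) {
            return false;
        }
        // Drop any earlier pending pass (it would be redundant) and
        // re-queue a single pass behind the newest import.
        self.pending.retain(|job| *job != Job::Maintenance);
        self.pending.push_back(Job::Maintenance);
        true
    }
}
```

With this shape, the daily update and the first-run trigger could both call request_maintenance without stacking passes, and FIFO order guarantees the pass runs after the imports queued before it.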
Transparency. The user should be able to tell, at a glance, whether the database is still being prepared or is ready to use:
- a clear state model, e.g. importing -> preparing (dedup/index/normalise) -> ready,
- surfaced on the wizard Summary screen and/or Home, building on the C3 header activity indicator (which today shows raw jobs, not a rolled-up "ready?" answer).
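As a rough illustration of the rolled-up "ready?" answer, the sketch below derives the three states from the list of raw jobs plus a flag recording whether a maintenance pass has completed since the last import. The types and the readiness function are hypothetical, as is the maintenance_done flag; the actual daemon may expose job state differently.

```rust
/// Rolled-up state shown to the user (names follow the state model above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DbReadiness {
    Importing,
    Preparing,
    Ready,
}

/// Hypothetical view of a running or queued job, as the header
/// activity indicator might see it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ActiveJob {
    Import,
    Dedup,
    Index,
    Normalise,
}

/// Collapse raw job activity into a single readiness answer.
/// `maintenance_done` is assumed to be true only if a full
/// dedup/index/normalise pass finished after the most recent import.
fn readiness(active: &[ActiveJob], maintenance_done: bool) -> DbReadiness {
    if active.iter().any(|job| *job == ActiveJob::Import) {
        // Any import in flight dominates everything else.
        DbReadiness::Importing
    } else if !active.is_empty() || !maintenance_done {
        // Maintenance is running, or is still owed after an import.
        DbReadiness::Preparing
    } else {
        DbReadiness::Ready
    }
}
```

Treating "no jobs, but maintenance still owed" as preparing avoids briefly showing ready in the gap between the last import finishing and the maintenance pass starting.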
Scope notes