Multi-source: efficient automatic first-run maintenance (dedup->index->normalise) + readiness transparency #109

Description

@jozef2svrcek

Follow-up to multi-source Phase C (#40), split out from the C2 wizard work (#108).

Background

With imports running asynchronously on the daemon (C3, #103), the wizard no longer asks the user to run dedup/index/normalise by hand — #108 makes the Summary screen explain that the database is "prepared automatically in the background." This issue is the backend + UX that actually delivers that, efficiently and transparently.

Current behaviour (the gap)

  • Dedup already runs inline during every import (sources_sync → importer::import(.., skip_dedup=false, ..), chess-db/src/jobs.rs).
  • Index + normalise are not run by sources_sync — they only run as part of the daily update job (chess-db/src/jobs.rs, the "update" arm runs one global index_positions + normalise_players). The C3 auto-sync (chess-db/src/scheduler.rs, auto_sync_candidates → submits sources_sync) doesn't trigger them.
  • ⇒ After a fresh setup, the position index (which powers the move explorer) and FIDE name normalisation don't happen until the next scheduled daily update — potentially many hours later.

Desired behaviour

Efficient first-run maintenance pipeline. Once the enabled sources have imported for the first time, run a single maintenance pass in this order:

  1. one deduplication over the whole DB (after all first-time imports finish — more efficient than the current inline per-import dedup for a bulk first load; consider skip_dedup on first-run source imports and one global pass instead),
  2. position indexing,
  3. player-name normalisation.

The pass must be idempotent and de-duplicated against both the daily update and itself (no stacked redundant passes), and it must serialize correctly on the single writer thread so that it runs after the imports, not before.
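
To make the ordering and "don't stack" requirements concrete, here is a minimal sketch of how a maintenance request could be coalesced on a FIFO queue. It is not taken from chess-db: `WriterQueue`, `Job`, `Step`, `submit_import` and `request_maintenance` are hypothetical names, and the queue only stands in for the real single writer thread.

```rust
use std::collections::VecDeque;

/// Maintenance steps, in the order requested above (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Step {
    Dedup,
    IndexPositions,
    NormalisePlayers,
}

/// A unit of work for the writer: a source import or one maintenance step.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Job {
    Import(u32), // source id, illustrative only
    Maintenance(Step),
}

/// Assumption: a plain FIFO queue standing in for the single writer thread.
#[derive(Default)]
pub struct WriterQueue {
    pending: VecDeque<Job>,
}

impl WriterQueue {
    pub fn submit_import(&mut self, source_id: u32) {
        self.pending.push_back(Job::Import(source_id));
    }

    /// Request one maintenance pass. Any maintenance steps that are still
    /// queued are dropped first, so repeated requests never stack, and the
    /// new steps go to the back so they run after every queued import.
    pub fn request_maintenance(&mut self) {
        self.pending.retain(|job| !matches!(job, Job::Maintenance(_)));
        for step in [Step::Dedup, Step::IndexPositions, Step::NormalisePlayers] {
            self.pending.push_back(Job::Maintenance(step));
        }
    }

    /// Hand the queued jobs to the writer in execution order.
    pub fn drain(&mut self) -> Vec<Job> {
        self.pending.drain(..).collect()
    }
}
```

With this shape, the daily update and the first-run trigger can both call `request_maintenance` without coordinating: whichever call comes last decides where the single pass sits in the queue. A pass that is already executing is not covered by this sketch and would need separate handling.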

Transparency. The user should be able to tell, at a glance, whether the database is still being prepared or is ready to use:

  • a clear state model — e.g. importing -> preparing (dedup/index/normalise) -> ready,
  • surfaced on the wizard Summary screen and/or Home, building on the C3 header activity indicator (which today shows raw jobs, not a rolled-up "ready?" answer).
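
One possible way to roll the raw job list up into the single "ready?" answer is sketched below. `JobKind`, `Readiness`, `readiness` and the `maintenance_done` flag are assumptions made for illustration, not existing chess-db types.

```rust
/// Kinds of jobs the activity indicator might report (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobKind {
    Import,
    Dedup,
    Index,
    Normalise,
}

/// The rolled-up state shown to the user.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Readiness {
    Importing,
    Preparing,
    Ready,
}

/// Derive the user-facing state from the active jobs.
/// `maintenance_done` is an assumed persisted flag that is set once the
/// first-run pass has completed; without it, an idle daemon that has not
/// yet indexed anything would wrongly report `Ready`.
pub fn readiness(active: &[JobKind], maintenance_done: bool) -> Readiness {
    if active.contains(&JobKind::Import) {
        // Any running import wins, even if maintenance jobs are also queued.
        Readiness::Importing
    } else if !active.is_empty() || !maintenance_done {
        Readiness::Preparing
    } else {
        Readiness::Ready
    }
}
```

Keeping the rollup a pure function of the job list plus one flag means the Summary screen and Home can share it and cannot disagree about the state.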

Scope notes
