Full reindex via job queue #1002

@yuanzhou

Description

The recent full reindex on TEST (after copying the PROD databases, which hold more data) took over 30 hours to finish. Supposedly this can be improved significantly if we take advantage of the reindex job queue feature. This work is based on the current script-driven full reindex, but also takes #895 into consideration (some items there may no longer be valid once we start using the job queue).

Key points:

  • Queue isolation and index isolation (Full reindex creates separate indices and rebuilds the documents from all Donors)
  • Live reindex (31 workers) and full reindex (1 worker, low priority and limited concurrency) procedures both run inside the same search-api container
  • Consider using an alias (POST _aliases) to swap the indices (is this more efficient than cloning indices?)
  • A sync step (to catch up on the latest updates) is still needed
  • Add a script option to delete old indices when run manually (no automatic deletion, in case we want to compare them)
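
The alias swap mentioned in the key points could look roughly like the sketch below. It assumes an Elasticsearch/OpenSearch-compatible `_aliases` endpoint, where the actions in a single request are applied atomically. The helper name `build_alias_swap` and the alias and index names are hypothetical and are not taken from search-api. The sketch only builds the request body; sending it with an HTTP client is left out.

```python
import json


def build_alias_swap(alias: str, old_index: str, new_index: str) -> dict:
    """Build the body for POST _aliases that repoints `alias` in one request.

    All names passed in are illustrative; real index names will differ.
    """
    return {
        "actions": [
            # Detach the alias from the index that currently serves searches.
            {"remove": {"index": old_index, "alias": alias}},
            # Attach the alias to the freshly rebuilt index.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    body = build_alias_swap("entities", "entities_old", "entities_new")
    print(json.dumps(body, indent=2))
```

An alias update changes cluster metadata only and does not copy documents, which is one reason it may be cheaper than cloning. The old index stays in place afterwards, which fits the "no auto delete" point above.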

Benefits:

  • Full reindex completes in fewer total hours (and with fewer duplicates when using the job queue)
  • Full reindex never blocks regular jobs
  • Incremental always remains responsive
  • Redis remains shared but logically isolated
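
The isolation idea behind these benefits can be illustrated with a small in-memory sketch. It is not the search-api implementation: two `deque` objects stand in for two logically separate queues on a shared Redis, and the class and method names (`IsolatedQueues`, `next_for_live_worker`, `next_for_full_worker`) are hypothetical. The point it shows is that a live worker never reads the full-reindex queue, so a long backlog there cannot delay a live job.

```python
from collections import deque
from typing import Optional


class IsolatedQueues:
    """Two in-memory queues standing in for two isolated queues on one Redis."""

    def __init__(self) -> None:
        self.live: deque = deque()   # incremental (live) reindex jobs
        self.full: deque = deque()   # full reindex jobs

    def enqueue(self, kind: str, job_id: str) -> None:
        # Route each job to its own queue; anything not "live" goes to full.
        target = self.live if kind == "live" else self.full
        target.append(job_id)

    def next_for_live_worker(self) -> Optional[str]:
        # Live workers only ever look at the live queue.
        return self.live.popleft() if self.live else None

    def next_for_full_worker(self) -> Optional[str]:
        # The full-reindex worker only ever looks at the full queue.
        return self.full.popleft() if self.full else None


if __name__ == "__main__":
    queues = IsolatedQueues()
    for i in range(3):
        queues.enqueue("full", f"full-{i}")   # a backlog of full-reindex jobs
    queues.enqueue("live", "live-0")          # a live update arrives later
    print(queues.next_for_live_worker())      # the live job is served first
    print(queues.next_for_full_worker())
```

In a real deployment the same separation would come from using distinct queue names and dedicated workers per queue, with worker counts such as the 31 and 1 mentioned in the key points.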

Metadata

Labels

No labels

Type

No type

Projects

Status

In Progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
