Full reindex via job queue

The recent full reindex on TEST (after copying the PROD databases, which hold more data) took over 30 hours to finish. Supposedly this can be improved significantly by taking advantage of the reindex job queue feature. This work is based on the current script-driven full reindex, but also takes https://github.com/hubmapconsortium/search-api/issues/895 into consideration (some of its points may no longer be valid once we start using the job queue).

Key points:
- Queue isolation and index isolation (Full reindex creates separate indices and rebuilds the documents from all Donors)
- Live reindex (31 workers) and full reindex (1 worker, low priority and limited concurrency) procedures both run inside the same search-api container
- Consider using an alias (`POST _aliases`) to swap the indices (is this more efficient than cloning the indices?)
- A sync step (to catch up on the latest updates) is still needed
- Add a script option to delete old indices when executed manually (no automatic deletion, in case we want to compare)
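
To make the alias option concrete, here is a minimal sketch of the request body that `POST _aliases` would take, assuming an Elasticsearch/OpenSearch-compatible backend. The alias and index names are hypothetical placeholders, not the names search-api actually uses. Both actions go in one request, so the alias moves from the old index to the new one in a single step, and an alias update only changes which index the name points to; it does not copy documents.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the body for `POST _aliases` that moves `alias` to `new_index`.

    The remove and add actions are sent together in one request.
    """
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


# Hypothetical names, for illustration only.
body = build_alias_swap("entities", "entities_old", "entities_new")
print(json.dumps(body, indent=2))
```

With this approach the old index is left in place after the swap, which fits the point about deleting old indices only on request.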

Benefits:
- Full reindex completes in fewer total hours (also fewer duplicates when using the job queue)
- Full reindex never blocks regular jobs
- Incremental reindex always remains responsive
- Redis remains shared but logically isolated
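
The isolation idea behind these benefits can be sketched with an in-memory stand-in. This is not the search-api implementation: the queue names and the `IsolatedQueues` class are hypothetical, and a real deployment would keep the queues in the shared Redis instance under separate names. The point it illustrates is that a worker only sees the queues it listens on, so full reindex jobs cannot delay live ones.

```python
from collections import deque

# Hypothetical queue names; the real search-api names may differ.
LIVE_QUEUE = "reindex:live"
FULL_QUEUE = "reindex:full"


class IsolatedQueues:
    """In-memory stand-in for two logically separate queues on one shared store."""

    def __init__(self):
        self._queues = {LIVE_QUEUE: deque(), FULL_QUEUE: deque()}

    def enqueue(self, queue_name, job_id):
        self._queues[queue_name].append(job_id)

    def next_job(self, queue_names):
        """Return (queue_name, job_id) from the first non-empty queue, else None."""
        for name in queue_names:
            if self._queues[name]:
                return name, self._queues[name].popleft()
        return None


queues = IsolatedQueues()
queues.enqueue(FULL_QUEUE, "donor-1")
queues.enqueue(FULL_QUEUE, "donor-2")
queues.enqueue(LIVE_QUEUE, "update-a")

# A live worker listens only on the live queue, so queued full reindex
# jobs never sit in front of a live update.
live_job = queues.next_job([LIVE_QUEUE])
# The single full reindex worker listens only on its own queue.
full_job = queues.next_job([FULL_QUEUE])
print(live_job, full_job)
```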

Full reindex via job queue #1002