The recent full reindex on TEST (after copying over the PROD databases, which contain more data) took over 30 hours to finish. This can likely be improved significantly by taking advantage of the reindex job queue feature. This work is based on the current script-driven full reindex, but should also take #895 into consideration (some of its points may no longer be valid once we start using the job queue).
Key points:
- Queue isolation and index isolation (Full reindex creates separate indices and rebuilds the documents from all Donors)
- Live reindex (31 workers) and full reindex (1 worker, low priority and limited concurrency) procedures both run inside the same search-api container
- May consider using an alias (POST _aliases) to swap the indices (is this more efficient than cloning indices?)
- Sync (catching up on the latest updates) is still needed
- Add a script option to delete old indices when run manually (no automatic deletion, in case we want to compare)
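
To make the alias option concrete, here is a minimal sketch of the request body that a POST _aliases call could carry. It assumes clients query an alias rather than a concrete index name; the alias and index names used here are hypothetical and not taken from the current setup. Both actions are sent in a single request so the alias is repointed in one step, and since only alias metadata changes, no documents are copied.

```python
import json


def build_alias_swap(alias, old_index, new_index):
    """Build the body for POST _aliases that moves `alias` from old_index to new_index.

    All names passed in are placeholders chosen for illustration.
    """
    return {
        "actions": [
            # Detach the alias from the index that was serving traffic.
            {"remove": {"index": old_index, "alias": alias}},
            # Attach the alias to the freshly rebuilt index.
            {"add": {"index": new_index, "alias": alias}},
        ]
    }


if __name__ == "__main__":
    # Hypothetical names: "entities" alias, "_v1" old index, "_v2" rebuilt index.
    body = build_alias_swap("entities", "entities_v1", "entities_v2")
    print(json.dumps(body, indent=2))
```

The old index is left in place after the swap, which fits the point about not deleting old indices automatically.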
Benefits:
- Full reindex completes in fewer total hours (also fewer duplicates when using the job queue)
- Full reindex never blocks regular jobs
- Incremental reindex always remains responsive
- Redis remains shared but logically isolated
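
The queue isolation idea can be illustrated with a small in-memory sketch. This is not the actual job queue implementation: `SharedBroker` stands in for the single shared Redis instance, and the queue names are hypothetical. The point it shows is that workers bound to one named queue never consume jobs from the other, so a large full-reindex backlog does not delay live jobs.

```python
from collections import deque

# Hypothetical queue names; the real ones may differ.
LIVE_QUEUE = "reindex:live"
FULL_QUEUE = "reindex:full"


class SharedBroker:
    """In-memory stand-in for one shared store that holds separately named queues."""

    def __init__(self):
        self.queues = {LIVE_QUEUE: deque(), FULL_QUEUE: deque()}

    def enqueue(self, queue_name, job):
        self.queues[queue_name].append(job)

    def dequeue(self, queue_name):
        queue = self.queues[queue_name]
        return queue.popleft() if queue else None

    def pending(self, queue_name):
        return len(self.queues[queue_name])


def drain(broker, queue_name, max_jobs):
    """Simulate workers bound to one queue: they only ever pull from that queue."""
    processed = []
    for _ in range(max_jobs):
        job = broker.dequeue(queue_name)
        if job is None:
            break
        processed.append(job)
    return processed


if __name__ == "__main__":
    broker = SharedBroker()
    for i in range(1000):
        broker.enqueue(FULL_QUEUE, f"full-{i}")
    broker.enqueue(LIVE_QUEUE, "live-update")
    # Live workers see only their own queue, regardless of the full backlog.
    print(drain(broker, LIVE_QUEUE, max_jobs=31))
    print(broker.pending(FULL_QUEUE))
```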