Fallback automation for the #176 WDDM-spill crawl, designed around two live incidents during the 2026-06-10/11 ERB 500K rebuild on a 12 GB rig:
(1) the confluence shard decayed 15.3 -> 2.3 genes/s while dedicated VRAM sat pinned at 11.85/12 GB (it recovered only because the shard finished);
(2) slack__eng-oncall collapsed to 64 genes / 47 min (~0.02 genes/s, ~66h projected) with HELIX_DENSE_VRAM_RELEASE_EVERY=64 and expandable_segments both ACTIVE, proving that empty_cache alone does not recover an already-spilled context. The manual fix that worked: kill + resume (fresh CUDA context; salvage + file-level resume is lossless), then BGEM3_DEVICE=cpu for the remainder (GPU 11.9 GB -> 884 MiB, byte-identical vectors).
Detection is NOT a wall-clock per-shard timer (that would false-positive on legitimately large shards and miss crawls on small ones). Instead, trigger on the unambiguous signature: genes/s EMA < the shard's own early-batch baseline / HELIX_BFM_CRAWL_FACTOR (default 5) for N consecutive batches AND dedicated VRAM ~full (via torch.cuda / pynvml).
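A minimal sketch of that trigger rule, assuming the rate tracker sees each batch's gene count, wall time and a used/total dedicated-VRAM fraction (for example `1 - free / total` from `torch.cuda.mem_get_info()`, or `used / total` from pynvml's `nvmlDeviceGetMemoryInfo`). `CrawlDetector`, the 4-batch baseline, the EMA weight of 0.3 and the 0.97 "full" threshold are hypothetical choices, not shipped code, and N is assumed to be the `window` knob.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CrawlDetector:
    """Per-shard detector for the crawl signature (hypothetical helper, not existing code)."""

    factor: float = 5.0           # HELIX_BFM_CRAWL_FACTOR default
    window: int = 8               # consecutive batches required (HELIX_BFM_CRAWL_WINDOW default)
    baseline_batches: int = 4     # assumption: early batches averaged into the baseline
    alpha: float = 0.3            # assumption: EMA weight given to the newest batch
    vram_full_frac: float = 0.97  # assumption: used/total fraction that counts as "~full"
    _early: List[float] = field(default_factory=list, init=False, repr=False)
    _baseline: Optional[float] = field(default=None, init=False, repr=False)
    _ema: Optional[float] = field(default=None, init=False, repr=False)
    _streak: int = field(default=0, init=False, repr=False)

    def update(self, genes: int, seconds: float, vram_used_frac: float) -> bool:
        """Record one batch; return True once the signature has held for `window` batches."""
        if seconds <= 0:
            return False  # skip degenerate timings instead of dividing by zero
        rate = genes / seconds
        if self._ema is None:
            self._ema = rate
        else:
            self._ema = self.alpha * rate + (1 - self.alpha) * self._ema
        if self._baseline is None:
            # Still measuring this shard's own early-batch baseline: never flag here.
            self._early.append(rate)
            if len(self._early) >= self.baseline_batches:
                self._baseline = sum(self._early) / len(self._early)
            return False
        slow = self._ema < self._baseline / self.factor
        vram_full = vram_used_frac >= self.vram_full_frac
        # Both conditions must hold; a single healthy batch resets the streak.
        self._streak = self._streak + 1 if (slow and vram_full) else 0
        return self._streak >= self.window
```

Because the baseline is the shard's own early rate, a shard that is merely large (or slow from its first batch) never trips the rule; only a decay relative to itself, with VRAM pinned, does. The flip side is that a shard which starts already spilled would go undetected by this rule alone.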
Escalation ladder (every rung already exists as shipped machinery):
CPU DEMOTION: second crawl on the same shard -> set BGEM3_DEVICE=cpu for that shard's backfill and continue. Terminal rung; always converges.
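The ladder's decision logic might look like the sketch below. Only the CPU-demotion rung is spelled out here; the `RESTART` rung (kill + resume in a fresh CUDA context) is an assumption modeled on the manual fix from incident (2). `CrawlAction`, `next_action` and `relaunch_env` are hypothetical names, and mapping `HELIX_BFM_CRAWL_ACTION=cpu` to "demote on the first crawl" and `off` to "report only" is likewise assumed.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class CrawlAction(Enum):
    NONE = "none"          # report only (assumed meaning of HELIX_BFM_CRAWL_ACTION=off)
    RESTART = "restart"    # assumption: kill + resume in a fresh CUDA context
    CPU_DEMOTION = "cpu"   # terminal rung: resume the shard's backfill with BGEM3_DEVICE=cpu


def next_action(crawls_on_shard: int, mode: str = "ladder") -> CrawlAction:
    """Pick the escalation rung for the Nth crawl seen on one shard (hypothetical helper)."""
    if mode not in ("ladder", "cpu", "off"):
        raise ValueError(f"unknown HELIX_BFM_CRAWL_ACTION: {mode!r}")
    if crawls_on_shard < 1 or mode == "off":
        return CrawlAction.NONE
    if mode == "cpu":
        return CrawlAction.CPU_DEMOTION  # skip straight to the terminal rung
    # ladder: first crawl gets a fresh context; the second (and any later) demotes to CPU.
    return CrawlAction.RESTART if crawls_on_shard == 1 else CrawlAction.CPU_DEMOTION


def relaunch_env(action: CrawlAction, base: Optional[Mapping[str, str]] = None) -> dict:
    """Environment for the resumed child process; only CPU demotion changes it."""
    env = dict(os.environ if base is None else base)
    if action is CrawlAction.CPU_DEMOTION:
        env["BGEM3_DEVICE"] = "cpu"
    return env
```

Relaunching the shard as a new child process, rather than calling `torch.cuda.empty_cache()` in place, is what provides the fresh CUDA context; incident (2) shows that releasing cached memory alone did not recover the crawl.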
Knobs: HELIX_BFM_CRAWL_FACTOR (default 5), HELIX_BFM_CRAWL_WINDOW (batches, default 8), HELIX_BFM_CRAWL_ACTION (ladder|cpu|off).
Scope: the scripts/build_fixture_matrix.py supervisor loop, plus the rate tracker in _drain_with_batched_splade and the backfill driver. Also consider 'CPU backfill by default on <=12 GB rigs' per docs/operations/DENSE_VRAM.md.
When implementing, update that runbook and this run's evidence.
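Reading the three knobs takes only a few lines; this sketch validates them up front so a typo in `HELIX_BFM_CRAWL_ACTION` fails fast instead of silently disabling the fallback, and shows one way the proposed "CPU backfill by default on <=12 GB rigs" rule could be expressed. `CrawlKnobs`, `read_knobs` and `default_backfill_device` are hypothetical, as are reading "<=12 GB" as at most 12 GiB of total VRAM (for example from `torch.cuda.get_device_properties(0).total_memory`) and using `cuda` as the non-CPU `BGEM3_DEVICE` value.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_ACTIONS = ("ladder", "cpu", "off")
SMALL_RIG_BYTES = 12 * 1024**3  # assumption: "<=12 GB rig" means <= 12 GiB total VRAM


@dataclass(frozen=True)
class CrawlKnobs:
    factor: float = 5.0     # HELIX_BFM_CRAWL_FACTOR
    window: int = 8         # HELIX_BFM_CRAWL_WINDOW, in batches
    action: str = "ladder"  # HELIX_BFM_CRAWL_ACTION


def read_knobs(env: Optional[Mapping[str, str]] = None) -> CrawlKnobs:
    """Read and validate the crawl knobs, falling back to the documented defaults."""
    env = os.environ if env is None else env
    factor = float(env.get("HELIX_BFM_CRAWL_FACTOR", "5"))
    window = int(env.get("HELIX_BFM_CRAWL_WINDOW", "8"))
    action = env.get("HELIX_BFM_CRAWL_ACTION", "ladder").strip().lower()
    if factor <= 0:
        raise ValueError("HELIX_BFM_CRAWL_FACTOR must be > 0")
    if window < 1:
        raise ValueError("HELIX_BFM_CRAWL_WINDOW must be >= 1")
    if action not in VALID_ACTIONS:
        raise ValueError(f"HELIX_BFM_CRAWL_ACTION must be one of {VALID_ACTIONS}, got {action!r}")
    return CrawlKnobs(factor, window, action)


def default_backfill_device(total_vram_bytes: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Proposed default: CPU backfill on <=12 GB rigs unless BGEM3_DEVICE is set explicitly."""
    env = os.environ if env is None else env
    explicit = env.get("BGEM3_DEVICE")
    if explicit:
        return explicit  # an explicit operator choice always wins
    return "cpu" if total_vram_bytes <= SMALL_RIG_BYTES else "cuda"
```

Letting an explicit `BGEM3_DEVICE` override the size-based default keeps the existing manual escape hatch from incident (2) working unchanged.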