
fix(colgrep): responsive Ctrl+C + non-resetting build progress while indexing #143

Merged: raphaelsty merged 1 commit into main from fix/index-progress-and-interrupt on Jun 17, 2026

@raphaelsty (Collaborator)

Fixes two indexing UX issues reported on a large fresh build. Neither corrupted the index — both are about progress display and interrupt responsiveness — and crash-safety/resumability is unchanged and verified.

1. Progress bar restarted at 0 every ~4096 units

The resumable build encodes in BUILD_CHECKPOINT_UNITS (4096) batches, each running a fresh pipeline. The metadata stage set the bar via set_position(completed_units) from a per-batch counter starting at 0, so the bar climbed to ~4096 then jumped back. Switched to pb.inc(delta) so the shared bar accumulates across batches to the true total.
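To make the reset concrete, here is a minimal std-only sketch. `SharedBar` is a hypothetical stand-in for the shared progress bar (only its `set_position`/`inc` semantics are modeled), and `run_batch`/`simulate` are illustrative names, not colgrep code. Each batch's counter restarts at 0, so the absolute update leaves the bar at the last batch's count, while the relative update reaches the true total.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for a progress bar shared across batches.
/// Models only the two update styles: absolute `set_position`, relative `inc`.
struct SharedBar {
    pos: AtomicU64,
}

impl SharedBar {
    fn set_position(&self, p: u64) {
        self.pos.store(p, Ordering::Relaxed);
    }
    fn inc(&self, delta: u64) {
        self.pos.fetch_add(delta, Ordering::Relaxed);
    }
    fn position(&self) -> u64 {
        self.pos.load(Ordering::Relaxed)
    }
}

/// Hypothetical per-batch metadata stage: `completed` starts at 0 for every batch.
fn run_batch(bar: &SharedBar, units: u64, chunk: u64, absolute: bool) {
    let mut completed = 0;
    while completed < units {
        let delta = chunk.min(units - completed);
        completed += delta;
        if absolute {
            bar.set_position(completed); // old style: overwritten by each new batch
        } else {
            bar.inc(delta); // new style: accumulates across batches
        }
    }
}

/// Splits `total` units into batches of `batch` and returns the bar's final position.
fn simulate(total: u64, batch: u64, absolute: bool) -> u64 {
    let bar = SharedBar { pos: AtomicU64::new(0) };
    let mut remaining = total;
    while remaining > 0 {
        let units = batch.min(remaining);
        run_batch(&bar, units, 256, absolute);
        remaining -= units;
    }
    bar.position()
}

fn main() {
    const BATCH: u64 = 4096; // mirrors BUILD_CHECKPOINT_UNITS
    let total = 10_000;
    println!("set_position: {}", simulate(total, BATCH, true)); // 1808
    println!("inc:          {}", simulate(total, BATCH, false)); // 10000
}
```

With 10,000 units the batches are 4096, 4096 and 1808, so the absolute variant ends at 1808 instead of 10,000.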

2. Ctrl+C appeared to do nothing while indexing

  • No feedback: the acknowledgement was only printed inside a critical section, so a Ctrl+C during the long encoding phase was silent. Now the first interrupt is always acknowledged.
  • Slow to act: interruption was only polled at chunk/batch boundaries, so the whole in-flight pipeline drained before stopping (many seconds). The encode stage now bails between chunks, and a new encode_prepared_document_batches_cancellable checks the flag between batches, so a Ctrl+C lands within ~one model forward pass. Measured stop latency dropped from a full-queue drain (many seconds) to ~1 s.
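A minimal std-only sketch of the two fixes, under stated assumptions: `INTERRUPTED`, `ACKNOWLEDGED`, `on_interrupt`, `encode_one` and `encode_batches_cancellable` are hypothetical names (not the real `encode_prepared_document_batches_cancellable` signature), `encode_one` stands in for one model forward pass, and `on_interrupt` is assumed to run from a Ctrl+C handler thread rather than a raw signal handler. The flag is checked between batches, so a pending interrupt stops work before the next forward pass, and the acknowledgement is printed once regardless of any critical section.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical global interrupt flag, set by the Ctrl+C handler.
static INTERRUPTED: AtomicBool = AtomicBool::new(false);
/// Ensures the acknowledgement is printed exactly once.
static ACKNOWLEDGED: AtomicBool = AtomicBool::new(false);

/// Hypothetical handler callback: record the interrupt and acknowledge it
/// immediately, independent of whether a critical section is active.
fn on_interrupt() {
    INTERRUPTED.store(true, Ordering::SeqCst);
    if !ACKNOWLEDGED.swap(true, Ordering::SeqCst) {
        eprintln!("Interrupt received, stopping after the current batch...");
    }
}

/// Hypothetical stand-in for one model forward pass over a batch.
fn encode_one(batch: &[String]) -> Vec<usize> {
    batch.iter().map(|doc| doc.len()).collect()
}

/// Encodes batches, checking `cancel` between batches so an interrupt takes
/// effect after at most the forward pass already in flight.
/// Returns `Err(n)` with the number of batches finished if cancelled.
fn encode_batches_cancellable<F: FnMut(usize)>(
    batches: &[Vec<String>],
    cancel: &AtomicBool,
    mut on_batch_done: F,
) -> Result<Vec<Vec<usize>>, usize> {
    let mut out = Vec::with_capacity(batches.len());
    for batch in batches {
        if cancel.load(Ordering::SeqCst) {
            return Err(out.len());
        }
        out.push(encode_one(batch));
        on_batch_done(out.len()); // e.g. a progress-bar update
    }
    Ok(out)
}

fn main() {
    let batches: Vec<Vec<String>> = (0..5)
        .map(|i| vec![format!("doc-{i}-a"), format!("doc-{i}-b")])
        .collect();
    // Simulate Ctrl+C arriving right after the second batch finishes.
    let result = encode_batches_cancellable(&batches, &INTERRUPTED, |done| {
        if done == 2 {
            on_interrupt();
        }
    });
    match result {
        Ok(v) => println!("encoded all {} batches", v.len()),
        Err(done) => println!("cancelled after {done} batches"),
    }
}
```

Polling an atomic between batches is cheap, and it is the finest granularity available here because the forward pass itself is an uninterruptible FFI call.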

The hard floor is one ONNX session.run() (an uninterruptible FFI call); the residual ~1 s is the clean shutdown finalizing in-flight work (k-means seed, the accumulated index write, queued metadata inserts) so the index stays consistent.

Safety / resumability (verified)

Index writes stay inside a CriticalSectionGuard, state.json is checkpointed per batch, and build_resumable trims any partial mid-batch write on resume. Interrupt+resume tested on a 30,000-unit build, interrupted both before and after the first checkpoint:

  • stops in ~0.7–1.5 s with the feedback message,
  • resume completes,
  • final index = exactly 30000 docs, 30000 distinct _subset_ (no duplicates, no loss), and unique-token searches resolve to the correct file (4/4).
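The resume invariant above can be sketched in a few lines. This is a simplified model, not colgrep's implementation: `BuildState` stands in for state.json, `Index` for the on-disk index, `CHECKPOINT_UNITS` mirrors BUILD_CHECKPOINT_UNITS, and `build_resumable`/`interrupt_then_resume` here are hypothetical helpers. The checkpoint is advanced only after a whole batch is written, and a resumed run first truncates anything written past it, so an interrupt before or after the first checkpoint still ends with exactly one copy of each unit.

```rust
use std::collections::HashSet;

const CHECKPOINT_UNITS: usize = 4096; // mirrors BUILD_CHECKPOINT_UNITS

/// Hypothetical contents of state.json: units durably checkpointed so far.
struct BuildState {
    committed_units: usize,
}

/// Hypothetical in-memory stand-in for the on-disk index (unit ids in order).
struct Index {
    docs: Vec<usize>,
}

/// Builds up to `total` units; `stop_at` simulates an interrupt right before
/// writing that unit. Returns true if the build completed.
fn build_resumable(index: &mut Index, state: &mut BuildState, total: usize, stop_at: Option<usize>) -> bool {
    // Trim any partial mid-batch write left by a previous interrupted run.
    index.docs.truncate(state.committed_units);
    while state.committed_units < total {
        let start = state.committed_units;
        let end = (start + CHECKPOINT_UNITS).min(total);
        for unit in start..end {
            if stop_at == Some(unit) {
                return false; // partial batch written, no checkpoint
            }
            index.docs.push(unit);
        }
        state.committed_units = end; // checkpoint only after the full batch
    }
    true
}

/// Interrupts at `stop`, resumes, and returns (doc count, distinct doc count).
fn interrupt_then_resume(total: usize, stop: usize) -> (usize, usize) {
    let mut index = Index { docs: Vec::new() };
    let mut state = BuildState { committed_units: 0 };
    assert!(!build_resumable(&mut index, &mut state, total, Some(stop)));
    assert!(build_resumable(&mut index, &mut state, total, None));
    let distinct: HashSet<&usize> = index.docs.iter().collect();
    (index.docs.len(), distinct.len())
}

fn main() {
    // Interrupt before (1,000) and after (10,000) the first checkpoint at 4096.
    for stop in [1_000, 10_000] {
        let (docs, distinct) = interrupt_then_resume(30_000, stop);
        println!("stop at {stop}: {docs} docs, {distinct} distinct");
    }
}
```

In this model, dropping the `truncate` call makes the second case end with 31,808 docs, because units 8192..9999 would be written twice.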

make ci-quick (fmt + clippy + 562 colgrep tests) passes.

…indexing

Two indexing UX issues (neither corrupted the index):

1. Progress bar restarted at 0 every ~4096 units. The resumable build encodes in
   BUILD_CHECKPOINT_UNITS batches, each running a fresh pipeline whose metadata
   stage set the bar from a per-batch counter starting at 0 — so it climbed to
   ~4096 then jumped back. Use `pb.inc(delta)` so the shared bar accumulates
   across batches up to the real total.

2. Ctrl+C appeared to do nothing during indexing:
   - The acknowledgement only printed inside a critical section, so an interrupt
     during the long encoding phase was silent. Always acknowledge the first
     interrupt now.
   - Interruption was only polled at chunk/batch boundaries, draining the whole
     in-flight pipeline before stopping. The encode stage now bails between
     chunks, and `encode_prepared_document_batches_cancellable` checks the flag
     between batches so a Ctrl+C lands within ~one model forward pass. Stop
     latency drops from a full-queue drain (many seconds) to ~1s.

Safety/resumability unchanged and verified: index writes stay inside a critical
section, state.json is checkpointed per batch, and build_resumable trims any
partial mid-batch write on resume. Interrupt+resume tested on a 30k-unit build
(interrupted before and after the first checkpoint): resumes to exactly 30000
docs — no duplicates, correct search.
raphaelsty merged commit 705ba4b into main on Jun 17, 2026
20 checks passed