
Add in-memory type index for event prefix queries#716

Open
iamnbutler wants to merge 1 commit into main from
tasks/gh-iamnbutler-tasks-issue-507--5ea4d6da

Conversation

@iamnbutler
Owner

Summary

  • Adds a lightweight in-memory TypeIndex to EventStore that maps event type strings to (task_id, timestamp) pairs
  • query_by_type_prefix now reads only task files that contain matching events instead of scanning all files — reducing complexity from O(tasks × events) to O(matching_tasks × events_per_task)
  • Index is populated lazily on first query, then kept current by append()
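As a sketch of the index shape described above (illustrative names and types, not the PR's actual definitions):

```rust
use std::collections::HashMap;

// Hypothetical sketch: event type string mapped to lightweight
// (task_id, timestamp) pairs, plus a flag so lazy population can be
// detected. Field and method names are assumptions for illustration.
#[derive(Default)]
struct TypeIndex {
    populated: bool,
    entries: HashMap<String, Vec<(String, u64)>>, // event_type -> [(task_id, ts)]
}

impl TypeIndex {
    fn insert(&mut self, event_type: &str, task_id: &str, ts: u64) {
        self.entries
            .entry(event_type.to_string())
            .or_default()
            .push((task_id.to_string(), ts));
    }

    // Task IDs for all event types starting with `prefix`, deduplicated.
    fn task_ids_for_prefix(&self, prefix: &str) -> Vec<String> {
        let mut ids: Vec<String> = self
            .entries
            .iter()
            .filter(|(ty, _)| ty.starts_with(prefix))
            .flat_map(|(_, v)| v.iter().map(|(task, _)| task.clone()))
            .collect();
        ids.sort();
        ids.dedup();
        ids
    }
}
```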

Addressing prior review feedback

Race condition fix: The type_index write lock is held across both the file write and the index update in append(), making the pair atomic with respect to lazy population: query_by_type_prefix can never observe the on-disk event without its corresponding index entry (or vice versa).

Memory bounded: Index stores only (task_id, timestamp) per entry — no full event/payload cloning. Capped at 2000 entries per event type (~50 types × 2000 × ~50 bytes ≈ 5 MB max), covering typical query limits (100–500).
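The cap behaves roughly like this sketch (cap shrunk for illustration, and the helper name is hypothetical; the PR's TYPE_INDEX_CAP_PER_TYPE is 2000):

```rust
// Illustrative cap: 4 here so the behavior is visible; the PR uses 2000.
const CAP_PER_TYPE: usize = 4;

// Hypothetical helper sketching the per-type bound described above:
// when a type's entry list exceeds the cap, the oldest half is evicted.
fn insert_capped(entries: &mut Vec<(String, u64)>, task_id: &str, ts: u64) {
    entries.push((task_id.to_string(), ts));
    if entries.len() > CAP_PER_TYPE {
        // Entries arrive in timestamp order, so the front is oldest.
        entries.drain(..entries.len() / 2);
    }
}
```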

Compaction safe: Type index is invalidated when compact_task removes events, so the next query re-populates from current disk state.

Test plan

  • All 55 existing + new tests pass (cargo test --package events)
  • Full project compiles cleanly (cargo check)
  • New test: post-population appends are correctly indexed
  • New test: queries skip unrelated task files (only 1 of 20+ tasks read)

Closes #507

🤖 Generated with Claude Code

query_by_type_prefix previously scanned every task file on every call —
O(tasks × events_per_task). For 1000 tasks with 10k events each that's
10M events scanned per query.

Add a lightweight TypeIndex that maps event_type → [(task_id, ts)] and
is populated lazily on first query, then kept current by append().
Queries now only read task files that actually contain matching events.
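The query path can be sketched synchronously with in-memory stand-ins for the task files (all names and types here are illustrative, not the store's actual API):

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a stored event: (event_type, task_id, ts).
type Event = (String, String, u64);

fn query_by_type_prefix(
    index: &HashMap<String, Vec<(String, u64)>>, // event_type -> [(task_id, ts)]
    files: &HashMap<String, Vec<Event>>,         // task_id -> events on disk
    prefix: &str,
    limit: usize,
) -> Vec<Event> {
    // 1. Use the index to find only tasks that contain matching events.
    let mut task_ids: Vec<&String> = index
        .iter()
        .filter(|(ty, _)| ty.starts_with(prefix))
        .flat_map(|(_, v)| v.iter().map(|(task, _)| task))
        .collect();
    task_ids.sort();
    task_ids.dedup();

    // 2. Read and filter only those task files, instead of scanning all.
    let mut out: Vec<Event> = task_ids
        .iter()
        .filter_map(|id| files.get(*id))
        .flatten()
        .filter(|(ty, _, _)| ty.starts_with(prefix))
        .cloned()
        .collect();
    out.sort_by_key(|(_, _, ts)| *ts);
    out.truncate(limit);
    out
}
```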

Key design decisions addressing prior review feedback:

- **No race condition:** The type_index write lock is held across both
  the file write and the index update in append(). This prevents the
  race where lazy population reads a just-written event from disk and
  then append() duplicates it in the index.

- **Memory-bounded:** Index stores only (task_id, timestamp) per entry,
  not full events. Capped at 2000 entries per event type (~50 types ×
  2000 × ~50 bytes ≈ 5 MB), which covers typical query limits (100-500).

- **Compaction-safe:** Type index is invalidated when compact_task
  removes events, so the next query re-populates from current disk state.
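The invalidation contract above can be sketched as follows (hypothetical names; the real index lives behind the store's lock):

```rust
// Illustrative sketch of compaction invalidation: clearing the index and
// dropping the populated flag forces the next query to re-populate from
// the current on-disk state.
#[derive(Default)]
struct TypeIndex {
    populated: bool,
    entries: std::collections::HashMap<String, Vec<(String, u64)>>,
}

impl TypeIndex {
    fn invalidate(&mut self) {
        self.entries.clear();
        self.populated = false; // next query triggers lazy re-population
    }
}
```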

Closes #507

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor

@github-actions github-actions bot left a comment


This PR correctly solves the O(tasks × events) full-scan bottleneck in query_by_type_prefix. The TypeIndex design is clean: lazy population, cap-per-type to bound memory, and compaction invalidation are all handled correctly. All 55 tests pass and clippy is clean.

Two suggestions around lock granularity — neither is a correctness bug, but both are worth considering before this ships in a high-throughput environment.

Lock scope in append() (line 243): The type_index write lock is held across the full file I/O (dir create, open, write, flush). The stated reason is to prevent a duplicate index entry if ensure_type_index_populated() races between the file write and the index update. But task_ids_for_prefix() already deduplicates task IDs, so a duplicate (task_id, ts) entry is harmless. Narrowing the lock to just the index.insert() call would eliminate write-path contention for reads.

Cold-start population holds write lock for full scan (line 393): ensure_type_index_populated() holds the write lock while reading every task file. This serializes all concurrent append() calls for the duration — a potentially multi-second stall on first query with a large store. Reading into a local buffer without the lock (then briefly locking to merge) would avoid this at the cost of the same benign duplicate.

The implementation is otherwise solid. The double-check pattern in ensure_type_index_populated() is correct, compact_task() properly invalidates on compaction, and the new tests cover the key behavioral invariants (post-population appends, unrelated task file skipping).



Comment on lines 243 to 256
@@ -178,6 +255,10 @@ impl EventStore {
file.write_all(line.as_bytes()).await?;
file.flush().await?;
Contributor


[SUGGESTION] Priority: Code Quality

The type_index write lock is held across create_dir_all, open, write_all, and flush (lines 243–256). The stated goal is to prevent ensure_type_index_populated() from reading the newly-written event off disk before append() has a chance to add it to the index, which would result in a duplicate (task_id, timestamp) entry.

But task_ids_for_prefix() already calls .dedup() on the task ID list, so a duplicate entry in the vec is harmless: it just costs one extra element before the cap evicts it. The only real invariant that needs protection is that an append to a populated index mustn't be silently lost. That invariant is already satisfied by if index.populated { index.insert(...) } as long as the check and the insert share the same lock scope.

Consider narrowing the lock scope to just the index.insert() call:

// Per-task append lock, acquired as before.
let _guard = lock.lock().await;

// File I/O with no type_index lock held.
fs::create_dir_all(parent).await?;
let mut file = OpenOptions::new().create(true).append(true).open(&path).await?;
file.write_all(line.as_bytes()).await?;
file.flush().await?;

// Brief write lock only for the index update.
let mut index = self.type_index.write().await;
if index.populated {
    index.insert(event.event_type.as_str(), &event.task, event.ts);
}

As written, every append serializes all concurrent query_by_type_prefix read-lock acquisitions while doing disk I/O. In an agent workload where events are appended frequently, this adds latency to orchestrator queries.

Comment on lines +393 to +405
let mut index = self.type_index.write().await;
// Double-check after acquiring the write lock.
if index.populated {
    return Ok(());
}

let task_ids = self.list_tasks().await?;
for task_id in &task_ids {
    let events = self.read_task(task_id).await?;
    for event in &events {
        index.insert(event.event_type.as_str(), &event.task, event.ts);
    }
}
Contributor


[SUGGESTION] Priority: Correctness / Performance

The write lock is held for the entire cold-start scan — list_tasks() plus one read_task() per task (lines 393–405). While this is documented as a "one-time cost at startup," in practice it blocks every concurrent append() call for the full scan duration, because append() acquires the per-task lock first and then waits for the type_index write lock (line 243). With 1000+ tasks that each have thousands of events, this could cause a multi-second write stall on the first orchestrator query.

A narrower approach: read all tasks into a local buffer without holding the lock, then briefly acquire the write lock to merge and set populated = true. Any events appended during the scan will either be caught by the merge or will be added by append() immediately after (since they hold the write lock before writing). The worst case is a duplicate entry — benign as noted above.
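A minimal sketch of that two-phase population, using a std::sync::RwLock in place of the store's async lock (all names here are hypothetical):

```rust
use std::collections::HashMap;
use std::sync::RwLock;

// Hypothetical index shape for illustration.
struct Index {
    populated: bool,
    entries: HashMap<String, Vec<(String, u64)>>, // event_type -> [(task_id, ts)]
}

fn populate(
    index: &RwLock<Index>,
    scan_disk: impl Fn() -> Vec<(String, String, u64)>, // (event_type, task_id, ts)
) {
    // Phase 1: scan every task file WITHOUT holding the write lock, so
    // concurrent appends are not stalled for the full scan duration.
    let scanned = scan_disk();

    // Phase 2: briefly take the write lock to merge and flip `populated`.
    let mut idx = index.write().unwrap();
    if idx.populated {
        return; // another caller won the race; its copy is current
    }
    for (ty, task, ts) in scanned {
        idx.entries.entry(ty).or_default().push((task, ts));
    }
    // Worst case: an event appended during the scan appears twice, a
    // benign duplicate since task IDs are deduplicated at query time.
    idx.populated = true;
}
```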

If changing the locking is out of scope for this PR, at minimum consider logging a warning (rather than just debug!) when the scan takes longer than a threshold, so operators can see cold-start stalls in production.

@iamnbutler
Owner Author

Orchestrator Evaluation: Rejected

  1. CI is FAILING — metadata explicitly states CI status is failing, which is a hard blocker.

  2. Correctness regression from eviction: When TYPE_INDEX_CAP_PER_TYPE (2000) is exceeded for a type, the oldest half of entries are drained. After eviction, task_ids_for_prefix will not return task IDs that were evicted, meaning query_by_type_prefix will return incomplete results compared to the original full-scan implementation. This is a correctness bug, not just a performance tradeoff.

  3. Limit-based task ID selection is semantically wrong: task_ids_for_prefix truncates index entries to limit before extracting unique task IDs. But limit refers to the final number of events, not task IDs. A task with many matching events could consume most of the limit slots in the index, causing other tasks' events to be excluded from the final result set incorrectly.

  4. Still O(matched_tasks × events_per_task): For each matched task, ALL events are read and deserialized from the file, then filtered. The optimization only helps when many tasks have no matching events at all. The issue description's scenario (1000 tasks × 10K events) would still be expensive if many tasks match the prefix.


Feedback for agent:
Please address the following before merging:

  1. CI must pass before this can be approved.

  2. Eviction causes incorrect results: When the cap per type is exceeded and entries are evicted, subsequent queries will miss events from evicted tasks. The original implementation returned correct (complete) results; this is a correctness regression. Consider either: (a) falling back to full scan when eviction has occurred, (b) tracking that eviction happened and marking the index as incomplete for that type, or (c) using a different bounding strategy that doesn't lose correctness.

  3. task_ids_for_prefix limit semantics are wrong: Using the event limit to truncate index entries before extracting task IDs conflates 'number of events to return' with 'number of index entries to scan'. This can produce wrong results when one task has many matching events. Consider returning all matching task IDs from the index (which is already bounded by the cap) rather than truncating by the query limit.

  4. Minor: The per-task full file read means the optimization only helps when most tasks don't match. Consider documenting this limitation.
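One way to realize option (b) from point 2: track a per-type evicted flag that queries consult before trusting the index. A sketch with hypothetical names and a shrunk cap for illustration:

```rust
use std::collections::HashMap;

// Illustrative cap: 2 here so eviction is visible; the PR uses 2000.
const CAP: usize = 2;

#[derive(Default)]
struct CappedTypeEntries {
    entries: Vec<(String, u64)>,
    evicted: bool, // once true, the index is incomplete for this type
}

impl CappedTypeEntries {
    fn insert(&mut self, task_id: &str, ts: u64) {
        self.entries.push((task_id.to_string(), ts));
        if self.entries.len() > CAP {
            self.entries.drain(..self.entries.len() / 2);
            self.evicted = true;
        }
    }
}

// Query-side decision: a type whose entries have been evicted cannot be
// trusted to list every matching task, so fall back to the full scan.
fn can_use_index(index: &HashMap<String, CappedTypeEntries>, prefix: &str) -> bool {
    index
        .iter()
        .filter(|(ty, _)| ty.starts_with(prefix))
        .all(|(_, e)| !e.evicted)
}
```

This keeps the memory bound while restoring result completeness: the full scan is only paid on types that actually overflowed the cap.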



Development

Successfully merging this pull request may close these issues.

query_by_type_prefix scans all task event files — O(n*m) on every call
