fix: stop tracker.db write contention from blocking/dropping usage rows #11

Merged
mr-beaver merged 6 commits into mr-beaver:main from MattMencel:fix/sqlite-write-contention on Jun 24, 2026
@MattMencel

Contributor

Summary

Stops tracker.db writes from ever blocking an agent request and eliminates the
"database is locked" errors that were silently dropping usage rows.

  • Async accounting write (proxy.py): each request enqueues its usage record
    on a bounded in-process queue (put_nowait, O(1)) and returns; a single daemon
    writer thread persists it via db.save_request. The agent request path never
    waits on SQLite. A bad row is logged and skipped, and the writer keeps running;
    if the queue is full, the new row is dropped; graceful shutdown drains the backlog.
  • WAL + busy_timeout helper (db.py): all tracker.db connections route
    through db._connect() (WAL journal mode, busy_timeout=3000, synchronous=NORMAL).
    WAL removes the rollback-journal lock-upgrade deadlock that returned SQLITE_BUSY
    immediately; the busy_timeout lets the off-path writer ride out the importer's batches.
  • Shared connection helper in the importer (import_history.py): the sync daemon's
    tracker.db write connection now goes through the same db._connect() helper, so
    the proxy and the importer wait for each other instead of failing.
  • save_request gains an optional ts so the request-time timestamp is preserved
    through the queue.
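
For readers who want a concrete picture of the first and last bullets, here is a minimal sketch of a bounded queue with a single daemon writer thread. It is not the code from proxy.py: the class name UsageWriter, the save_row callback (standing in for db.save_request), the default queue size, and the dropped/failed counters are all assumptions made for illustration.

```python
import logging
import queue
import threading
import time

log = logging.getLogger("usage-writer")
_STOP = object()  # sentinel: tells the writer to exit once it reaches it


class UsageWriter:
    """Hypothetical stand-in for the proxy's queue plus writer thread."""

    def __init__(self, save_row, maxsize=1000):
        self._save_row = save_row  # assumed signature: save_row(row, ts)
        self._queue = queue.Queue(maxsize=maxsize)
        self.dropped = 0  # rows rejected because the queue was full
        self.failed = 0   # rows the writer could not persist
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def record(self, row, ts=None):
        """Request path: stamp the row, enqueue it, and return at once."""
        stamp = time.time() if ts is None else ts
        try:
            self._queue.put_nowait((row, stamp))
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _run(self):
        while True:
            item = self._queue.get()
            if item is _STOP:
                return
            row, stamp = item
            try:
                self._save_row(row, stamp)
            except Exception:
                # A bad row must not kill the writer: log it and move on.
                self.failed += 1
                log.exception("skipping usage row that failed to save")

    def shutdown(self, timeout=5.0):
        """Drain the backlog: the queue is FIFO, so every row enqueued
        before the sentinel is handled before the writer exits."""
        self._queue.put(_STOP)
        self._thread.join(timeout)
```

Capturing the timestamp in record() rather than in the writer is what keeps the request time intact even when the row is persisted later.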

Dashboard data is now eventually consistent (rows land milliseconds after the
request). Rationale and rejected alternatives recorded in ADR-0002.

Test Plan

  • ./run-tests.sh — 343 passed (332 baseline + 11 new)
  • New db.py tests: WAL mode, busy_timeout=3000, DB_PATH resolved at call time, ts honored
  • New proxy.py tests: enqueue is non-blocking, writer persists rows, bad row skipped & writer survives, queue-full drop, graceful-shutdown drain
  • New importer test: import_history._connect is db._connect
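
The db.py checks in the list above can be pictured with a small sketch like the following. The helper here is a hypothetical connect(), not the project's db._connect(), and the DB_PATH value and the pragmas() function are assumptions; only the three PRAGMA settings come from the summary.

```python
import os
import sqlite3
import tempfile

# Hypothetical location; the real DB_PATH lives in db.py.
DB_PATH = os.path.join(tempfile.mkdtemp(), "tracker.db")


def connect(path=None):
    """Open a connection with the settings named in the PR summary."""
    # DB_PATH is read here, at call time, so a test can repoint it.
    conn = sqlite3.connect(path if path is not None else DB_PATH)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=3000")  # milliseconds
    conn.execute("PRAGMA synchronous=NORMAL")
    return conn


def pragmas(conn):
    """Read the settings back so a test can assert on them."""
    return {
        "journal_mode": conn.execute("PRAGMA journal_mode").fetchone()[0],
        "busy_timeout": conn.execute("PRAGMA busy_timeout").fetchone()[0],
        "synchronous": conn.execute("PRAGMA synchronous").fetchone()[0],
    }
```

SQLite reports the journal mode as the lowercase string wal and synchronous=NORMAL as the integer 1, so a test would compare against those values.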

Coordination with #10

This branch is independent of #10 (proxy streaming) — both branch off main.
They overlap textually in proxy.py (the _record call sites vs. its body),
VERSION, RELEASE.md, and tests/test_proxy.py. The overlap is textual, not
semantic: _record's signature is unchanged, so #10's streaming call sites
compose with this branch's enqueueing _record. Whichever merges second resolves
those conflicts and re-bumps VERSION to 1.1.7.

🤖 Generated with Claude Code

MattMencel and others added 6 commits June 23, 2026 12:36
…ill)

Layered fix for 'database is locked' rows being silently dropped:
WAL mode (structural fix for rollback-journal lock-upgrade deadlocks),
exponential backoff on save_request, append-only spill fallback, and
routing the sync daemon through the shared connection helper.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Grilling reframed the goal: the real problem is that the accounting write runs
synchronously on the event loop and can block agent requests. The fix is an
in-process queue plus a single background writer thread (writes never block the
request), plus WAL + busy_timeout=3000 for cross-process contention. Drops
the exponential-backoff loop (redundant with busy_timeout) and the spill
file (over-engineering for eventually consistent dashboard data).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… thread

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…elper

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mr-beaver mr-beaver merged commit 1b0011c into mr-beaver:main Jun 24, 2026
1 check passed
mr-beaver pushed a commit that referenced this pull request Jun 24, 2026
PR #10 streaming tests read tracker.db immediately after the request.
PR #11 made writes async (queue + background thread), so rows aren't
visible until _process_pending_writes() is called. Add flush before
each sqlite3.connect(tmp_db) in affected tests.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
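
The flush-before-read pattern this follow-up commit describes can be sketched as below. This is an illustration only: pending, start_writer, flush_pending_writes, and the usage table are hypothetical names, and the barrier shown (Queue.join) is an assumption, since the source does not show how _process_pending_writes() works internally.

```python
import os
import queue
import sqlite3
import tempfile
import threading

pending = queue.Queue()  # hypothetical stand-in for the proxy's queue


def _writer_loop(db_path):
    # The writer thread owns its own connection.
    conn = sqlite3.connect(db_path)
    while True:
        tokens = pending.get()
        try:
            conn.execute("INSERT INTO usage (tokens) VALUES (?)", (tokens,))
            conn.commit()
        finally:
            pending.task_done()  # lets join() see this row as handled


def start_writer(db_path):
    # Create the table up front so readers never race the schema.
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS usage (tokens INTEGER)")
    conn.commit()
    conn.close()
    threading.Thread(target=_writer_loop, args=(db_path,), daemon=True).start()


def flush_pending_writes():
    """Block until every enqueued row has been committed."""
    pending.join()


def count_rows(db_path):
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT COUNT(*) FROM usage").fetchone()[0]
    finally:
        conn.close()
```

A test would enqueue its rows, call flush_pending_writes(), and only then open tmp_db to assert on the contents; reading without the flush may or may not see the rows, depending on thread timing.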