fix: stop tracker.db write contention from blocking/dropping usage rows #11
Merged
mr-beaver merged 6 commits on Jun 24, 2026
…ill) Layered fix for 'database is locked' rows being silently dropped: WAL mode (structural fix for rollback-journal lock-upgrade deadlocks), exponential backoff on save_request, append-only spill fallback, and routing the sync daemon through the shared connection helper. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Grilling reframed the goal: the real problem is the accounting write runs synchronously on the event loop and can block agent requests. Fix is an in-process queue + single background writer thread (writes never block the request), plus WAL + busy_timeout=3000 for cross-process contention. Drops the exponential-backoff loop (redundant with busy_timeout) and the spill file (over-engineering for eventually-consistent dashboard data). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… thread Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…elper Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mr-beaver pushed a commit that referenced this pull request on Jun 24, 2026
PR #10 streaming tests read tracker.db immediately after the request. PR #11 made writes async (queue + background thread), so rows aren't visible until _process_pending_writes() is called. Add flush before each sqlite3.connect(tmp_db) in affected tests. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Summary
Stops `tracker.db` writes from ever blocking an agent request and eliminates the `database is locked` errors that were silently dropping usage rows.

- `proxy.py`: each request enqueues its usage record on a bounded in-process queue (`put_nowait`, O(1)) and returns; a single daemon writer thread persists it via `db.save_request`. The agent request path never waits on SQLite. A bad row is logged and skipped (writer survives); a full queue drops the row; graceful shutdown drains the backlog.
- `db.py`: all `tracker.db` connections route through `db._connect()` (WAL journal mode, `busy_timeout=3000`, `synchronous=NORMAL`). WAL removes the rollback-journal lock-upgrade deadlock that returned `SQLITE_BUSY` immediately; the `busy_timeout` lets the off-path writer ride out the importer's batches.
- `import_history.py`: the sync daemon's `tracker.db` write connection now uses the same helper, so proxy and importer wait for each other instead of failing.
- `save_request` gains an optional `ts` so the request-time timestamp is preserved through the queue.

Dashboard data is now eventually consistent (rows land milliseconds after the
request). Rationale and rejected alternatives recorded in ADR-0002.
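
The enqueue-and-return pattern behind that eventual consistency can be sketched as follows. This is an illustration under stated assumptions, not the code in `proxy.py`: the `UsageWriter` class, its method names, and the default queue bound are hypothetical, and `flush` stands in for whatever the tests call to wait for pending writes.

```python
import logging
import queue
import threading

log = logging.getLogger("proxy")

_STOP = object()  # sentinel that tells the writer thread to exit


class UsageWriter:
    """Bounded queue plus one background writer (hypothetical names)."""

    def __init__(self, save, maxsize=1000):  # maxsize is an assumed bound
        self._save = save
        self._q = queue.Queue(maxsize=maxsize)
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def record(self, row):
        """Request path: never blocks. Returns False if the row was dropped."""
        try:
            self._q.put_nowait(row)
            return True
        except queue.Full:
            log.warning("usage queue full; dropping row")
            return False

    def _run(self):
        while True:
            row = self._q.get()
            try:
                if row is _STOP:
                    return
                self._save(row)
            except Exception:
                # A bad row is logged and skipped; the loop keeps running.
                log.exception("skipping bad usage row")
            finally:
                self._q.task_done()

    def flush(self):
        """Block until every queued row has been handled (useful in tests)."""
        self._q.join()

    def close(self, timeout=5.0):
        """Graceful shutdown: the sentinel sits behind the backlog (FIFO)."""
        self._q.put(_STOP)
        self._thread.join(timeout)
```

A test that reads the database right after a request would call `flush()` first; otherwise the row may still be sitting in the queue when the read happens.
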
Test Plan
- `./run-tests.sh`: 343 passed (332 baseline + 11 new)
- `db.py` tests: WAL mode, `busy_timeout=3000`, `DB_PATH` resolved at call time, `ts` honored
- `proxy.py` tests: enqueue is non-blocking, writer persists rows, bad row skipped & writer survives, queue-full drop, graceful-shutdown drain
- `import_history._connect is db._connect`

Coordination with #10
This branch is independent of #10 (proxy streaming); both branch off `main`. They overlap textually in `proxy.py` (the `_record` call sites vs. its body), `VERSION`, `RELEASE.md`, and `tests/test_proxy.py`. The overlap is textual, not semantic: `_record`'s signature is unchanged, so #10's streaming call sites compose with this branch's enqueueing `_record`. Whichever merges second resolves those conflicts and re-bumps `VERSION` to 1.1.7.

🤖 Generated with Claude Code