fix: rebuild crashes with "cannot start a transaction within a transaction"#181
Closed
realkotob wants to merge 1 commit intotirth8205:mainfrom
Closed
fix: rebuild crashes with "cannot start a transaction within a transaction"#181realkotob wants to merge 1 commit intotirth8205:mainfrom
realkotob wants to merge 1 commit intotirth8205:mainfrom
Conversation
…ction"
Running `code-review-graph build` on a large repo (~9k files) after an
earlier interrupted build would crash almost immediately with:
sqlite3.OperationalError: cannot start a transaction within a
transaction
I hit this on the Godot source tree. The first build ran to completion
on the parse phase, then I Ctrl-C'd it while community detection was
stuck (separate perf issue). On the retry, the rebuild blew up on the
very first file.
Why it happens:
`full_build` starts by purging any files that used to exist in the DB
but no longer exist on disk. The purge loop calls
`store.remove_file_data()` for each stale path, which issues two
`DELETE` statements and returns without committing. Under Python's
default (legacy) sqlite3 isolation mode, those DELETEs silently open an
implicit transaction. The next `store_file_nodes_edges()` call then runs
an explicit `BEGIN IMMEDIATE` — and SQLite rejects it because one is
already open.
The reason nobody had tripped this before is that the purge loop is
usually a no-op: if every file still exists, `existing_files -
current_abs` is empty. But my interrupted build had left the DB with
thousands of file rows whose absolute paths no longer matched the set
the rebuild was computing, so every single one hit the purge path and
the connection was left dirty before parsing even started.
Fix is two parts:
1. `full_build` now calls `store.commit()` once after the purge loop,
so the parse phase starts with a clean connection.
2. `store_file_nodes_edges` checks `conn.in_transaction` and flushes
any dangling implicit transaction before its `BEGIN IMMEDIATE`.
Defensive: if any future code path leaves DML uncommitted, the
atomic per-file replace still works instead of crashing the build.
Adds a regression test that primes an implicit transaction via
`remove_file_data` and then asserts `store_file_nodes_edges` still
succeeds.
ghiemer
added a commit
to ghiemer/code-review-graph
that referenced
this pull request
Apr 10, 2026
…205#135, tirth8205#181) Python's sqlite3 with default isolation_level="" silently opens a DEFERRED transaction on the first DML statement. When remove_file_data() is called for stale/deleted files, the DELETE statements open an implicit transaction that is never committed. The subsequent store_file_nodes_edges() then executes BEGIN IMMEDIATE inside that already-open transaction, causing: sqlite3.OperationalError: cannot start a transaction within a transaction This affects both full_build (stale-file purge loop) and incremental_update (deleted-file handling). Fix: - graph.py: Add in_transaction guard before BEGIN IMMEDIATE in store_file_nodes_edges() to flush any pending implicit transaction - graph.py: Add public rollback() method to GraphStore - incremental.py: Commit after stale-file purge loop in full_build() - incremental.py: Commit after deleted-file removal in incremental_update() Tests: - test_store_after_remove_no_transaction_error (single remove → store) - test_store_after_multiple_removes_no_transaction_error (bulk removes → store) Closes tirth8205#135, tirth8205#181, relates to tirth8205#162, tirth8205#193, tirth8205#94 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
4 tasks
tirth8205
pushed a commit
that referenced
this pull request
Apr 11, 2026
* fix: flush implicit SQLite transaction before BEGIN IMMEDIATE (#135, #181) Python's sqlite3 with default isolation_level="" silently opens a DEFERRED transaction on the first DML statement. When remove_file_data() is called for stale/deleted files, the DELETE statements open an implicit transaction that is never committed. The subsequent store_file_nodes_edges() then executes BEGIN IMMEDIATE inside that already-open transaction, causing: sqlite3.OperationalError: cannot start a transaction within a transaction This affects both full_build (stale-file purge loop) and incremental_update (deleted-file handling). Fix: - graph.py: Add in_transaction guard before BEGIN IMMEDIATE in store_file_nodes_edges() to flush any pending implicit transaction - graph.py: Add public rollback() method to GraphStore - incremental.py: Commit after stale-file purge loop in full_build() - incremental.py: Commit after deleted-file removal in incremental_update() Tests: - test_store_after_remove_no_transaction_error (single remove → store) - test_store_after_multiple_removes_no_transaction_error (bulk removes → store) Closes #135, #181, relates to #162, #193, #94 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address root cause — set isolation_level=None, atomic summaries The in_transaction guard from the previous commit addresses the symptom. This commit fixes the root cause: Python's sqlite3 default isolation_level="" silently opens implicit DEFERRED transactions on every DML statement. Setting isolation_level=None disables this legacy behavior entirely, making all transaction management explicit. Changes: - graph.py: Set isolation_level=None in sqlite3.connect() to disable implicit transactions — this is the actual root-cause fix - graph.py: Downgrade in_transaction guard log from debug to warning (it should never fire now; if it does, something is wrong) - tools/build.py: Wrap each _compute_summaries block in explicit BEGIN IMMEDIATE / COMMIT / ROLLBACK — previously these DELETE+INSERT sequences ran without transaction protection, risking partial writes Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: improve comments — explain why, not what - incremental.py: Reword commit-after-delete comments to explain the reason (avoiding nested transactions) instead of describing the mechanism (flushing implicit transactions) - test_graph.py: Reword test comments to explain the scenario being guarded against rather than implementation details that changed with the isolation_level=None fix Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: apply isolation_level=None to all connection sites, atomic batch ops Audit of the entire codebase found two additional sqlite3.connect() calls that were still using the default isolation_level="" (implicit transactions), plus two batch-write functions without transaction protection. Connections: - registry.py: ConnectionPool.get() now passes isolation_level=None to prevent implicit transactions on pooled connections - embeddings.py: EmbeddingStore.__init__ now passes isolation_level=None to match GraphStore behavior Atomic batch operations: - flows.py: store_flows() wrapped in BEGIN IMMEDIATE / COMMIT / ROLLBACK to prevent partial flow data if an exception interrupts the loop - communities.py: store_communities() wrapped in BEGIN IMMEDIATE / COMMIT / ROLLBACK to prevent partial community data on crash Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: update uv.lock to match pyproject.toml v2.2.2 Lock file was out of sync — still referenced v2.0.0 while pyproject.toml already declared v2.2.2. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Owner
|
This was closed in favor of #205 which was just merged and provides a more complete fix. Your PR correctly identifies the two-part fix: commit after the stale-file purge loop in However, PR #205 goes further:
Your root-cause analysis was excellent and helped clarify the problem for everyone. Thank you for the contribution. |
tirth8205
added a commit
that referenced
this pull request
Apr 11, 2026
Unreleased fixes since v2.2.2 that users are complaining about: - #208 Claude Code hook schema (fixes #97, #138, #163, #168, #172, #182, #188, #191, #201) — v2.2.2 generates an invalid hooks schema and timeouts in ms instead of seconds; PreCommit is also not a real event. - #205 SQLite transaction nesting (fixes #110, #135, #181) — implicit transactions from the legacy sqlite3 default caused "cannot start a transaction within a transaction" on update. - #166 Go method receivers resolved from field_identifier. - #170 UTF-8 decode errors in detect_changes (fixes #169). - #142 --platform target filters (fixes #133). - #213 / #183 large-repo community detection hangs. - #220 CI lint + tomllib on Python 3.10. - #159 missing pytest-cov dev dep. - #154 JSX component CALLS edges. Plus features: #177 Codex, #165 Luau (#153), #217 REFERENCES edge, #215 recurse_submodules, #185 gitignore default (#175), #171 gitignore docs (#157). Verified locally on Python 3.11: ruff clean, mypy clean, bandit clean, 691 tests pass, coverage 73.72%. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tirth8205
added a commit
that referenced
this pull request
Apr 11, 2026
Unreleased fixes since v2.2.2 that users are complaining about: - #208 Claude Code hook schema (fixes #97, #138, #163, #168, #172, #182, #188, #191, #201) — v2.2.2 generates an invalid hooks schema and timeouts in ms instead of seconds; PreCommit is also not a real event. - #205 SQLite transaction nesting (fixes #110, #135, #181) — implicit transactions from the legacy sqlite3 default caused "cannot start a transaction within a transaction" on update. - #166 Go method receivers resolved from field_identifier. - #170 UTF-8 decode errors in detect_changes (fixes #169). - #142 --platform target filters (fixes #133). - #213 / #183 large-repo community detection hangs. - #220 CI lint + tomllib on Python 3.10. - #159 missing pytest-cov dev dep. - #154 JSX component CALLS edges. Plus features: #177 Codex, #165 Luau (#153), #217 REFERENCES edge, #215 recurse_submodules, #185 gitignore default (#175), #171 gitignore docs (#157). Verified locally on Python 3.11: ruff clean, mypy clean, bandit clean, 691 tests pass, coverage 73.72%. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The problem
Running
code-review-graph buildon a large repo (~9k files) after an earlier interrupted build crashes almost immediately with:I hit this on a big source tree on my Macbook M1 Pro. The first build ran through the parse phase to completion, then I Ctrl-C'd it while community detection was stuck (separate perf issue — the file-based fallback is quadratic on large graphs). On the retry, the rebuild blew up on the very first file:
Why it happens
full_buildstarts by purging any files that used to exist in the DB but no longer exist on disk:remove_file_dataissues twoDELETEstatements and returns without committing. Under Python's default (legacy)sqlite3isolation mode, thoseDELETEs silently open an implicit transaction. The very nextstore_file_nodes_edges()call then runs an explicitBEGIN IMMEDIATE— and SQLite rejects it because one is already open.The reason this went undetected is that the purge loop is usually a no-op: if every file still exists on disk,
existing_files - current_absis empty and the buggy path never runs. My interrupted build had left the DB with thousands of file rows whose absolute paths no longer matched the set the rebuild was computing, so every single one was treated as stale and the connection was left dirty before parsing even started.The fix
Two parts:
incremental.py—full_buildnow callsstore.commit()once after draining the purge loop, so the parse phase starts with a clean connection.graph.py—store_file_nodes_edgeschecksconn.in_transactionand flushes any dangling implicit transaction before itsBEGIN IMMEDIATE. Defensive: if any future code path leaves DML uncommitted, the atomic per-file replace still works instead of crashing the build.Test plan
test_store_file_nodes_edges_after_dangling_dmlprimes an implicit transaction viaremove_file_dataand assertsstore_file_nodes_edgesstill succeeds. Fails onmain, passes with the fix.uv run pytest tests/test_graph.py tests/test_incremental.py— 45 passeduv run pytest tests/test_tools.py tests/test_integration_v2.py— 66 passed