Skip to content

fix: rebuild crashes with "cannot start a transaction within a transaction"#181

Closed
realkotob wants to merge 1 commit intotirth8205:mainfrom
realkotob:fix/sqlite-nested-transaction-purge
Closed

fix: rebuild crashes with "cannot start a transaction within a transaction"#181
realkotob wants to merge 1 commit intotirth8205:mainfrom
realkotob:fix/sqlite-nested-transaction-purge

Conversation

@realkotob
Copy link
Copy Markdown
Contributor

@realkotob realkotob commented Apr 9, 2026

The problem

Running code-review-graph build on a large repo (~9k files) after an earlier interrupted build crashes almost immediately with:

sqlite3.OperationalError: cannot start a transaction within a transaction

I hit this on a big source tree on my Macbook M1 Pro. The first build ran through the parse phase to completion, then I Ctrl-C'd it while community detection was stuck (separate perf issue — the file-based fallback is quadratic on large graphs). On the retry, the rebuild blew up on the very first file:

INFO: Progress: ... files parsed ... (never gets here)
Traceback (most recent call last):
  ...
  File ".../incremental.py", line 397, in full_build
      store.store_file_nodes_edges(...)
  File ".../graph.py", line 239, in store_file_nodes_edges
      self._conn.execute("BEGIN IMMEDIATE")
sqlite3.OperationalError: cannot start a transaction within a transaction

Why it happens

full_build starts by purging any files that used to exist in the DB but no longer exist on disk:

for stale in existing_files - current_abs:
    store.remove_file_data(stale)

remove_file_data issues two DELETE statements and returns without committing. Under Python's default (legacy) sqlite3 isolation mode, those DELETEs silently open an implicit transaction. The very next store_file_nodes_edges() call then runs an explicit BEGIN IMMEDIATE — and SQLite rejects it because one is already open.

The reason this went undetected is that the purge loop is usually a no-op: if every file still exists on disk, existing_files - current_abs is empty and the buggy path never runs. My interrupted build had left the DB with thousands of file rows whose absolute paths no longer matched the set the rebuild was computing, so every single one was treated as stale and the connection was left dirty before parsing even started.

The fix

Two parts:

  1. incremental.pyfull_build now calls store.commit() once after draining the purge loop, so the parse phase starts with a clean connection.
  2. graph.pystore_file_nodes_edges checks conn.in_transaction and flushes any dangling implicit transaction before its BEGIN IMMEDIATE. Defensive: if any future code path leaves DML uncommitted, the atomic per-file replace still works instead of crashing the build.

Test plan

  • New regression test test_store_file_nodes_edges_after_dangling_dml primes an implicit transaction via remove_file_data and asserts store_file_nodes_edges still succeeds. Fails on main, passes with the fix.
  • uv run pytest tests/test_graph.py tests/test_incremental.py — 45 passed
  • uv run pytest tests/test_tools.py tests/test_integration_v2.py — 66 passed
  • Manually reproduced the crash on a real repo and confirmed the fix allows the rebuild to proceed past the purge loop.

…ction"

Running `code-review-graph build` on a large repo (~9k files) after an
earlier interrupted build would crash almost immediately with:

    sqlite3.OperationalError: cannot start a transaction within a
    transaction

I hit this on the Godot source tree. The first build ran to completion
on the parse phase, then I Ctrl-C'd it while community detection was
stuck (separate perf issue). On the retry, the rebuild blew up on the
very first file.

Why it happens:

`full_build` starts by purging any files that used to exist in the DB
but no longer exist on disk. The purge loop calls
`store.remove_file_data()` for each stale path, which issues two
`DELETE` statements and returns without committing. Under Python's
default (legacy) sqlite3 isolation mode, those DELETEs silently open an
implicit transaction. The next `store_file_nodes_edges()` call then runs
an explicit `BEGIN IMMEDIATE` — and SQLite rejects it because one is
already open.

The reason nobody had tripped this before is that the purge loop is
usually a no-op: if every file still exists, `existing_files -
current_abs` is empty. But my interrupted build had left the DB with
thousands of file rows whose absolute paths no longer matched the set
the rebuild was computing, so every single one hit the purge path and
the connection was left dirty before parsing even started.

Fix is two parts:

  1. `full_build` now calls `store.commit()` once after the purge loop,
     so the parse phase starts with a clean connection.
  2. `store_file_nodes_edges` checks `conn.in_transaction` and flushes
     any dangling implicit transaction before its `BEGIN IMMEDIATE`.
     Defensive: if any future code path leaves DML uncommitted, the
     atomic per-file replace still works instead of crashing the build.

Adds a regression test that primes an implicit transaction via
`remove_file_data` and then asserts `store_file_nodes_edges` still
succeeds.
ghiemer added a commit to ghiemer/code-review-graph that referenced this pull request Apr 10, 2026
…205#135, tirth8205#181)

Python's sqlite3 with default isolation_level="" silently opens a
DEFERRED transaction on the first DML statement. When remove_file_data()
is called for stale/deleted files, the DELETE statements open an implicit
transaction that is never committed. The subsequent
store_file_nodes_edges() then executes BEGIN IMMEDIATE inside that
already-open transaction, causing:

  sqlite3.OperationalError: cannot start a transaction within a transaction

This affects both full_build (stale-file purge loop) and
incremental_update (deleted-file handling).

Fix:
- graph.py: Add in_transaction guard before BEGIN IMMEDIATE in
  store_file_nodes_edges() to flush any pending implicit transaction
- graph.py: Add public rollback() method to GraphStore
- incremental.py: Commit after stale-file purge loop in full_build()
- incremental.py: Commit after deleted-file removal in incremental_update()

Tests:
- test_store_after_remove_no_transaction_error (single remove → store)
- test_store_after_multiple_removes_no_transaction_error (bulk removes → store)

Closes tirth8205#135, tirth8205#181, relates to tirth8205#162, tirth8205#193, tirth8205#94

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tirth8205 pushed a commit that referenced this pull request Apr 11, 2026
* fix: flush implicit SQLite transaction before BEGIN IMMEDIATE (#135, #181)

Python's sqlite3 with default isolation_level="" silently opens a
DEFERRED transaction on the first DML statement. When remove_file_data()
is called for stale/deleted files, the DELETE statements open an implicit
transaction that is never committed. The subsequent
store_file_nodes_edges() then executes BEGIN IMMEDIATE inside that
already-open transaction, causing:

  sqlite3.OperationalError: cannot start a transaction within a transaction

This affects both full_build (stale-file purge loop) and
incremental_update (deleted-file handling).

Fix:
- graph.py: Add in_transaction guard before BEGIN IMMEDIATE in
  store_file_nodes_edges() to flush any pending implicit transaction
- graph.py: Add public rollback() method to GraphStore
- incremental.py: Commit after stale-file purge loop in full_build()
- incremental.py: Commit after deleted-file removal in incremental_update()

Tests:
- test_store_after_remove_no_transaction_error (single remove → store)
- test_store_after_multiple_removes_no_transaction_error (bulk removes → store)

Closes #135, #181, relates to #162, #193, #94

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address root cause — set isolation_level=None, atomic summaries

The in_transaction guard from the previous commit addresses the symptom.
This commit fixes the root cause: Python's sqlite3 default
isolation_level="" silently opens implicit DEFERRED transactions on
every DML statement. Setting isolation_level=None disables this legacy
behavior entirely, making all transaction management explicit.

Changes:
- graph.py: Set isolation_level=None in sqlite3.connect() to disable
  implicit transactions — this is the actual root-cause fix
- graph.py: Downgrade in_transaction guard log from debug to warning
  (it should never fire now; if it does, something is wrong)
- tools/build.py: Wrap each _compute_summaries block in explicit
  BEGIN IMMEDIATE / COMMIT / ROLLBACK — previously these DELETE+INSERT
  sequences ran without transaction protection, risking partial writes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: improve comments — explain why, not what

- incremental.py: Reword commit-after-delete comments to explain the
  reason (avoiding nested transactions) instead of describing the
  mechanism (flushing implicit transactions)
- test_graph.py: Reword test comments to explain the scenario being
  guarded against rather than implementation details that changed
  with the isolation_level=None fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: apply isolation_level=None to all connection sites, atomic batch ops

Audit of the entire codebase found two additional sqlite3.connect()
calls that were still using the default isolation_level="" (implicit
transactions), plus two batch-write functions without transaction
protection.

Connections:
- registry.py: ConnectionPool.get() now passes isolation_level=None
  to prevent implicit transactions on pooled connections
- embeddings.py: EmbeddingStore.__init__ now passes isolation_level=None
  to match GraphStore behavior

Atomic batch operations:
- flows.py: store_flows() wrapped in BEGIN IMMEDIATE / COMMIT / ROLLBACK
  to prevent partial flow data if an exception interrupts the loop
- communities.py: store_communities() wrapped in BEGIN IMMEDIATE / COMMIT /
  ROLLBACK to prevent partial community data on crash

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update uv.lock to match pyproject.toml v2.2.2

Lock file was out of sync — still referenced v2.0.0 while
pyproject.toml already declared v2.2.2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tirth8205
Copy link
Copy Markdown
Owner

This was closed in favor of #205 which was just merged and provides a more complete fix.

Your PR correctly identifies the two-part fix: commit after the stale-file purge loop in full_build(), and an in_transaction guard in store_file_nodes_edges(). Both of those changes are also present in #205.

However, PR #205 goes further:

  1. Root cause elimination: sets isolation_level=None on all sqlite3.connect() calls in graph.py, registry.py, and embeddings.py — so implicit transactions can never leak again.
  2. Incremental path too: adds the corresponding store.commit() after the deleted-file loop in incremental_update().
  3. Atomicity: wraps store_flows(), store_communities(), and all three _compute_summaries blocks in explicit transactions.
  4. More regression tests: 2 new tests covering both single and bulk remove-then-store scenarios.

Your root-cause analysis was excellent and helped clarify the problem for everyone. Thank you for the contribution.

tirth8205 added a commit that referenced this pull request Apr 11, 2026
Unreleased fixes since v2.2.2 that users are complaining about:
- #208 Claude Code hook schema (fixes #97, #138, #163, #168, #172, #182,
  #188, #191, #201) — v2.2.2 generates an invalid hooks schema and
  timeouts in ms instead of seconds; PreCommit is also not a real event.
- #205 SQLite transaction nesting (fixes #110, #135, #181) — implicit
  transactions from the legacy sqlite3 default caused "cannot start a
  transaction within a transaction" on update.
- #166 Go method receivers resolved from field_identifier.
- #170 UTF-8 decode errors in detect_changes (fixes #169).
- #142 --platform target filters (fixes #133).
- #213 / #183 large-repo community detection hangs.
- #220 CI lint + tomllib on Python 3.10.
- #159 missing pytest-cov dev dep.
- #154 JSX component CALLS edges.

Plus features: #177 Codex, #165 Luau (#153), #217 REFERENCES edge,
#215 recurse_submodules, #185 gitignore default (#175), #171 gitignore
docs (#157).

Verified locally on Python 3.11: ruff clean, mypy clean, bandit clean,
691 tests pass, coverage 73.72%.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tirth8205 added a commit that referenced this pull request Apr 11, 2026
Unreleased fixes since v2.2.2 that users are complaining about:
- #208 Claude Code hook schema (fixes #97, #138, #163, #168, #172, #182,
  #188, #191, #201) — v2.2.2 generates an invalid hooks schema and
  timeouts in ms instead of seconds; PreCommit is also not a real event.
- #205 SQLite transaction nesting (fixes #110, #135, #181) — implicit
  transactions from the legacy sqlite3 default caused "cannot start a
  transaction within a transaction" on update.
- #166 Go method receivers resolved from field_identifier.
- #170 UTF-8 decode errors in detect_changes (fixes #169).
- #142 --platform target filters (fixes #133).
- #213 / #183 large-repo community detection hangs.
- #220 CI lint + tomllib on Python 3.10.
- #159 missing pytest-cov dev dep.
- #154 JSX component CALLS edges.

Plus features: #177 Codex, #165 Luau (#153), #217 REFERENCES edge,
#215 recurse_submodules, #185 gitignore default (#175), #171 gitignore
docs (#157).

Verified locally on Python 3.11: ruff clean, mypy clean, bandit clean,
691 tests pass, coverage 73.72%.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants