Skip to content

Handle nested SQLite transactions during builds#193

Closed
GrimoireScribe wants to merge 1 commit intotirth8205:mainfrom
GrimoireScribe:fix/nested-build-transactions
Closed

Handle nested SQLite transactions during builds#193
GrimoireScribe wants to merge 1 commit intotirth8205:mainfrom
GrimoireScribe:fix/nested-build-transactions

Conversation

@GrimoireScribe
Copy link
Copy Markdown
Contributor

Summary

  • make store_file_nodes_edges() safe when the SQLite connection is already inside a transaction
  • use a savepoint for nested writes instead of always issuing BEGIN IMMEDIATE
  • add a regression test covering file-store writes during an existing transaction

Why

On Windows, rebuilding the graph could fail with:

ext sqlite3.OperationalError: cannot start a transaction within a transaction

This happens when store_file_nodes_edges() is called while the connection is already in a transaction. Using a savepoint preserves the atomic per-file replace behavior without assuming top-level transaction ownership.

Verification

  • python -m ruff check code_review_graph/graph.py tests/test_graph.py
  • local CLI build completed successfully against a real repo after the change

ghiemer added a commit to ghiemer/code-review-graph that referenced this pull request Apr 10, 2026
…205#135, tirth8205#181)

Python's sqlite3 with default isolation_level="" silently opens a
DEFERRED transaction on the first DML statement. When remove_file_data()
is called for stale/deleted files, the DELETE statements open an implicit
transaction that is never committed. The subsequent
store_file_nodes_edges() then executes BEGIN IMMEDIATE inside that
already-open transaction, causing:

  sqlite3.OperationalError: cannot start a transaction within a transaction

This affects both full_build (stale-file purge loop) and
incremental_update (deleted-file handling).

Fix:
- graph.py: Add in_transaction guard before BEGIN IMMEDIATE in
  store_file_nodes_edges() to flush any pending implicit transaction
- graph.py: Add public rollback() method to GraphStore
- incremental.py: Commit after stale-file purge loop in full_build()
- incremental.py: Commit after deleted-file removal in incremental_update()

Tests:
- test_store_after_remove_no_transaction_error (single remove → store)
- test_store_after_multiple_removes_no_transaction_error (bulk removes → store)

Closes tirth8205#135, tirth8205#181, relates to tirth8205#162, tirth8205#193, tirth8205#94

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
tirth8205 pushed a commit that referenced this pull request Apr 11, 2026
* fix: flush implicit SQLite transaction before BEGIN IMMEDIATE (#135, #181)

Python's sqlite3 with default isolation_level="" silently opens a
DEFERRED transaction on the first DML statement. When remove_file_data()
is called for stale/deleted files, the DELETE statements open an implicit
transaction that is never committed. The subsequent
store_file_nodes_edges() then executes BEGIN IMMEDIATE inside that
already-open transaction, causing:

  sqlite3.OperationalError: cannot start a transaction within a transaction

This affects both full_build (stale-file purge loop) and
incremental_update (deleted-file handling).

Fix:
- graph.py: Add in_transaction guard before BEGIN IMMEDIATE in
  store_file_nodes_edges() to flush any pending implicit transaction
- graph.py: Add public rollback() method to GraphStore
- incremental.py: Commit after stale-file purge loop in full_build()
- incremental.py: Commit after deleted-file removal in incremental_update()

Tests:
- test_store_after_remove_no_transaction_error (single remove → store)
- test_store_after_multiple_removes_no_transaction_error (bulk removes → store)

Closes #135, #181, relates to #162, #193, #94

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address root cause — set isolation_level=None, atomic summaries

The in_transaction guard from the previous commit addresses the symptom.
This commit fixes the root cause: Python's sqlite3 default
isolation_level="" silently opens implicit DEFERRED transactions on
every DML statement. Setting isolation_level=None disables this legacy
behavior entirely, making all transaction management explicit.

Changes:
- graph.py: Set isolation_level=None in sqlite3.connect() to disable
  implicit transactions — this is the actual root-cause fix
- graph.py: Downgrade in_transaction guard log from debug to warning
  (it should never fire now; if it does, something is wrong)
- tools/build.py: Wrap each _compute_summaries block in explicit
  BEGIN IMMEDIATE / COMMIT / ROLLBACK — previously these DELETE+INSERT
  sequences ran without transaction protection, risking partial writes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: improve comments — explain why, not what

- incremental.py: Reword commit-after-delete comments to explain the
  reason (avoiding nested transactions) instead of describing the
  mechanism (flushing implicit transactions)
- test_graph.py: Reword test comments to explain the scenario being
  guarded against rather than implementation details that changed
  with the isolation_level=None fix

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: apply isolation_level=None to all connection sites, atomic batch ops

Audit of the entire codebase found two additional sqlite3.connect()
calls that were still using the default isolation_level="" (implicit
transactions), plus two batch-write functions without transaction
protection.

Connections:
- registry.py: ConnectionPool.get() now passes isolation_level=None
  to prevent implicit transactions on pooled connections
- embeddings.py: EmbeddingStore.__init__ now passes isolation_level=None
  to match GraphStore behavior

Atomic batch operations:
- flows.py: store_flows() wrapped in BEGIN IMMEDIATE / COMMIT / ROLLBACK
  to prevent partial flow data if an exception interrupts the loop
- communities.py: store_communities() wrapped in BEGIN IMMEDIATE / COMMIT /
  ROLLBACK to prevent partial community data on crash

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: update uv.lock to match pyproject.toml v2.2.2

Lock file was out of sync — still referenced v2.0.0 while
pyproject.toml already declared v2.2.2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@tirth8205
Copy link
Copy Markdown
Owner

Closing in favor of #205 which provides a more complete fix.

Your savepoint approach is technically valid and clever — it preserves atomic per-file replace semantics when the connection is already in a transaction. However, it is a symptom-level fix: it works around the implicit transaction left open by remove_file_data() without removing the underlying source of that transaction.

PR #205 goes further in three ways:

  1. Root cause fix: sets isolation_level=None on all three sqlite3.connect() calls (graph.py, registry.py, embeddings.py), disabling implicit transactions project-wide so the leaking transaction can never occur in the first place.
  2. Completeness: also wraps store_flows(), store_communities(), and the three _compute_summaries blocks in explicit transactions for atomicity — sites that your PR leaves untouched.
  3. Incremental path: adds store.commit() after the stale-file purge loop in full_build() and the deleted-file loop in incremental_update(), so the fix is also applied at the call-site layer.

Additionally, the nosec B608 annotation on the f-string SAVEPOINT names is a code smell — even though the savepoint name is a hardcoded literal, the annotation suggests it needs suppression. The savepoint name could be passed as an identifier via the sqlite3 API instead (or simply avoided, as #205 does). Thank you for the contribution!

@tirth8205 tirth8205 closed this Apr 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants