fix: compare-and-swap memory file writes against compaction races#1436
Merged
Conversation
compact_session reads MEMORY.md / USER.md / SOUL.md, runs an LLM call that takes tens of seconds, then writes full rewrites. Compaction runs as a background task, so the conversation keeps going during the call: the agent's workspace tools can write a new fact in that window, and a second compaction for the same user can land first (the code already acknowledges concurrent compact_session tasks). The blind overwrite then clobbers the newer value, silently, including explicit user saves. HISTORY.md was already protected (advisory lock + FOR UPDATE in append_history); the other three files were last-writer-wins. write_memory_async / write_user_async / write_soul_async now accept an optional expected_current and perform the compare inside the write transaction under FOR UPDATE (plus the per-user advisory lock for the no-row-yet MEMORY.md branch, mirroring append_history). On drift the write is skipped and False is returned; compaction logs the skip and records memory_updated=False on the audit row. Losing one batch's extraction is recoverable (the conversation sent to the LLM is kept in the event row's prompt_text audit column); clobbering a durable file is not. Fixes #1429 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Caution Review failedPull request was closed or merged during review WalkthroughThis PR implements compare-and-swap (CAS) semantics to prevent race-condition data loss in compaction. When ChangesCompare-and-swap protection for compaction writes to memory documents
🎯 3 (Moderate) | ⏱️ ~22 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Fixes #1429.
compact_sessionreads MEMORY.md / USER.md / SOUL.md, runs an LLM call that takes tens of seconds, then writes full rewrites. Compaction runs as a fire-and-forget background task, so the conversation keeps going during the call: the agent's workspace tools can write a new fact in that window, and a second compaction for the same user can land first (the snapshot logic already acknowledges concurrentcompact_sessiontasks). The blind overwrite then clobbers the newer value silently, including explicit user saves. HISTORY.md was already protected (advisory lock +FOR UPDATEinappend_history); MEMORY/USER/SOUL were last-writer-wins.Changes:
write_memory_async/write_user_async/write_soul_asyncaccept an optional keyword-onlyexpected_currentand returnbool. When provided, the compare runs inside the write transaction underFOR UPDATE(plus the per-user advisory lock for the no-row-yet MEMORY.md branch, mirroringappend_history). On drift the write is skipped andFalseis returned. Comparison uses the same normalized form the correspondingread_*_asyncreturns, so callers pass exactly what they read.compact_sessionpasses its top-of-function reads asexpected_current, logs a warning on a CAS miss, and recordsmemory_updated=False(etc.) on the audit row so the before/after snapshots stay truthful.Tradeoff, made deliberately: on a CAS miss the batch's extraction is dropped rather than re-merged with a second LLM call. Losing one batch's extraction is recoverable (the conversation sent to the LLM is preserved in the event row's
prompt_textaudit column, migration 031); clobbering a durable file is not. A bounded re-merge retry would be a reasonable follow-up if CAS misses turn out to be common; the new warning log line makes that measurable.Plain calls without
expected_current(e.g. the user-facing memory editor inuser_memory.py) keep last-writer-wins semantics, which is correct for direct user edits.Type
Checklist
uv run pytest -v) (2892 passed, 2 skipped)ruff check backend/ && ruff format --check backend/)AI Usage
Overview
This PR fixes a critical race condition where compaction operations could silently overwrite concurrent writes to memory files (MEMORY.md, USER.md, SOUL.md) during long-running LLM operations. The solution implements compare-and-swap (CAS) semantics to detect and skip writes when the underlying file has changed since the initial read.
What Changed
Compaction Process (compaction.py)
compact_sessionto use compare-and-swap when persisting LLM-generated updatesMemory Storage Layer (memory_db.py)
write_memory_async,write_user_async,write_soul_async) to support optional compare-and-swap validationNoneWhat Was Added
Test Coverage
Benefits