fix(writer): avoid compact overwriting concurrent writes #25

rogerdigital wants to merge 2 commits into Einsia:main
Conversation
This fixes the compact follow-up called out in #12. Verified:
Code Review
This pull request enhances the file compaction process by introducing a file locking mechanism and a stale-read check to prevent data loss from concurrent writes during LLM processing. It also integrates the frontmatter library to manage the needs_compact flag within the file content. The review feedback suggests optimizing the database update logic by wrapping SQLite operations in a transaction and reducing redundant disk I/O by utilizing the in-memory file representation instead of re-reading the file from disk after writing.
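The file-locking side of this change can be sketched with the standard library's advisory locks. This is a minimal illustration, not the PR's actual implementation: `file_lock` and the sidecar `.lock` file convention are hypothetical, and the real code may lock differently.

```python
import fcntl
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def file_lock(path: Path):
    """Hypothetical helper: exclusive advisory lock on a sidecar .lock file.

    Blocks until the lock is free, so a slow compaction and a concurrent
    append on the same memory file serialize instead of interleaving writes.
    """
    lock_path = path.with_suffix(path.suffix + ".lock")
    with open(lock_path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # held until the with-block exits
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)

# Usage: all mutations of the memory file happen inside the lock.
target = Path("/tmp/mem.md")
with file_lock(target):
    target.write_text("compacted content\n")
```

Locking a sidecar file (rather than the target itself) avoids problems with lock state being tied to a file descriptor that gets replaced on rewrite.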
```python
fts.delete_entries_for(conn, path.name)
parsed = files_mod.read_file(path)
prefix = files_mod.validate_prefix(path.name)
fts.upsert_file(
    conn,
    fts.FileRow(
        path=path.name,
        prefix=prefix,
        description=parsed.description,
        tags=" ".join(parsed.tags),
        status=parsed.status,
        entry_count=len(parsed.entries),
        created=parsed.created,
        updated=parsed.updated,
        needs_compact=0,
    ),
)
# Clear frontmatter flag
files_mod.update_frontmatter(path, {"needs_compact": False})
for e in parsed.entries:
    fts.insert_entry(
        conn,
        id=e.id,
        path=path.name,
        prefix=prefix,
        timestamp=e.timestamp,
        tags=" ".join(e.tags),
        content=entries_mod._strip_strike(e.body),
        superseded=1 if e.superseded_by else 0,
    )
```
The FTS update logic involves multiple SQLite operations. Wrapping these operations in a single transaction will significantly improve performance and reduce the time the file lock is held. However, ensure that potentially slow or blocking operations, such as file I/O and parsing, are kept outside the transaction block to minimize lock contention and prevent stalling other threads.
```python
parsed = files_mod.read_file(path)
prefix = files_mod.validate_prefix(path.name)
with conn:
    fts.delete_entries_for(conn, path.name)
    fts.upsert_file(
        conn,
        fts.FileRow(
            path=path.name,
            prefix=prefix,
            description=parsed.description,
            tags=" ".join(parsed.tags),
            status=parsed.status,
            entry_count=len(parsed.entries),
            created=parsed.created,
            updated=parsed.updated,
            needs_compact=0,
        ),
    )
    for e in parsed.entries:
        fts.insert_entry(
            conn,
            id=e.id,
            path=path.name,
            prefix=prefix,
            timestamp=e.timestamp,
            tags=" ".join(e.tags),
            content=entries_mod._strip_strike(e.body),
            superseded=1 if e.superseded_by else 0,
        )
```

References
- When using locks to protect shared state from race conditions, keep potentially slow or blocking operations (e.g., callbacks) outside the locked critical section to prevent stalling other threads.
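The transaction behavior the suggestion relies on is sqlite3's connection context manager: every statement inside `with conn:` commits as one unit, or rolls back as one unit if anything raises. A minimal stdlib sketch with a stand-in `entries` schema (not the PR's real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id TEXT, path TEXT)")

# Slow work (parsing, disk reads) stays outside the transaction.
rows = [("e1", "mem.md"), ("e2", "mem.md")]

# Delete + re-insert commit atomically at the end of the with-block.
with conn:
    conn.execute("DELETE FROM entries WHERE path = ?", ("mem.md",))
    conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)

# If any statement raises, the whole transaction rolls back:
try:
    with conn:
        conn.execute("DELETE FROM entries")
        conn.execute("INSERT INTO nonexistent VALUES (1)")  # raises
except sqlite3.OperationalError:
    pass

# The DELETE above was rolled back, so both rows survive.
count = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
print(count)  # → 2
```

Batching the statements this way also means the FTS index never exposes a half-updated state to concurrent readers.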
```python
# Re-ingest this file's entries into FTS while still holding the same
# file lock so on-disk Markdown and index rows move forward together.
fts.delete_entries_for(conn, path.name)
parsed = files_mod.read_file(path)
```
Calling files_mod.read_file(path) here is redundant because it performs a disk read and parses the file we just wrote. Since the content is already available in memory (e.g., the compacted object), you should operate on that representation instead of re-reading from disk to avoid unnecessary I/O.
References
- To avoid redundant I/O, when modifying file content that has already been read into memory, operate on the in-memory representation (e.g., using frontmatter.loads(text)) instead of re-reading from disk (frontmatter.load(path)).
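The principle behind this reference — mutate the text you already hold instead of re-reading the file — can be shown with a stdlib-only sketch. The `clear_needs_compact` helper and the minimal `---`-delimited frontmatter handling are hypothetical stand-ins for what `frontmatter.loads(text)` would do with the real library:

```python
def clear_needs_compact(text: str) -> str:
    """Flip the needs_compact frontmatter flag in an in-memory document.

    Operates on the string already in memory, so no extra disk read or
    re-parse of the file we just wrote is needed.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text  # no frontmatter block; nothing to clear
    out = []
    in_frontmatter = True
    for i, line in enumerate(lines):
        if in_frontmatter and i > 0 and line.strip() == "---":
            in_frontmatter = False  # closing delimiter reached
        if in_frontmatter and line.startswith("needs_compact:"):
            line = "needs_compact: false\n"
        out.append(line)
    return "".join(out)

doc = "---\ntitle: mem\nneeds_compact: true\n---\nbody text\n"
updated = clear_needs_compact(doc)
print(updated)
```

A single `path.write_text(updated)` then persists the result, replacing the read-modify-write round trip with one write.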
Addressed the Gemini review in
Re-verified:
1e6c861 to 1cd66d6
Both Gemini comments addressed in
Summary
compact_file reads a memory file, sends it to the LLM, then writes the compacted result back. That LLM call can take long enough for a reducer/classifier append to land on the same file in the meantime. Before this change, the stale compacted output could overwrite those newer entries.

This PR adds a compare-and-swap style writeback: it clears needs_compact in the compacted output before writing. If the file changed, compaction is left for a later retry instead of risking data loss.
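The compare-and-swap writeback can be sketched as follows. This is an illustrative reduction, with a hypothetical `cas_write` helper; the real change also holds the file lock and routes the flag through the frontmatter layer:

```python
from pathlib import Path

def cas_write(path: Path, snapshot: str, compacted: str) -> bool:
    """Write `compacted` only if the file still matches `snapshot`.

    `snapshot` is the content captured before the slow LLM call. If a
    concurrent append changed the file since then, return False and leave
    compaction for a later retry instead of overwriting newer entries.
    """
    current = path.read_text()
    if current != snapshot:
        return False  # stale: a concurrent write landed; do not clobber it
    path.write_text(compacted)
    return True
```

Returning False rather than raising keeps the caller's contract simple: the `needs_compact` flag stays set on disk, so the next compaction pass picks the file up again.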
Tests
uv run pytest tests/test_store.py -q
uv run ruff check src/openchronicle/writer/compact.py tests/test_store.py
uv run pytest -q