
fix(writer): avoid compact overwriting concurrent writes #25

Open

rogerdigital wants to merge 2 commits into Einsia:main from rogerdigital:fix-compact-cas

Conversation

@rogerdigital

Summary

compact_file reads a memory file, sends it to the LLM, then writes the compacted result back. That LLM call can take long enough for a reducer/classifier append to land on the same file in the meantime. Before this change, the stale compacted output could overwrite those newer entries.

This PR adds a compare-and-swap style writeback:

  • read the original file before the LLM rewrite
  • parse and clear needs_compact in the compacted output before writing
  • take the existing per-file lock before writeback
  • re-read the file and skip compaction if it changed while the LLM was running
  • rebuild the file's FTS rows while still holding the same file lock after an accepted compact

If the file changed, compaction is left for a later retry instead of risking data loss.
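The compare-and-swap writeback described above can be sketched roughly as follows. This is an illustrative outline, not the PR's actual code: helper names like `file_lock` and the injected `llm_rewrite`/`atomic_write_text` callables stand in for the project's real per-file lock and write helpers.

```python
from pathlib import Path

def compact_file(path: Path, llm_rewrite, file_lock, atomic_write_text) -> bool:
    # Snapshot the file before the slow LLM call.
    original = path.read_text()
    # The LLM rewrite can take a while; concurrent appends may land meanwhile.
    compacted = llm_rewrite(original)
    # Take the same per-file lock the other writers use.
    with file_lock(path):
        # Re-read under the lock and compare against the snapshot.
        current = path.read_text()
        if current != original:
            # File changed during the rewrite: skip, leave compaction for a retry.
            return False
        atomic_write_text(path, compacted)
        return True
```

If the check fails, nothing is written, so the newer entries are never clobbered; the `needs_compact` flag stays set and a later pass picks the file up again.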

Tests

  • uv run pytest tests/test_store.py -q
  • uv run ruff check src/openchronicle/writer/compact.py tests/test_store.py
  • uv run pytest -q

@rogerdigital
Author

This fixes the compact follow-up called out in #12: compact_file now skips stale writeback if the memory file changed while the LLM rewrite was running.

Verified:

  • uv run pytest tests/test_store.py -q
  • uv run ruff check src/openchronicle/writer/compact.py tests/test_store.py
  • uv run pytest -q


@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request enhances the file compaction process by introducing a file locking mechanism and a stale-read check to prevent data loss from concurrent writes during LLM processing. It also integrates the frontmatter library to manage the needs_compact flag within the file content. The review feedback suggests optimizing the database update logic by wrapping SQLite operations in a transaction and reducing redundant disk I/O by utilizing the in-memory file representation instead of re-reading the file from disk after writing.

Comment thread src/openchronicle/writer/compact.py Outdated
Comment on lines +131 to +158
        fts.delete_entries_for(conn, path.name)
        parsed = files_mod.read_file(path)
        prefix = files_mod.validate_prefix(path.name)
        fts.upsert_file(
            conn,
            fts.FileRow(
                path=path.name,
                prefix=prefix,
                description=parsed.description,
                tags=" ".join(parsed.tags),
                status=parsed.status,
                entry_count=len(parsed.entries),
                created=parsed.created,
                updated=parsed.updated,
                needs_compact=0,
            ),
        )
        # Clear frontmatter flag
        files_mod.update_frontmatter(path, {"needs_compact": False})
        for e in parsed.entries:
            fts.insert_entry(
                conn,
                id=e.id,
                path=path.name,
                prefix=prefix,
                timestamp=e.timestamp,
                tags=" ".join(e.tags),
                content=entries_mod._strip_strike(e.body),
                superseded=1 if e.superseded_by else 0,
            )


medium

The FTS update logic involves multiple SQLite operations. Wrapping these operations in a single transaction will significantly improve performance and reduce the time the file lock is held. However, ensure that potentially slow or blocking operations, such as file I/O and parsing, are kept outside the transaction block to minimize lock contention and prevent stalling other threads.

        parsed = files_mod.read_file(path)
        prefix = files_mod.validate_prefix(path.name)
        with conn:
            fts.delete_entries_for(conn, path.name)
            fts.upsert_file(
                conn,
                fts.FileRow(
                    path=path.name,
                    prefix=prefix,
                    description=parsed.description,
                    tags=" ".join(parsed.tags),
                    status=parsed.status,
                    entry_count=len(parsed.entries),
                    created=parsed.created,
                    updated=parsed.updated,
                    needs_compact=0,
                ),
            )
            for e in parsed.entries:
                fts.insert_entry(
                    conn,
                    id=e.id,
                    path=path.name,
                    prefix=prefix,
                    timestamp=e.timestamp,
                    tags=" ".join(e.tags),
                    content=entries_mod._strip_strike(e.body),
                    superseded=1 if e.superseded_by else 0,
                )
References
  1. When using locks to protect shared state from race conditions, keep potentially slow or blocking operations (e.g., callbacks) outside the locked critical section to prevent stalling other threads.
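The principle in the reference above can be sketched as a small pattern: do the disk read and parsing before entering the critical section, and keep only the fast SQLite writes inside it. The table schema, lock, and helper names here are illustrative, not the project's actual `fts` module.

```python
import sqlite3
import threading

_index_lock = threading.Lock()

def refresh_entries(conn: sqlite3.Connection, path: str, parse_file) -> None:
    # Slow, potentially blocking work (disk I/O + parsing) runs outside the lock.
    parsed = parse_file(path)
    rows = [(e.id, path, e.body) for e in parsed.entries]

    # Only the SQLite writes run inside the critical section;
    # `with conn:` wraps them in a single committed transaction.
    with _index_lock:
        with conn:
            conn.execute("DELETE FROM entries WHERE path = ?", (path,))
            conn.executemany(
                "INSERT INTO entries (id, path, content) VALUES (?, ?, ?)",
                rows,
            )
```

This keeps the lock hold time proportional to the database work alone, so other threads waiting on the same lock are not stalled behind file parsing.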

Comment thread src/openchronicle/writer/compact.py Outdated
# Re-ingest this file's entries into FTS while still holding the same
# file lock so on-disk Markdown and index rows move forward together.
fts.delete_entries_for(conn, path.name)
parsed = files_mod.read_file(path)


medium

Calling files_mod.read_file(path) here is redundant because it performs a disk read and parses the file we just wrote. Since the content is already available in memory (e.g., the compacted object), you should operate on that representation instead of re-reading from disk to avoid unnecessary I/O.

References
  1. To avoid redundant I/O, when modifying file content that has already been read into memory, operate on the in-memory representation (e.g., using frontmatter.loads(text)) instead of re-reading from disk (frontmatter.load(path)).

@rogerdigital
Author

Addressed the Gemini review in 1e6c861:

  • stopped re-reading/parsing the compacted file from disk after writeback; FTS metadata now uses the in-memory frontmatter representation
  • wrapped the FTS delete/upsert/insert batch in a SQLite savepoint so the index refresh is atomic

Re-verified:

  • uv run pytest tests/test_store.py -q
  • uv run ruff check src/openchronicle/writer/compact.py tests/test_store.py
  • uv run pytest -q

@rogerdigital
Author

Both Gemini comments addressed in 1cd66d6:

  1. FTS batch wrapped in savepoint: SAVEPOINT compact_file_fts covers all three FTS operations (delete → upsert file → insert entries) so they commit or roll back atomically, and the lock is released immediately after.

  2. Redundant disk read after writeback — fixed. The FTS update now reads directly from the in-memory compacted object (compacted.metadata, compacted_entries) rather than re-reading from disk after atomic_write_text.
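The savepoint batch from point 1 can be sketched as follows. Only the `SAVEPOINT compact_file_fts` name comes from the PR; the `entries` table schema and row shape are illustrative stand-ins for the real FTS operations.

```python
import sqlite3

def fts_refresh(conn: sqlite3.Connection, path: str, rows) -> None:
    # The savepoint groups the delete + inserts so they commit or roll back
    # together, and it nests safely inside any outer transaction.
    conn.execute("SAVEPOINT compact_file_fts")
    try:
        conn.execute("DELETE FROM entries WHERE path = ?", (path,))
        conn.executemany(
            "INSERT INTO entries (id, path, content) VALUES (?, ?, ?)",
            [(entry_id, path, body) for entry_id, body in rows],
        )
        conn.execute("RELEASE SAVEPOINT compact_file_fts")
    except Exception:
        # On any failure, restore the index rows that the delete removed.
        conn.execute("ROLLBACK TO SAVEPOINT compact_file_fts")
        conn.execute("RELEASE SAVEPOINT compact_file_fts")
        raise
```

An outermost `RELEASE` behaves like a commit, so on success the index rows land as one unit; on failure the old rows survive and the exception propagates for a later retry.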
