fix(writer): avoid compact overwriting concurrent writes #25

rogerdigital wants to merge 2 commits into Einsia:main
Conversation
This fixes the compact follow-up called out in #12. Verified:
Code Review
This pull request enhances the file compaction process by introducing a file locking mechanism and a stale-read check to prevent data loss from concurrent writes during LLM processing. It also integrates the frontmatter library to manage the needs_compact flag within the file content. The review feedback suggests optimizing the database update logic by wrapping SQLite operations in a transaction and reducing redundant disk I/O by utilizing the in-memory file representation instead of re-reading the file from disk after writing.
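The file-locking side of this change can be sketched with the standard library's advisory locks. This is a minimal illustration, not the PR's actual implementation: `file_lock` and the sidecar `.lock` file convention are hypothetical, and the real code may lock differently.

```python
import fcntl
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def file_lock(path: Path):
    """Hypothetical helper: exclusive advisory lock on a sidecar .lock file.

    Blocks until the lock is free, so a slow compaction and a concurrent
    append on the same memory file serialize instead of interleaving writes.
    """
    lock_path = path.with_suffix(path.suffix + ".lock")
    with open(lock_path, "w") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)  # held until the with-block exits
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)

# Usage: all mutations of the memory file happen inside the lock.
target = Path("/tmp/mem.md")
with file_lock(target):
    target.write_text("compacted content\n")
```

Locking a sidecar file (rather than the target itself) avoids problems with lock state being tied to a file descriptor that gets replaced on rewrite.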
```python
fts.delete_entries_for(conn, path.name)
parsed = files_mod.read_file(path)
prefix = files_mod.validate_prefix(path.name)
fts.upsert_file(
    conn,
    fts.FileRow(
        path=path.name,
        prefix=prefix,
        description=parsed.description,
        tags=" ".join(parsed.tags),
        status=parsed.status,
        entry_count=len(parsed.entries),
        created=parsed.created,
        updated=parsed.updated,
        needs_compact=0,
    ),
)
# Clear frontmatter flag
files_mod.update_frontmatter(path, {"needs_compact": False})
for e in parsed.entries:
    fts.insert_entry(
        conn,
        id=e.id,
        path=path.name,
        prefix=prefix,
        timestamp=e.timestamp,
        tags=" ".join(e.tags),
        content=entries_mod._strip_strike(e.body),
        superseded=1 if e.superseded_by else 0,
    )
```
The FTS update logic involves multiple SQLite operations. Wrapping these operations in a single transaction will significantly improve performance and reduce the time the file lock is held. However, ensure that potentially slow or blocking operations, such as file I/O and parsing, are kept outside the transaction block to minimize lock contention and prevent stalling other threads.
```python
parsed = files_mod.read_file(path)
prefix = files_mod.validate_prefix(path.name)
with conn:
    fts.delete_entries_for(conn, path.name)
    fts.upsert_file(
        conn,
        fts.FileRow(
            path=path.name,
            prefix=prefix,
            description=parsed.description,
            tags=" ".join(parsed.tags),
            status=parsed.status,
            entry_count=len(parsed.entries),
            created=parsed.created,
            updated=parsed.updated,
            needs_compact=0,
        ),
    )
    for e in parsed.entries:
        fts.insert_entry(
            conn,
            id=e.id,
            path=path.name,
            prefix=prefix,
            timestamp=e.timestamp,
            tags=" ".join(e.tags),
            content=entries_mod._strip_strike(e.body),
            superseded=1 if e.superseded_by else 0,
        )
```

References
- When using locks to protect shared state from race conditions, keep potentially slow or blocking operations (e.g., callbacks) outside the locked critical section to prevent stalling other threads.
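The transaction behavior the suggestion relies on is sqlite3's connection context manager: every statement inside `with conn:` commits as one unit, or rolls back as one unit if anything raises. A minimal stdlib sketch with a stand-in `entries` schema (not the PR's real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id TEXT, path TEXT)")

# Slow work (parsing, disk reads) stays outside the transaction.
rows = [("e1", "mem.md"), ("e2", "mem.md")]

# Delete + re-insert commit atomically at the end of the with-block.
with conn:
    conn.execute("DELETE FROM entries WHERE path = ?", ("mem.md",))
    conn.executemany("INSERT INTO entries VALUES (?, ?)", rows)

# If any statement raises, the whole transaction rolls back:
try:
    with conn:
        conn.execute("DELETE FROM entries")
        conn.execute("INSERT INTO nonexistent VALUES (1)")  # raises
except sqlite3.OperationalError:
    pass

# The DELETE above was rolled back, so both rows survive.
count = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
print(count)  # → 2
```

Batching the statements this way also means the FTS index never exposes a half-updated state to concurrent readers.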
```python
# Re-ingest this file's entries into FTS while still holding the same
# file lock so on-disk Markdown and index rows move forward together.
fts.delete_entries_for(conn, path.name)
parsed = files_mod.read_file(path)
```
Calling files_mod.read_file(path) here is redundant because it performs a disk read and parses the file we just wrote. Since the content is already available in memory (e.g., the compacted object), you should operate on that representation instead of re-reading from disk to avoid unnecessary I/O.
References
- To avoid redundant I/O, when modifying file content that has already been read into memory, operate on the in-memory representation (e.g., using frontmatter.loads(text)) instead of re-reading from disk (frontmatter.load(path)).
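The principle behind this reference — mutate the text you already hold instead of re-reading the file — can be shown with a stdlib-only sketch. The `clear_needs_compact` helper and the minimal `---`-delimited frontmatter handling are hypothetical stand-ins for what `frontmatter.loads(text)` would do with the real library:

```python
def clear_needs_compact(text: str) -> str:
    """Flip the needs_compact frontmatter flag in an in-memory document.

    Operates on the string already in memory, so no extra disk read or
    re-parse of the file we just wrote is needed.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return text  # no frontmatter block; nothing to clear
    out = []
    in_frontmatter = True
    for i, line in enumerate(lines):
        if in_frontmatter and i > 0 and line.strip() == "---":
            in_frontmatter = False  # closing delimiter reached
        if in_frontmatter and line.startswith("needs_compact:"):
            line = "needs_compact: false\n"
        out.append(line)
    return "".join(out)

doc = "---\ntitle: mem\nneeds_compact: true\n---\nbody text\n"
updated = clear_needs_compact(doc)
print(updated)
```

A single `path.write_text(updated)` then persists the result, replacing the read-modify-write round trip with one write.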
Addressed the Gemini review in
Re-verified:
1e6c861 to 1cd66d6
Both Gemini comments addressed in
Summary
compact_file reads a memory file, sends it to the LLM, then writes the compacted result back. That LLM call can take long enough for a reducer/classifier append to land on the same file in the meantime. Before this change, the stale compacted output could overwrite those newer entries.

This PR adds a compare-and-swap style writeback: it clears needs_compact in the compacted output before writing. If the file changed, compaction is left for a later retry instead of risking data loss.
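The compare-and-swap writeback can be sketched as follows. This is an illustrative reduction, with a hypothetical `cas_write` helper; the real change also holds the file lock and routes the flag through the frontmatter layer:

```python
from pathlib import Path

def cas_write(path: Path, snapshot: str, compacted: str) -> bool:
    """Write `compacted` only if the file still matches `snapshot`.

    `snapshot` is the content captured before the slow LLM call. If a
    concurrent append changed the file since then, return False and leave
    compaction for a later retry instead of overwriting newer entries.
    """
    current = path.read_text()
    if current != snapshot:
        return False  # stale: a concurrent write landed; do not clobber it
    path.write_text(compacted)
    return True
```

Returning False rather than raising keeps the caller's contract simple: the `needs_compact` flag stays set on disk, so the next compaction pass picks the file up again.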
Tests
uv run pytest tests/test_store.py -q
uv run ruff check src/openchronicle/writer/compact.py tests/test_store.py
uv run pytest -q