feat: safe writes — auto-git-commit + COW atomicity for batch operations #72

@tenfourty

Summary

Two complementary safety mechanisms for kbx writes:

  1. Auto-git-commit: When kbx writes to memory files, optionally auto-commit the changes to git with a generated commit message — giving a full auditable history and git revert as the rollback mechanism.
  2. COW atomicity: For batch operations touching multiple files, write to temp files first then atomic rename. Complete success or complete rollback — no partial writes.
# kbx.toml
[writes]
auto_commit = true
auto_commit_message_format = "kbx: {operation} {target}"

Motivation

kbx writes to memory files frequently — adding facts, updating entity roles, creating notes, writing open items, running corrections across 50+ files. Currently these writes are fire-and-forget:

  • No audit trail: Who changed what, when? git log only helps if someone remembers to commit. Automated pipelines (debriefs, sync) write silently.
  • No rollback: A bad kbx correct --apply across 78 files can't be undone without manually restoring from backups or git checkout (if you committed beforehand).
  • Partial failures: If kbx correct --apply fails on file 40 of 78, the first 39 are already modified. The operation is half-done, half-not — the worst state.

With auto-commit + COW:

  • Every kbx write produces a git commit → git log --author=kbx shows full history
  • git revert <hash> cleanly undoes any change
  • Batch operations are atomic — all files change together or none do
  • Combined: COW completes successfully → single git commit with all changes

Inspiration: OpenViking uses a COW commit pattern for session mutations — copies directory trees to temp, makes all changes there, then atomic swap. Their commit_async() guarantees that live data is never corrupted by partial writes.

Design

Part 1 — Auto-Git-Commit on Write

1.1 Configuration

# kbx.toml
[writes]
# Enable auto-commit (default: false)
auto_commit = false

# Commit message format (supports placeholders)
auto_commit_message_format = "kbx: {operation} {target}"

# Author for auto-commits
auto_commit_author = "kbx <kbx@localhost>"

Placeholders for auto_commit_message_format:

| Placeholder | Example or description |
| --- | --- |
| {operation} | add-fact, edit-role, create-note, correct, add-open-item |
| {target} | Entity name, note title, or file path |
| {command} | Full kbx command (e.g. memory add "title" --entity "Name") |
| {file_count} | Number of files changed (for batch ops) |
| {timestamp} | ISO 8601 timestamp |

Default format examples:

  • kbx: add-fact Person A
  • kbx: edit-role Person A
  • kbx: create-note "Active Strategic Initiatives"
  • kbx: correct "OldTerm" → "NewTerm" (47 files)
  • kbx: add-open-item Person B (from: Team Standup)
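
The placeholder expansion could be sketched with plain str.format. This is an illustration only: format_commit_message and its keyword arguments are hypothetical names, not part of kbx, and a format string containing an unknown placeholder would raise KeyError here.

```python
from datetime import datetime, timezone

DEFAULT_FORMAT = "kbx: {operation} {target}"


def format_commit_message(fmt: str = DEFAULT_FORMAT, *, operation: str,
                          target: str = "", command: str = "",
                          file_count: int = 1) -> str:
    """Fill the placeholders from section 1.1 (hypothetical helper)."""
    fields = {
        "operation": operation,
        "target": target,
        "command": command,
        "file_count": file_count,
        # ISO 8601 timestamp in UTC, second precision
        "timestamp": datetime.now(timezone.utc).isoformat(timespec="seconds"),
    }
    # str.format ignores fields the format string does not mention
    return fmt.format(**fields)


print(format_commit_message(operation="add-fact", target="Person A"))
print(format_commit_message(
    "kbx: {operation} {target} ({file_count} files)",
    operation="correct",
    target='"OldTerm" → "NewTerm"',
    file_count=47,
))
```

The first call prints kbx: add-fact Person A, matching the default format example above.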

1.2 Git Detection

Auto-commit only activates if:

  1. auto_commit = true in config
  2. The memory directory is inside a git repo (git -C <memory_dir> rev-parse --git-dir succeeds)
  3. The --no-commit flag is not set
def _should_auto_commit(self) -> bool:
    if not self.config.auto_commit:
        return False
    if self._no_commit_flag:
        return False
    try:
        subprocess.run(
            ["git", "-C", self.memory_dir, "rev-parse", "--git-dir"],
            capture_output=True, check=True,
        )
        return True
    except (subprocess.CalledProcessError, FileNotFoundError):
        return False

Cache the git detection result for the session — don't re-check on every write.
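
One way to get that caching, sketched under the assumption that the detection lives in a standalone function (is_git_repo is a hypothetical name), is functools.lru_cache keyed on the memory directory. The cache lasts for the life of the process, so a repo initialised mid-session would not be noticed until the next run.

```python
import functools
import subprocess


@functools.lru_cache(maxsize=None)
def is_git_repo(memory_dir: str) -> bool:
    """Return True if memory_dir is inside a git work tree (cached per directory)."""
    try:
        subprocess.run(
            ["git", "-C", memory_dir, "rev-parse", "--git-dir"],
            capture_output=True, check=True,
        )
        return True
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Non-zero exit: not a repo. FileNotFoundError: git is not installed.
        return False
```

Repeated calls with the same directory return the cached answer without spawning git again; is_git_repo.cache_clear() resets it if needed.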

1.3 Operations That Trigger Commits

Commits on write (memory file mutations):

| Command | Commit message |
| --- | --- |
| kbx memory add "title" | kbx: create-note "title" |
| kbx memory add "fact" --entity "Name" | kbx: add-fact Name |
| kbx memory edit-fact <id> --text "..." | kbx: edit-fact Name |
| kbx memory delete-fact <id> | kbx: delete-fact Name |
| kbx person create "Name" | kbx: create-person Name |
| kbx person edit "Name" --role "..." | kbx: edit-role Name |
| kbx project create "Name" | kbx: create-project Name |
| kbx project edit "Name" --meta "..." | kbx: edit-project Name |
| kbx note edit "title" --body "..." | kbx: edit-note "title" |
| kbx note delete "title" | kbx: delete-note "title" |
| kbx correct "old" "new" --apply | kbx: correct "old" → "new" (N files) |
| kbx glossary add "TERM" "expansion" | kbx: add-glossary TERM |
| kbx glossary edit "TERM" "expansion" | kbx: edit-glossary TERM |
| kbx glossary delete "TERM" | kbx: delete-glossary TERM |

No commit (read-only or DB-only):

| Command | Why no per-write commit |
| --- | --- |
| kbx search, kbx view, kbx list | Read-only |
| kbx context, kbx entity stale | Read-only |
| kbx index run | Writes to SQLite/LanceDB, not memory files |
| kbx sync | Writes memory files, but these are committed once per run as a batch rather than per file: kbx: sync granola (N meetings) |
| kbx pin, kbx unpin | DB-only (pin state in SQLite) |

Special case — kbx sync: Sync can create/update many meeting files. This should produce a single commit: kbx: sync granola --since 2026-03-01 (12 meetings).

1.4 Batch Operations → Single Commit

Operations that touch multiple files must produce one commit, not one per file:

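import subprocess

class WriteTransaction:
    """Collects file changes, commits once at the end."""

    def __init__(self, memory_dir: str, message_format: str,
                 author: str = "kbx <kbx@localhost>"):
        self.memory_dir = memory_dir
        self.message_format = message_format
        self.author = author  # value of auto_commit_author
        self.changed_files: list[str] = []
        self.operation = ""
        self.target = ""

    def record_change(self, path: str):
        self.changed_files.append(path)

    def _format_message(self) -> str:
        # Fills the placeholders tracked on this object; {command} and
        # {timestamp} would need to be supplied the same way.
        return self.message_format.format(
            operation=self.operation,
            target=self.target,
            file_count=len(self.changed_files),
        )

    def commit(self):
        if not self.changed_files:
            return
        # Stage only the changed files (not git add -A)
        subprocess.run(
            ["git", "-C", self.memory_dir, "add", "--"] + self.changed_files,
            check=True,
        )
        message = self._format_message()
        # Passing the paths to git commit keeps anything the user had
        # already staged out of the kbx commit.
        subprocess.run(
            ["git", "-C", self.memory_dir, "commit", "-m", message,
             "--author", self.author, "--"] + self.changed_files,
            check=True,
        )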

Important: Use git add <specific files>, never git add -A. Only commit files that kbx actually changed.

1.5 Error Handling

Git commit failures should warn but not block the kbx operation:

try:
    transaction.commit()
except subprocess.CalledProcessError as e:
    logger.warning(f"Auto-commit failed: {e}. Changes were written but not committed.")
    # Don't raise — the kbx operation itself succeeded

Scenarios:

  • Git not installed → auto-commit silently disabled (logged at debug level)
  • Dirty working tree conflicts → warn, changes are written but uncommitted
  • Lock file contention → warn, retry once, then skip
  • Pre-commit hook failure → warn with hook output, changes are written but uncommitted
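
The "retry once, then skip" rule for lock contention could look roughly like the sketch below. It assumes the commit callable runs git with capture_output=True so that stderr is available on the exception, and it detects contention by looking for "index.lock" in that output, which is a heuristic. commit_with_retry is a hypothetical helper name.

```python
import logging
import subprocess
import time
from typing import Callable

logger = logging.getLogger("kbx")


def commit_with_retry(commit: Callable[[], None], retries: int = 1,
                      delay: float = 0.5) -> bool:
    """Run commit(); retry only on lock contention. Never raises on git failure."""
    for attempt in range(retries + 1):
        try:
            commit()
            return True
        except subprocess.CalledProcessError as exc:
            stderr = exc.stderr or b""
            if isinstance(stderr, str):
                stderr = stderr.encode()
            locked = b"index.lock" in stderr  # heuristic for lock contention
            if locked and attempt < retries:
                logger.warning("git index is locked, retrying in %.1fs", delay)
                time.sleep(delay)
                continue
            logger.warning(
                "Auto-commit failed: %s. Changes were written but not committed.",
                stderr.decode(errors="replace").strip() or exc,
            )
            return False
    return False
```

Other failures, such as a rejected pre-commit hook, are reported once and not retried, in line with the scenarios listed above.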

1.6 --no-commit Flag

Skip auto-commit for a single operation:

# Write without committing (even if auto_commit = true)
kbx memory add "quick scratch note" --no-commit

# Useful for: scripted batch operations where you want manual commit control
for name in "${names[@]}"; do
    kbx memory add "fact" --entity "$name" --no-commit
done
# -a stages tracked files only; git add any newly created files first
git commit -am "batch: add facts for all team members"

Part 2 — COW Atomicity for Batch Operations

2.1 Single-File Writes

For operations that modify one file, use temp file → atomic rename:

import tempfile
import os

def atomic_write(target_path: str, content: str):
    """Write to temp file, then atomic rename."""
    dir_name = os.path.dirname(target_path)
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(content)
        os.replace(tmp_path, target_path)  # atomic on POSIX
    except Exception:
        os.unlink(tmp_path)  # clean up on failure
        raise

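os.replace() is atomic on POSIX filesystems as long as the temp file and the target live on the same filesystem, which is why the temp file is created in the target's directory. The target file is either the old content or the new content, never a partial write.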
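
One detail worth considering with this pattern: tempfile.mkstemp creates the temp file readable and writable only by the current user, so after the rename the target takes on those permissions. A hedged variant that copies the original file's mode and flushes to disk before renaming might look like this (atomic_write_keep_mode is a hypothetical name, and the fsync is an added durability assumption, not something the proposal requires):

```python
import os
import shutil
import tempfile


def atomic_write_keep_mode(target_path: str, content: str) -> None:
    """Temp file + atomic rename, keeping the target's existing permission bits."""
    dir_name = os.path.dirname(os.path.abspath(target_path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # push data to disk before the rename
        if os.path.exists(target_path):
            # Copy the original permission bits onto the temp file
            shutil.copymode(target_path, tmp_path)
        os.replace(tmp_path, target_path)
    except BaseException:
        # Clean up the temp file on any failure, then re-raise
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```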

2.2 Batch Operations (Multi-File)

For operations touching multiple files (kbx correct --apply, bulk entity updates, sync), use a staged write buffer with commit-or-rollback semantics:

class BatchWriter:
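    """
    Collects writes in temp files. On commit(), replaces each target with its
    temp file. Each replace is atomic on its own; the batch as a whole is not.
    On rollback() or an exception before commit(), cleans up temp files and
    leaves the originals untouched.
    """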

    def __init__(self, memory_dir: str):
        self.memory_dir = memory_dir
        self.pending: list[tuple[str, str]] = []  # (target_path, tmp_path)

    def write(self, target_path: str, content: str):
        """Stage a write — content goes to temp file, not target."""
        dir_name = os.path.dirname(target_path)
        fd, tmp_path = tempfile.mkstemp(dir=dir_name, suffix=".kbx-cow")
        with os.fdopen(fd, "w") as f:
            f.write(content)
        self.pending.append((target_path, tmp_path))

    def commit(self):
        """Atomically replace all targets with their temp files."""
        # Phase 1: Verify all temp files exist and are valid
        for target, tmp in self.pending:
            if not os.path.exists(tmp):
                raise RuntimeError(f"Temp file missing: {tmp}")

        # Phase 2: Atomic replace (each os.replace is atomic individually)
        committed = []
        try:
            for target, tmp in self.pending:
                os.replace(tmp, target)
                committed.append(target)
        except Exception:
            # Partial commit — log which files were already replaced
            logger.error(
                f"Batch commit failed after {len(committed)}/{len(self.pending)} files. "
                f"Committed: {committed}"
            )
            self._cleanup_remaining()
            raise

        self.pending.clear()

    def rollback(self):
        """Discard all staged writes — originals untouched."""
        self._cleanup_remaining()
        self.pending.clear()

    def _cleanup_remaining(self):
        for _, tmp in self.pending:
            try:
                os.unlink(tmp)
            except FileNotFoundError:
                pass

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # Roll back anything still staged: either an exception occurred
        # or commit() was never called.
        if self.pending:
            self.rollback()

Usage in kbx correct --apply:

with BatchWriter(memory_dir) as batch:
    for file_path, new_content in corrections:
        batch.write(file_path, new_content)
    # All writes staged as temp files, originals untouched

    batch.commit()
    # Each target replaced by an atomic rename

# If any exception occurs before commit(), rollback() runs automatically

2.3 Combined: COW + Auto-Commit

The two mechanisms compose naturally:

with BatchWriter(memory_dir) as batch:
    targets = []
    for file_path, new_content in corrections:
        batch.write(file_path, new_content)
        targets.append(file_path)

    batch.commit()  # Step 1: replace each target with its staged temp file

    if should_auto_commit:
        git_commit(
            files=targets,
            message=f'kbx: correct "{old}" → "{new}" ({len(targets)} files)',
        )  # Step 2: single git commit

Failure modes:

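  • Staging fails before commit() → all files unchanged, no git commit
  • commit() fails partway through the replace phase → files replaced before the failure keep their new content, the rest are unchanged, no git commit (see Open Questions)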
  • COW commit succeeds, git commit fails → files are updated (correct state), but not committed (warn user)
  • Both succeed → files updated + git commit with full changeset
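
BatchWriter.commit() as written logs a partial commit but does not undo it. One possible way to tighten that case, sketched here without relying on git, is to copy each existing target aside before the replace phase and restore from those copies on failure. The function name replace_with_restore and the .kbx-bak suffix are hypothetical, and this is still not crash-safe: a process killed mid-batch would leave backups and partial state behind.

```python
import os
import shutil
import tempfile


def replace_with_restore(pending):
    """Replace each target with its temp file; on failure, restore the originals.

    pending is a list of (target_path, tmp_path) pairs, as in BatchWriter.
    """
    backups = {}   # target_path -> backup path, only for targets that existed
    replaced = []  # targets already swapped, in order
    try:
        # Phase 1: copy every existing target aside before touching anything.
        for target, _ in pending:
            if os.path.exists(target):
                fd, backup = tempfile.mkstemp(
                    dir=os.path.dirname(target) or ".", suffix=".kbx-bak"
                )
                os.close(fd)
                backups[target] = backup
                shutil.copy2(target, backup)

        # Phase 2: swap in the new content, one atomic rename per file.
        for target, tmp in pending:
            os.replace(tmp, target)
            replaced.append(target)
    except Exception:
        # Undo the swaps that already happened, newest first.
        for target in reversed(replaced):
            if target in backups:
                os.replace(backups.pop(target), target)
            else:
                os.unlink(target)  # target did not exist before the batch
        raise
    finally:
        # Remove unused backups and any temp files that were never swapped in.
        leftovers = list(backups.values()) + [tmp for _, tmp in pending]
        for path in leftovers:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass
```

The trade-off is an extra copy of every touched file per batch, which may or may not be acceptable for large sync runs.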

2.4 Existing Operations to Migrate

| Operation | Current behaviour | With COW |
| --- | --- | --- |
| kbx correct --apply | Direct file writes, one at a time | BatchWriter, atomic |
| kbx sync | Direct writes as meetings are processed | BatchWriter per sync batch |
| kbx person edit | Single file write | atomic_write (single file) |
| kbx memory add --entity | Appends to entity file | atomic_write (single file) |
| kbx note edit | Single file write | atomic_write (single file) |
| kbx glossary add/edit | Single file write | atomic_write (single file) |

Integration with Other Features

  • kbx memory similar (feat: kbx memory similar — semantic similarity lookup for dedup #71): Similarity check happens before the write enters the COW buffer. The flow is: check similar → decide to write → stage in COW → commit → auto-commit.
  • Entity relations (feat: explicit typed entity relations for graph-style queries #70): Relation writes go to SQLite (not memory files), so they're already atomic via SQLite transactions. No COW needed, but auto-commit could optionally record relation changes if they modify entity files.
  • Debrief pipeline: Automated debriefs that write open items to entity files should use BatchWriter to stage all entity updates, then commit once.

Implementation Phases

  1. Phase 1 — atomic_write for single files: Replace direct file writes with temp → rename pattern across all kbx write paths. ~1 day
  2. Phase 2 — BatchWriter for multi-file ops: Implement BatchWriter, migrate kbx correct --apply. ~1-2 days
  3. Phase 3 — Auto-git-commit: Config, git detection, WriteTransaction, commit message formatting. Wire into all write operations. ~2 days
  4. Phase 4 — CLI surface: --no-commit flag, auto_commit_message_format config, sync batch commits. ~1 day

Open Questions

  • Should auto-commit be on by default once implemented, or off by default with explicit opt-in? Off-by-default is safer for existing users.
  • Should the git author be configurable, or always use a fixed identity like kbx <kbx@localhost>? Fixed identity makes git log --author=kbx filtering trivial.
  • Should there be a kbx writes log command that shows recent auto-committed changes (wrapper around git log --author=kbx)?
  • For the COW partial-commit failure case (file 40 of 78 fails on os.replace), should the system attempt to restore the already-replaced files from git? This adds complexity but improves the atomicity guarantee.
  • Should kbx sync auto-commit be a separate config option? Sync can produce large commits (12+ meeting files) that may be better handled differently from single-fact writes.
