Ledger finalizer re-runs terminally-failed sessions, causing duplicate-finalize loop and unresolvable rebase conflicts #601

@galexy

Symptom

On a long-running machine the ledger (.sageox/ledger) drifted to 78 ahead / 448 behind origin/main, with the daemon stuck in a loop:

Last error  pull failed: Auto-merging sessions/2026-02-13T14-56-ajit-OxmoZK/meta.json
CONFLICT (content): Merge conflict in sessions/2026-02-13T14-56-ajit-OxmoZK/meta.json
... (12 more session files) ...
error: could not apply e675155c... ox doctor: auto-commit ledger changes

ox doctor --fix couldn't repair it — its own auto-commits were part of the loop.

Root cause

Grouping the 78 local-only commits by message reveals the runaway pattern:

15× finalize session 2026-04-09T16-51-ajit-OxAnpf
14× finalize session 2026-02-13T14-56-ajit-OxmoZK
13× finalize session 2026-03-25T15-56-ryan-OxTfdN
13× ox doctor: auto-commit ledger changes
 3× finalize session 2026-03-03T09-57-ryan-Ox3E53
 …
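A grouping like the one above can be reproduced with plain git. This is a sketch that assumes the ledger checkout at .sageox/ledger tracks origin/main. group_subjects and count_local_only_subjects are hypothetical helper names, not ox commands.

```shell
# group_subjects: read commit subjects on stdin and print "count subject",
# most frequent first.
group_subjects() {
  sort | uniq -c | sort -rn
}

# count_local_only_subjects [REPO]: group the commits that exist locally but
# not on origin/main (as of the last fetch) by their subject line.
count_local_only_subjects() {
  local repo="${1:-.sageox/ledger}"
  git -C "$repo" log --format=%s origin/main..HEAD | group_subjects
}
```

Running `git -C .sageox/ledger fetch origin` first makes sure origin/main is current before the comparison.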

All four repeatedly finalized sessions share the same terminal failure mode, in which the generated summary fails the content validator:

"validation_error": "content validation failed: title too short (0 chars, minimum 3)"

For each of those sessions:

Field             Local (looping)     Upstream (terminal)
summary_status    failed_validation   unrecoverable
summary_attempts  1                   3

Upstream has already advanced these sessions through their retries to unrecoverable with attempts=3. The local finalizer doesn't know that: it re-enters at attempts=1, failed_validation and writes a fresh commit. The next git pull --rebase tries to replay that commit on top of upstream's terminal state, the same meta.json keys conflict, the rebase aborts, ox doctor resets and re-commits, and the cycle repeats.
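The conflict comes from the local write moving a session backwards. A minimal sketch of that comparison, assuming only the two statuses named in this issue and ordering session state by (status, attempts); summary_status_rank and is_regression are hypothetical names, not ox functions.

```shell
# summary_status_rank STATUS: rank the summary_status values seen in this
# issue. Only failed_validation and unrecoverable appear in the report; any
# other value is reported as unknown (-1) rather than guessed.
summary_status_rank() {
  case "$1" in
    failed_validation) echo 1 ;;
    unrecoverable) echo 2 ;;
    *) echo -1 ;;
  esac
}

# is_regression LOCAL_STATUS LOCAL_ATTEMPTS UPSTREAM_STATUS UPSTREAM_ATTEMPTS
# Succeeds when writing the local state would move the session backwards,
# i.e. local (rank, attempts) is lexicographically lower than upstream's.
is_regression() {
  local lr ur
  lr=$(summary_status_rank "$1")
  ur=$(summary_status_rank "$3")
  if [ "$lr" -lt 0 ] || [ "$ur" -lt 0 ]; then
    return 1  # unknown status: cannot decide, so do not claim a regression
  fi
  if [ "$lr" -ne "$ur" ]; then
    [ "$lr" -lt "$ur" ]
    return
  fi
  [ "$2" -lt "$4" ]
}
```

With the values from the table, `is_regression failed_validation 1 unrecoverable 3` succeeds, which is exactly the write the finalizer should refuse to make.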

Each pass through the loop produced roughly N × "finalize session X" plus 1 × "ox doctor: auto-commit ledger changes", which added up to 78 commits over a single afternoon.

Why it didn't self-heal

  1. ox doctor --fix reacts to a failed pull by auto-committing local state — that produces another commit that will conflict on the next pull.
  2. The finalizer treats summary_status: failed_validation as "try again," not as a terminal state synchronized from the remote.
  3. There's no "upstream says this session is unrecoverable — stop touching it" guard.
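For item 1, the auto-fix path could first ask whether the pull is failing on exactly the same paths as the previous failure before it auto-commits anything. A sketch under stated assumptions: the state file location and both function names are hypothetical, and `git diff --name-only --diff-filter=U` is used to list unmerged paths while the rebase is stopped.

```shell
# Hypothetical location for remembering the last failure's conflicted paths
# (outside the ledger checkout, so it is never committed).
CONFLICT_STATE="${CONFLICT_STATE:-.sageox/last-pull-conflicts.txt}"

# conflicted_paths REPO: print unmerged paths, sorted, while a rebase is stopped.
conflicted_paths() {
  git -C "$1" diff --name-only --diff-filter=U | sort
}

# same_conflicts_as_last_time: read sorted conflicted paths on stdin, compare
# them with the previous failure, then remember them. Succeeds only when the
# set is non-empty and identical to last time, i.e. the pull is looping.
same_conflicts_as_last_time() {
  local current previous=""
  current=$(cat)
  if [ -f "$CONFLICT_STATE" ]; then
    previous=$(cat "$CONFLICT_STATE")
  fi
  printf '%s\n' "$current" > "$CONFLICT_STATE"
  [ -n "$current" ] && [ "$current" = "$previous" ]
}

# Sketch of the decision in the auto-fix path:
# if conflicted_paths .sageox/ledger | same_conflicts_as_last_time; then
#   echo "pull keeps conflicting on the same files; needs a human" >&2
#   exit 1   # do not auto-commit
# fi
```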

Workaround used today

cd .sageox/ledger
git reset --hard origin/main

The 78 local commits were all duplicates of work the remote had already done (and superseded), so nothing real was lost. But this requires manual intervention and a destructive git command, which most users won't be comfortable with.
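Before running the reset, one way to check the precondition that made it safe here (every local-only commit is a finalize or ox doctor auto-commit) is to inspect commit subjects. This sketch assumes those two message patterns, taken from the grouping in the root-cause section, are the only ones the finalizer and ox doctor produce; only_automated_commits is a hypothetical name. Subject matching is necessary but not sufficient: it does not prove the content is superseded upstream.

```shell
# only_automated_commits: read commit subjects on stdin and succeed only if
# there is at least one and every one matches the two automated messages
# seen in this issue. Any other subject means hand-written work may exist.
only_automated_commits() {
  local subject seen=0
  while IFS= read -r subject; do
    seen=1
    case "$subject" in
      "finalize session "*) ;;
      "ox doctor: auto-commit ledger changes") ;;
      *) return 1 ;;
    esac
  done
  [ "$seen" -eq 1 ]
}

# Guarded version of the workaround (commented out because it is destructive):
# cd .sageox/ledger
# git fetch origin
# if [ -z "$(git status --porcelain)" ] &&
#    git log --format=%s origin/main..HEAD | only_automated_commits; then
#   git reset --hard origin/main
# else
#   echo "ledger has uncommitted or non-automated changes; resolve manually" >&2
# fi
```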

Suggested fix

In the session finalizer / ox doctor auto-fix path:

  1. Before finalizing a session, check the upstream meta.json for that session (fetch is cheap). If upstream has summary_status: unrecoverable (or any terminal state) with summary_attempts >= max_attempts, skip — do not re-finalize and do not produce a new commit.
  2. More generally, treat the per-session meta.json as an append-only state machine: failed_validation (attempts<max) → unrecoverable (attempts==max) is forward-only. The finalizer should not downgrade unrecoverable back to failed_validation, attempts=1.
  3. ox doctor --fix should detect "rebase is failing on the same files repeatedly" and either (a) git reset --hard origin/main automatically if the only local-only commits are duplicate finalize attempts whose content is strictly older than upstream, or (b) flag the situation as needing human intervention rather than producing yet another auto-commit.
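Fix 1 could look roughly like the following. This is a sketch, not ox's code: it assumes meta.json lives at sessions/&lt;id&gt;/meta.json in the ledger (as in the conflict log), that summary_status and summary_attempts are top-level keys, that jq is installed, and that max_attempts is 3 because upstream gave up at attempts=3. MAX_SUMMARY_ATTEMPTS, upstream_is_terminal and should_skip_finalize are hypothetical names.

```shell
# Hypothetical setting: 3 is assumed from this issue; the real max_attempts
# value lives in ox itself.
MAX_SUMMARY_ATTEMPTS="${MAX_SUMMARY_ATTEMPTS:-3}"

# upstream_is_terminal STATUS ATTEMPTS: succeed when upstream is unrecoverable
# with attempts >= max. Other terminal statuses are not named in the issue.
upstream_is_terminal() {
  [ "$1" = "unrecoverable" ] && [ "${2:-0}" -ge "$MAX_SUMMARY_ATTEMPTS" ]
}

# should_skip_finalize REPO SESSION_ID: read the session's meta.json as it
# exists on origin/main and succeed if the finalizer should leave it alone.
should_skip_finalize() {
  local repo="$1" id="$2" meta status attempts
  # Offline is fine: fall back to the last-fetched origin/main.
  git -C "$repo" fetch --quiet origin || true
  # No upstream copy yet means there is nothing to protect: do not skip.
  meta=$(git -C "$repo" show "origin/main:sessions/$id/meta.json" 2>/dev/null) || return 1
  status=$(printf '%s' "$meta" | jq -r '.summary_status // empty')
  attempts=$(printf '%s' "$meta" | jq -r '.summary_attempts // 0')
  upstream_is_terminal "$status" "$attempts"
}

# Sketch of the call site in a finalizer loop:
# if should_skip_finalize .sageox/ledger "$session_id"; then
#   continue   # upstream is terminal: no rewrite, no new commit
# fi
```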

Repro

Hard to reproduce deterministically, but the seed is:

  • A session whose entry_count > 0 but whose generated summary fails the title-length validator (title too short (0 chars, minimum 3)). The four sessions named above all trip this.
  • Run ox doctor repeatedly while the daemon is also active. Each pass produces another duplicate finalize commit until the rebase explodes.
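To find candidate seed sessions in an existing ledger, a grep for the validator message is usually enough. This sketch assumes the error text is persisted in each session's meta.json; the issue shows it as a "validation_error" JSON field but does not say which file holds it. find_title_validation_failures is a hypothetical name.

```shell
# find_title_validation_failures [LEDGER]: print the IDs of sessions whose
# meta.json contains the title-length validation error quoted in this issue.
find_title_validation_failures() {
  local ledger="${1:-.sageox/ledger}"
  grep -lF 'title too short (0 chars' "$ledger"/sessions/*/meta.json 2>/dev/null |
    while IFS= read -r path; do
      basename "$(dirname "$path")"
    done
}
```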

Related

  • .claude/rules/daemon-git.md — "Daemon reads (pull), CLI writes (add/commit/push). Never discard uncommitted changes."
  • Session content-validation rule that emits title too short (0 chars, minimum 3) for these four sessions is itself worth a look — entry_count=36 should not produce a 0-char title; the upstream summarizer eventually gave up at attempts=3 rather than ever succeeding.
