Phase 2 ARCHITECTURE_UPDATE loses all results when context compacts before final knowledge-graph.json write #433

@rilkerfc

Summary

When auto-update triggers an ARCHITECTURE_UPDATE with many structural files (e.g. 98 files → 13 batches), the current Phase 2 flow accumulates all batch results in the main LLM context before performing a single final knowledge-graph.json write. If context compaction fires between the last batch completing and the write step, 100% of the work is lost: knowledge-graph.json and meta.json remain unchanged, and the next session re-triggers the full update.

Reproduction

  1. Let KG drift across many sprints (e.g. 9 sprints / 516 commits).
  2. Session start triggers ARCHITECTURE_UPDATE with ~98 structural files → 13 parallel batches.
  3. Batches complete (~1.2M tokens total in context).
  4. Context compaction fires before Write knowledge-graph.json executes.
  5. On next session start: hook fires again, same 13-batch cost, same outcome.

Root cause

Phase 2 step 4 reads:

4. After batch(es) complete, read each `batch-<N>.json` and merge results.

Then step 5 (merge) and Phase 3d (save) happen in the main thread. For large batch counts, the accumulated agent outputs fill the context window before the write executes.

The intermediate/batch-N.json files ARE written by agents (good), but the merge + final write is a single all-or-nothing operation in the main thread that is vulnerable to context compaction.

Proposed fix

Process batches in groups of 3-4. After each group: merge → write knowledge-graph.json immediately (checkpoint). Then continue with the next group.

4. Process batches in groups of **3-4 at a time**. After each group:
   a. Verify each `intermediate/batch-N.json` EXISTS on disk (agent must have written it)
      — if missing, the agent failed silently; log a warning and skip that batch
   b. Read the batch file(s) and merge their nodes/edges into the running KG
   c. Write the updated `knowledge-graph.json` immediately (checkpoint)
   Then continue with the next group of batches.
   **Rationale:** context compaction mid-merge loses all in-memory results.
   A per-group checkpoint write ensures partial progress survives.

This means context compaction only loses the tail (unprocessed batches), not everything already merged. On the next session start, the fingerprint check correctly identifies only the remaining unprocessed files as changed.
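For illustration, the grouped checkpoint flow could look roughly like the sketch below if it were driven by a script rather than by prompt instructions. This is not the plugin's implementation: the function names (`merge_batch`, `write_checkpoint`, `merge_in_groups`) are hypothetical, and the KG schema (nodes keyed by `id`, edges keyed by `source`/`target`/`type`) is an assumption made only so the example runs.

```python
import json
import os
import tempfile
from pathlib import Path


def merge_batch(kg: dict, batch: dict) -> None:
    """Merge one batch into kg in place. Schema is assumed, not the plugin's."""
    nodes = {n["id"]: n for n in kg.get("nodes", [])}
    for node in batch.get("nodes", []):
        nodes[node["id"]] = node  # newer batch result replaces an older node
    kg["nodes"] = list(nodes.values())

    edges = kg.setdefault("edges", [])
    seen = {(e["source"], e["target"], e.get("type")) for e in edges}
    for edge in batch.get("edges", []):
        key = (edge["source"], edge["target"], edge.get("type"))
        if key not in seen:
            seen.add(key)
            edges.append(edge)


def write_checkpoint(kg: dict, path: Path) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written KG."""
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(kg, f, indent=2)
    os.replace(tmp, path)


def merge_in_groups(kg_path: Path, intermediate_dir: Path,
                    batch_count: int, group_size: int = 3) -> list[int]:
    """Merge batch-1..batch-N in groups, checkpointing after each group.

    Returns the batch numbers whose files were missing and therefore skipped.
    """
    if kg_path.exists():
        kg = json.loads(kg_path.read_text(encoding="utf-8"))
    else:
        kg = {"nodes": [], "edges": []}

    skipped: list[int] = []
    for start in range(1, batch_count + 1, group_size):
        stop = min(start + group_size, batch_count + 1)
        for n in range(start, stop):
            batch_file = intermediate_dir / f"batch-{n}.json"
            if not batch_file.exists():
                # Step (a): the agent failed silently; warn and skip.
                print(f"warning: {batch_file} missing, skipping batch {n}")
                skipped.append(n)
                continue
            merge_batch(kg, json.loads(batch_file.read_text(encoding="utf-8")))
        # Step (c): checkpoint after every group, not once at the end.
        write_checkpoint(kg, kg_path)
    return skipped
```

With 13 batches and `group_size=3`, this performs five checkpoint writes, so an interruption loses at most the group currently being merged.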

Additional note: P2 anchor drift

In v2.7.5, the LOAD-PATCH-SAVE section of auto-update-prompt.md was restructured. The apply_ua_patches.py script we use locally to patch the prompt has a stale find anchor for P2 (// fingerprint-update.mjs\n). The marker check still passes (patch is present), but the anchor string no longer matches. Worth noting for users who maintain local patch scripts.
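For anyone maintaining a similar local patch script, one way to surface this kind of drift is to report the marker and the anchor separately instead of stopping at the marker check. The sketch below is hypothetical: `check_patch` and its status strings are not part of apply_ua_patches.py, and the marker text in the usage example is made up. It also assumes the patch leaves its anchor text in place when applied.

```python
def check_patch(prompt_text: str, marker: str, anchor: str) -> str:
    """Classify a patch by whether its marker and its find-anchor are present."""
    has_marker = marker in prompt_text
    has_anchor = anchor in prompt_text
    if has_marker and has_anchor:
        return "applied"
    if has_marker:
        # Patch is present, but a re-apply after an upgrade would not find its anchor.
        return "applied-anchor-stale"
    if has_anchor:
        return "pending"
    return "anchor-missing"


# Usage with the P2 anchor from this issue and a made-up marker string.
status = check_patch(
    prompt_text="... restructured LOAD-PATCH-SAVE section ... <!-- P2 patched -->",
    marker="<!-- P2 patched -->",
    anchor="// fingerprint-update.mjs\n",
)
print(status)
```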

Environment

  • Plugin version: 2.7.5
  • Project: large Python/FastAPI monorepo (~1700 KG nodes, 50+ services)
  • Structural change count that triggered this: 98 files (9-sprint drift)
  • Estimated tokens lost: ~1.2M
