Summary
When auto-update triggers an ARCHITECTURE_UPDATE with many structural files (e.g. 98 files → 13 batches), the current Phase 2 flow accumulates all batch results in the main LLM context before performing a single final knowledge-graph.json write. If context compaction fires between the last batch completing and the write step, all of the work is lost: knowledge-graph.json and meta.json remain unchanged, and the next session re-triggers the full update.
Reproduction
- Let KG drift across many sprints (e.g. 9 sprints / 516 commits).
- Session start triggers ARCHITECTURE_UPDATE with ~98 structural files → 13 parallel batches.
- Batches complete (~1.2M tokens total in context).
- Context compaction fires before Write knowledge-graph.json executes.
- On next session start: hook fires again, same 13-batch cost, same outcome.
Root cause
Phase 2 step 4 reads:
4. After batch(es) complete, read each `batch-<N>.json` and merge results.
Then step 5 (merge) and Phase 3d (save) happen in the main thread. For large batch counts, the accumulated agent outputs fill the context window before the write executes.
The intermediate/batch-N.json files ARE written by agents (good), but the merge + final write is a single all-or-nothing operation in the main thread that is vulnerable to context compaction.
Proposed fix
Process batches in groups of 3-4. After each group, merge the results and write knowledge-graph.json immediately as a checkpoint, then continue with the next group. Suggested wording for step 4:
4. Process batches in groups of **3-4 at a time**. After each group:
   a. Verify each `intermediate/batch-N.json` EXISTS on disk (agent must have written it).
      If missing, the agent failed silently; log a warning and skip that batch.
   b. Read the batch file(s) and merge their nodes/edges into the running KG.
   c. Write the updated `knowledge-graph.json` immediately (checkpoint).
   Then continue with the next group of batches.
   **Rationale:** context compaction mid-merge loses all in-memory results.
   A per-group checkpoint write ensures partial progress survives.
This means context compaction only loses the tail (unprocessed batches), not everything already merged. On the next session start, the fingerprint check correctly identifies only the remaining unprocessed files as changed.
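For reference, the grouped merge with per-group checkpoints could look roughly like the sketch below if it were done in a script rather than by the LLM. This is only an illustration under assumptions: the batch and KG schema (`nodes` with an `id`, `edges` with `source`/`target`/`type`) is guessed, and `merge_batch`, `write_checkpoint`, and `merge_in_groups` are hypothetical names, not part of the plugin.

```python
import json
import logging
import os
import tempfile
from pathlib import Path


def merge_batch(kg: dict, batch: dict) -> None:
    """Merge one batch into kg in place (schema is an assumption, see lead-in)."""
    nodes = {n["id"]: n for n in kg.get("nodes", [])}
    for node in batch.get("nodes", []):
        nodes[node["id"]] = node  # a later batch overwrites an earlier node
    edges = {(e["source"], e["target"], e.get("type")): e for e in kg.get("edges", [])}
    for edge in batch.get("edges", []):
        edges[(edge["source"], edge["target"], edge.get("type"))] = edge
    kg["nodes"] = list(nodes.values())
    kg["edges"] = list(edges.values())


def write_checkpoint(kg: dict, kg_path: Path) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=kg_path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(kg, fh)
    os.replace(tmp, kg_path)  # a crash mid-write leaves the old file intact


def merge_in_groups(kg_path: Path, intermediate_dir: Path,
                    batch_count: int, group_size: int = 4) -> list[int]:
    """Merge batch-1..batch-N in groups; return batch numbers that were skipped."""
    if kg_path.exists():
        kg = json.loads(kg_path.read_text(encoding="utf-8"))
    else:
        kg = {"nodes": [], "edges": []}
    skipped = []
    for start in range(1, batch_count + 1, group_size):
        for n in range(start, min(start + group_size, batch_count + 1)):
            batch_file = intermediate_dir / f"batch-{n}.json"
            if not batch_file.exists():
                # Step (a): the agent failed silently, so warn and skip.
                logging.warning("missing %s, skipping batch %d", batch_file, n)
                skipped.append(n)
                continue
            # Step (b): merge this batch into the running KG.
            merge_batch(kg, json.loads(batch_file.read_text(encoding="utf-8")))
        # Step (c): checkpoint after every group.
        write_checkpoint(kg, kg_path)
    return skipped
```

With 13 batches and `group_size=4`, this writes four checkpoints (after batches 4, 8, 12, and 13), so an interruption costs at most one group of merged results instead of all 13.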
Additional note: P2 anchor drift
In v2.7.5, the LOAD-PATCH-SAVE section of auto-update-prompt.md was restructured. The apply_ua_patches.py script we use locally to patch the prompt has a stale find anchor for P2 (// fingerprint-update.mjs\n). The marker check still passes (patch is present), but the anchor string no longer matches. Worth noting for users who maintain local patch scripts.
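For anyone in the same situation, a local patch script can report this case explicitly instead of silently passing. The sketch below is not how apply_ua_patches.py works (that script is local to our setup); `check_patch` and its status strings are hypothetical, and the marker string is whatever your script uses to detect an already-applied patch.

```python
def check_patch(prompt_text: str, marker: str, anchor: str) -> str:
    """Classify a patch by whether its marker and its find anchor are present."""
    has_marker = marker in prompt_text
    has_anchor = anchor in prompt_text
    if has_marker and has_anchor:
        return "applied"
    if has_marker:
        # Patch is present, but re-applying to a fresh prompt would fail.
        return "applied-stale-anchor"
    if has_anchor:
        return "pending"
    return "anchor-missing"
```

A status of `applied-stale-anchor` corresponds to the P2 situation described above: the marker check passes, but the anchor would not match if the patch had to be re-applied after an upgrade.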
Environment
- Plugin version: 2.7.5
- Project: large Python/FastAPI monorepo (~1700 KG nodes, 50+ services)
- Structural change count that triggered this: 98 files (9-sprint drift)
- Estimated tokens lost: ~1.2M