Implement cog-loop truncation guard by seamus-brady · Pull Request #165 · seamus-brady/springdrift

seamus-brady · 2026-04-26T15:45:50Z

Summary

Implements the truncation guard plan from PR #164 (docs/roadmap/planned/cog-loop-truncation-guard.md). When an LLM call returns stop_reason=MaxTokens with no tool calls, the cog loop now retries once with a scope-down nudge; on the second hit it ships a deterministic admission instead of the truncated text.

Also moves the planning doc to docs/roadmap/implemented/ with a "What Shipped" preamble.
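
The retry-once-then-admit flow described above can be sketched as a small decision function. This is a Python illustration, not the Gleam code in this PR: only `truncation_retried` and the MaxTokens-with-no-tool-calls condition come from the description; `StopReason`, `Response`, `decide`, and the action labels are hypothetical names.

```python
from dataclasses import dataclass, replace
from enum import Enum, auto


class StopReason(Enum):
    END_TURN = auto()
    MAX_TOKENS = auto()


@dataclass(frozen=True)
class PendingThink:
    # Mirrors the flag named in this PR; all other fields are omitted.
    truncation_retried: bool = False


@dataclass(frozen=True)
class Response:
    text: str
    stop_reason: StopReason
    tool_calls: tuple = ()


def decide(pending: PendingThink, resp: Response):
    """Return (action, new_pending). Action labels are illustrative only."""
    truncated = resp.stop_reason is StopReason.MAX_TOKENS and not resp.tool_calls
    if not truncated:
        # Not the guarded case: fall through to the existing handling.
        return "continue_normally", pending
    if not pending.truncation_retried:
        # First hit: retry once, recording that the retry was used.
        return "retry_with_nudge", replace(pending, truncation_retried=True)
    # Second hit: ship the deterministic admission, never the cut-off text.
    return "ship_admission", pending
```

Carrying the flag on the pending state, rather than counting retries elsewhere, is what bounds the loop to a single retry per cycle.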

What changed

PendingThink.truncation_retried: Bool — mirrors the existing empty_retried flag. It is tracked across the retry to prevent an infinite retry loop on the same failure.
handle_think_complete — new branch detects MaxTokens-with-no-tool-calls. First hit: retry with a scope-down nudge ("decompose into multiple turns OR tighten scope"). Second hit: ship the deterministic admission.
output.build_truncation_admission — pure helper, no LLM, no I/O. Builds the operator-facing reply with [truncation_guard] prefix, model + output_tokens + limit, tools fired this cycle, and three actionable recovery suggestions.
Tests — 9 new in test/agent/cognitive/truncation_guard_test.gleam:
- 5 pure tests covering admission text shape (prefix, model, tokens, empty tool list, recovery suggestions)
- 4 end-to-end tests driving the cog loop with mock providers returning MaxTokens on a controlled schedule
Doc — docs/roadmap/planned/cog-loop-truncation-guard.md → docs/roadmap/implemented/cog-loop-truncation-guard.md with a "What Shipped" section above the original plan.
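
To illustrate why a pure admission builder cannot itself be truncated, here is a minimal Python sketch of such a helper. The `[truncation_guard]` prefix and the listed fields (model, output_tokens, limit, tools fired, three suggestions) come from the description above; the exact wording, the parameter names, and the third suggestion are assumptions, not the text shipped in `output.build_truncation_admission`.

```python
def build_truncation_admission(model, output_tokens, limit, tools_fired):
    """Build the operator-facing reply from plain values: no LLM, no I/O."""
    # An empty tool list is rendered explicitly rather than left blank.
    tools = ", ".join(tools_fired) if tools_fired else "none"
    lines = [
        "[truncation_guard] The reply hit the output limit and was withheld.",
        f"Model: {model}",
        f"Output tokens: {output_tokens} (limit {limit})",
        f"Tools fired this cycle: {tools}",
        "Suggestions (illustrative wording):",
        "1. Decompose the request into multiple turns.",
        "2. Tighten the scope of the request.",
        "3. Raise the max_tokens setting for this kind of task.",
    ]
    return "\n".join(lines)
```

Because the output is assembled from fixed strings and a few known values, its length is bounded and the same inputs always produce the same text, which is what makes the text contract testable.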

Test plan

gleam build clean, no warnings
gleam test — 2049 passed (9 new), no failures
Pure tests prove admission text contract (prefix, fields, tools, suggestions)
First-hit retry test: provider returns MaxTokens → clean response on retry. Operator sees "recovered with tighter scope", NOT the truncated text and NOT the admission.
Second-hit admission test: provider returns MaxTokens twice. Operator sees [truncation_guard] admission, NOT the truncated text.
Empty-response retry regression guard: existing empty-retry path still works after the truncation_retried field was added.
Operator: trigger a real truncation in production (ask for a long synthesis with current max_tokens settings) and confirm the admission shows up cleanly in the chat instead of half a sentence.
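
The first-hit and second-hit tests above rely on a provider that returns MaxTokens on a controlled schedule. A hedged Python sketch of that test shape follows; `ScheduledProvider`, `run_cycle`, the nudge string, and the admission text are stand-ins for illustration, not the mock provider or cog loop in this repository.

```python
NUDGE = "decompose into multiple turns OR tighten scope"  # stand-in nudge text


class ScheduledProvider:
    """Mock provider that replays a fixed list of (text, stop_reason) results."""

    def __init__(self, schedule):
        self._schedule = list(schedule)
        self.requests = []  # every request is recorded for later inspection

    def complete(self, messages):
        self.requests.append(list(messages))
        return self._schedule.pop(0)


def run_cycle(provider, user_input):
    """Toy cycle: one retry on max_tokens, then a fixed admission."""
    messages = [user_input]
    text, stop = provider.complete(messages)
    if stop != "max_tokens":
        return text
    messages.append(NUDGE)  # first hit: retry with the scope-down nudge
    text, stop = provider.complete(messages)
    if stop != "max_tokens":
        return text
    return "[truncation_guard] reply withheld"  # second hit: admission


# First-hit retry: truncated once, then a clean response.
first = ScheduledProvider([("## Half a", "max_tokens"),
                           ("recovered with tighter scope", "end_turn")])
# Second-hit admission: truncated twice.
second = ScheduledProvider([("## Half a", "max_tokens"),
                            ("## Still half", "max_tokens")])
```

Driving the loop from a schedule keeps the tests deterministic: each scenario is just a different list of canned results.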

Notes for review

The end-to-end tests discovered that ensure_alternation coalesces consecutive same-role messages. The retry nudge is User-role; the original input is User-role; they get merged. So the test discriminator can't use list.length(req.messages) — instead it matches the unique sentinel string from the nudge text. If anyone rewrites that prose in handle_think_complete, the test sentinel constants in truncation_guard_test.gleam need updating in the same change.
Out of scope (per the plan): sub-agent truncation (writer hits its own cap), cycle-stalled watchdog, cycle heartbeat. The deferred async-boundary audit doc remains in docs/roadmap/planned/.
The [truncation_guard] prefix on the admission is operator-facing and load-bearing. There's a test pinning it (admission_starts_with_operator_facing_prefix_test) so it can't be silently renamed.
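
The coalescing behaviour in the first note can be shown in a few lines. This Python sketch assumes `ensure_alternation` merges adjacent same-role messages by joining their content; the join separator, the message shape, and the sentinel value are assumptions made for illustration.

```python
NUDGE_SENTINEL = "tighten scope"  # assumed unique substring of the nudge text


def ensure_alternation(messages):
    """Merge consecutive (role, content) pairs that share a role."""
    merged = []
    for role, content in messages:
        if merged and merged[-1][0] == role:
            # Same role as the previous message: fold into one block.
            merged[-1] = (role, merged[-1][1] + "\n\n" + content)
        else:
            merged.append((role, content))
    return merged


def is_retry_request(messages):
    """Detect the retry by content, since message count does not change."""
    return any(NUDGE_SENTINEL in content for _, content in messages)


original = ensure_alternation([("user", "compare the two documents")])
retry = ensure_alternation([
    ("user", "compare the two documents"),
    ("user", "decompose into multiple turns OR tighten scope"),
])
```

Both requests contain exactly one message after coalescing, so only the content check tells them apart. That is also why the sentinel constant has to change together with the nudge prose.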

🤖 Generated with Claude Code

When an LLM call returns stop_reason=MaxTokens with no tool calls, the cog loop previously logged a warning and shipped the truncated mid-sentence text as the cycle's reply. The 2026-04-26 incident made the failure mode visible: an operator asked for a comparative analysis of two long documents, the writer hit max_tokens twice, the orchestrator decided to write directly and hit its own cap, and the cycle terminated with "## Springdrift × The Synthetic" as the operator-visible reply. The agent looked frozen; in fact it was Idle waiting for input.

This commit promotes that detection from a passive warning to a control-flow signal:

- New truncation_retried: Bool field on PendingThink mirrors the existing empty_retried flag. Both prevent infinite-loop on the same failure mode.
- handle_think_complete's "no tool calls" branch now treats MaxTokens as a recoverable failure. First hit retries once with a scope-down nudge ("decompose into multiple turns OR tighten scope"). Second hit ships a deterministic admission instead of the truncated text.
- New output.build_truncation_admission helper produces the operator-facing reply. It is a pure function (no LLM, no I/O), so it cannot itself be truncated. The "[truncation_guard]" prefix lets operators recognise the failure mode at a glance, and the admission carries model name, output_tokens vs limit, tools fired, and three actionable recovery suggestions.

9 tests in test/agent/cognitive/truncation_guard_test.gleam:

- 5 pure tests on build_truncation_admission shape
- 4 end-to-end tests driving the cog loop with mock providers configured to return MaxTokens on a controlled schedule
- Discovery during testing: ensure_alternation coalesces same-role messages, so the User-role retry nudge appended to a User-role original input ends up merged into a single message block. The test discriminator switched from list.length(req.messages) to content matching on the nudge sentinel string.

Documentation:

- docs/roadmap/planned/cog-loop-truncation-guard.md → moved to docs/roadmap/implemented/ with a "What Shipped" preamble. The original plan is preserved below the preamble for context.

2049 tests passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

seamus-brady merged commit a78866c into main Apr 26, 2026
1 check passed

seamus-brady mentioned this pull request Apr 26, 2026

Sub-agent truncation guard with embedded partial output #166

Merged

8 tasks

seamus-brady merged 1 commit into main from feat/cog-loop-truncation-guard
