Implement cog-loop truncation guard #165
Merged
seamus-brady merged 1 commit into main on Apr 26, 2026
Conversation
When an LLM call returns stop_reason=MaxTokens with no tool calls,
the cog loop previously logged a warning and shipped the truncated
mid-sentence text as the cycle's reply. The 2026-04-26 incident
made the failure mode visible: an operator asked for a comparative
analysis of two long documents, the writer hit max_tokens twice,
the orchestrator decided to write directly and hit its own cap,
and the cycle terminated with "## Springdrift × The Synthetic" as
the operator-visible reply. The agent looked frozen; in fact it
was Idle waiting for input.
This commit promotes that detection from a passive warning to a
control-flow signal:
- New truncation_retried: Bool field on PendingThink mirrors the
  existing empty_retried flag. Both prevent an infinite retry loop on
  the same failure mode.
- handle_think_complete's "no tool calls" branch now treats
MaxTokens as a recoverable failure. First hit retries once with
a scope-down nudge ("decompose into multiple turns OR tighten
scope"). Second hit ships a deterministic admission instead of
the truncated text.
- New output.build_truncation_admission helper produces the
operator-facing reply. Pure function — no LLM, no I/O — so it
cannot itself be truncated. The "[truncation_guard]" prefix lets
operators recognise the failure mode at a glance, and the
admission carries model name, output_tokens vs limit, tools
fired, and three actionable recovery suggestions.
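As a sketch, the retry-then-admit control flow above might look like the following. All type and function names here are hypothetical illustrations for this PR description, not the actual codebase:

```gleam
// Hypothetical sketch of the guard branch in handle_think_complete's
// "no tool calls" path. Real types and names may differ.
pub type StopReason {
  MaxTokens
  EndTurn
}

pub type GuardAction {
  RetryWithNudge(nudge: String)
  ShipAdmission
  ShipReply(text: String)
}

// Decide what to do when a think completes with no tool calls.
pub fn guard_no_tool_calls(
  stop_reason: StopReason,
  truncation_retried: Bool,
  text: String,
) -> GuardAction {
  case stop_reason, truncation_retried {
    // First MaxTokens hit: retry once with a scope-down nudge.
    MaxTokens, False ->
      RetryWithNudge("decompose into multiple turns OR tighten scope")
    // Second hit: ship the deterministic admission, never the
    // truncated mid-sentence text.
    MaxTokens, True -> ShipAdmission
    // Normal completion: ship the reply as-is.
    _, _ -> ShipReply(text)
  }
}
```

The multi-subject case makes the invariant visible: MaxTokens can trigger at most one retry before the deterministic fallback fires.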
9 tests in test/agent/cognitive/truncation_guard_test.gleam:
- 5 pure tests on build_truncation_admission shape
- 4 end-to-end tests driving the cog loop with mock providers
configured to return MaxTokens on a controlled schedule
- Discovery during testing: ensure_alternation coalesces same-role
messages, so the User-role retry nudge appended to a User-role
original input ends up merged into a single message block. The
test discriminator switched from list.length(req.messages) to
content matching on the nudge sentinel string.
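The coalescing behaviour that tripped up the test discriminator can be illustrated with a minimal reconstruction. The types and the fold below are hypothetical; the real ensure_alternation lives in the codebase and may differ:

```gleam
import gleam/list

pub type Role {
  User
  Assistant
}

pub type Message {
  Message(role: Role, content: String)
}

// Hypothetical reconstruction: consecutive same-role messages are
// merged into one block, so appending a User-role nudge to a
// User-role input does not grow the message list.
pub fn ensure_alternation(messages: List(Message)) -> List(Message) {
  list.fold(messages, [], fn(acc, msg) {
    let Message(role: new_role, content: new_content) = msg
    case acc {
      [Message(role: prev_role, content: prev_content), ..rest] ->
        case prev_role == new_role {
          // Same role as the previous message: coalesce.
          True -> [
            Message(prev_role, prev_content <> "\n\n" <> new_content),
            ..rest
          ]
          False -> [msg, ..acc]
        }
      [] -> [msg]
    }
  })
  |> list.reverse
}
```

Under this model, a User input followed by a User nudge folds into a single message, which is why list.length(req.messages) cannot distinguish the retry request from the original.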
Documentation:
- docs/roadmap/planned/cog-loop-truncation-guard.md → moved to
docs/roadmap/implemented/ with a "What Shipped" preamble. The
original plan is preserved below the preamble for context.
2049 tests passing.
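The deterministic admission described above can be sketched as a pure string builder. The signature, field order, and wording below are hypothetical, not the shipped implementation:

```gleam
import gleam/int
import gleam/string

// Hypothetical sketch of output.build_truncation_admission.
// Pure: no LLM call, no I/O, so it cannot itself be truncated.
pub fn build_truncation_admission(
  model: String,
  output_tokens: Int,
  limit: Int,
  tools_fired: List(String),
) -> String {
  let tools = case tools_fired {
    [] -> "none"
    _ -> string.join(tools_fired, ", ")
  }
  "[truncation_guard] Reply hit the output limit twice ("
  <> model
  <> ", "
  <> int.to_string(output_tokens)
  <> "/"
  <> int.to_string(limit)
  <> " tokens; tools fired: "
  <> tools
  <> "). Suggestions: (1) narrow the request, (2) split it across "
  <> "turns, (3) raise max_tokens."
}
```

Keeping the builder pure also makes the shape trivially testable, which is what the 5 pure tests pin down.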
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Implements the truncation guard plan from PR #164 (docs/roadmap/planned/cog-loop-truncation-guard.md). When an LLM call returns stop_reason=MaxTokens with no tool calls, the cog loop now retries once with a scope-down nudge; on the second hit it ships a deterministic admission instead of the truncated text. Also moves the planning doc to docs/roadmap/implemented/ with a "What Shipped" preamble.

What changed
- PendingThink.truncation_retried: Bool — mirrors the existing empty_retried flag. Tracked across the retry to stop an infinite loop on the same failure mode.
- handle_think_complete — new branch detects MaxTokens-with-no-tool-calls. First hit: retry with a scope-down nudge ("decompose into multiple turns OR tighten scope"). Second hit: ship the deterministic admission.
- output.build_truncation_admission — pure helper, no LLM, no I/O. Builds the operator-facing reply with a [truncation_guard] prefix, model + output_tokens + limit, tools fired this cycle, and three actionable recovery suggestions.
- test/agent/cognitive/truncation_guard_test.gleam — 9 new tests (5 pure on the admission shape, 4 end-to-end through the cog loop).
- docs/roadmap/planned/cog-loop-truncation-guard.md → docs/roadmap/implemented/cog-loop-truncation-guard.md with a "What Shipped" section above the original plan.

Test plan
- gleam build clean, no warnings
- gleam test — 2049 passed (9 new), no failures
- … the [truncation_guard] admission, NOT the truncated text.
- … (low max_tokens settings) and confirm the admission shows up cleanly in the chat instead of half a sentence.

Notes for review
- ensure_alternation coalesces consecutive same-role messages. The retry nudge is User-role; the original input is User-role; they get merged. So the test discriminator can't use list.length(req.messages) — instead it matches the unique sentinel string from the nudge text. If anyone rewrites that prose in handle_think_complete, the test sentinel constants in truncation_guard_test.gleam need updating in the same change.
- … docs/roadmap/planned/.
- The [truncation_guard] prefix on the admission is operator-facing and load-bearing. There's a test pinning it (admission_starts_with_operator_facing_prefix_test) so it can't be silently renamed.

🤖 Generated with Claude Code