Sub-agent truncation guard with embedded partial output#166
Merged
seamus-brady merged 1 commit intomainfrom Apr 26, 2026
Merged
Sub-agent truncation guard with embedded partial output#166seamus-brady merged 1 commit intomainfrom
seamus-brady merged 1 commit intomainfrom
Conversation
When a specialist agent's react loop receives an LLM response with stop_reason=MaxTokens and no tool calls, the framework previously returned the truncated mid-sentence text to the orchestrator with just a `truncated: True` flag. The orchestrator had to figure out recovery on its own, and 14 of 14 sub-agent delegations in a 2026-04-26 Nemo session were capped this way without any agent-side recovery — partial work was returned half-finished and the orchestrator burned cycles trying alternative strategies. This implements PR 1 of the sub-agent-resilience plan (docs/roadmap/planned/sub-agent-resilience.md), Fixes 1 + 2: Fix 1 — sub-agent truncation guard - New `truncation_retried: Bool` field on ReactStats mirrors the cog-loop's `empty_retried` / `truncation_retried` pattern. - First MaxTokens hit with no tool calls: append the previous truncated assistant response + a User-role scope-down nudge to the message history, recurse with the SAME `remaining` value so the retry does NOT consume one of the agent's allowed turns. Otherwise a single MaxTokens hit eats two turns and leaves the agent worse off than today. - Second MaxTokens hit in the same react loop: ship a deterministic admission via `framework.build_truncation_admission` instead of returning the truncated text. Fix 2 — auto-save partial output, embedded in the admission - The admission carries the agent's accumulated text (across all turns of the failing react loop, not just the final response) so the orchestrator and operator can see what was produced. - Admissions over a configurable size cap (4000 chars) elide the middle and keep head + tail with a "[...truncation_guard: N chars elided...]" marker — the admission itself stays manageable while still giving operators the bracketing context. - Design deviation from the plan: the plan called for writing partial work to a separate artifact via the artifacts subsystem. Embedding in the admission text keeps the framework decoupled from artifact infrastructure (no new AgentSpec fields, no threading librarian Subjects through start_agent), and the orchestrator's LLM can decide to call store_result on the admission content if it wants persistence. Future PR can lift to artifact-write if/when evidence shows embedding isn't enough. The `[truncation_guard:<agent>]` prefix on the admission is load-bearing — operators and orchestrators recognise the failure mode by it. Mirrors the cog-loop guard's `[truncation_guard]` convention from PR #165. Tests: - 6 pure tests on `build_truncation_admission` shape: prefix, agent/model/tokens, partial verbatim short, elided-when-long, empty-partial fallback, recovery-suggestions present. - 4 end-to-end tests driving real agents via framework.start_agent with mock providers: retry-success, second-hit-admission, admission-still-flags-truncated, retry-doesn't-burn-max_turns (max_turns=1 with truncation retry still produces output on the recovery turn). - Updated existing `agent_success_surfaces_truncation_test` — now asserts the admission contract instead of the old pass-through truncated text. 2059 tests passing total. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 26, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR 1 of the sub-agent-resilience plan. Lifts the cog-loop truncation guard pattern (PR #165) into the framework's react loop so specialist agents (writer, researcher, coder, etc.) recover from
max_tokenshits the same way the cog loop already does.When an agent's LLM returns
stop_reason=MaxTokenswith no tool calls, the framework now retries once with a scope-down nudge (without burning a turn). On the second hit it ships a deterministic admission that embeds the agent's accumulated partial work, so the orchestrator and operator can pick up what was produced even though the synthesis was capped.Also includes the
docs/roadmap/planned/sub-agent-resilience.mdplanning doc which covers all five fixes; this PR implements Fixes 1 + 2.What changed
src/agent/framework.gleam—truncation_retried: Boolfield onReactStats. Branch indo_reactdetects the failure mode, retries with nudge or ships admission. New helpersbuild_truncation_admission(pure, public for tests) andcollect_assistant_text(extracts the agent's partial work across all turns of the failing react loop).test/agent/framework_truncation_guard_test.gleam— 10 new tests: 6 pure on the admission shape, 4 end-to-end driving the framework with mock providers.test/agent/framework_test.gleam— updatedagent_success_surfaces_truncation_testto match the new contract (admission shipped instead of raw truncated text).docs/roadmap/planned/sub-agent-resilience.md— full plan covering Fixes 1-5 (this PR is Fixes 1 + 2; PRs 2 + 3 to follow).Design deviation from the plan
The plan called for writing the agent's partial work to a separate artifact via the existing artifacts subsystem. This PR instead embeds the partial output in the admission text (with size cap and head/tail elision when over 4KB). Reasons:
AgentSpecfields, no threadingSubject(LibrarianMessage)throughstart_agent. The 15+AgentSpec(...)construction sites stay unchanged.store_resulton the partial work itself — natural Springdrift pattern.If future evidence shows embedding isn't enough (e.g. partial work routinely exceeds 4KB and operators want full persistence), a follow-up PR can add framework-side artifact writing on top of this.
Test plan
gleam buildclean, no warningsgleam test— 2059 passed (10 new), no failures[truncation_guard:<agent>]prefix, agent + model + tokens embedded, partial verbatim when short, elided when long, empty-partial fallback, recovery suggestions presentAgentSuccesswith the recovered text, NOT the truncated one and NOT the admission.AgentSuccesswhose result starts with[truncation_guard:writer]and embeds the partial work.truncated: Trueflag still surfaces on the admission so orchestrators that check the flag still know the cycle was capped.max_turns=1retry test: the truncation retry does NOT burn a turn — agent withmax_turns=1still produces output on the recovery turn after one MaxTokens hit.[truncation_guard:writer]admission shows up cleanly in the chat instead of half a sentence.What's next
Per the planning doc:
referenced_artifactsparameter onagent_*tool calls so children inherit prior structural work) + Fix 4 (checkpointtool + skill discipline updates for writer/researcher).🤖 Generated with Claude Code