Sub-agent truncation guard with embedded partial output by seamus-brady · Pull Request #166 · seamus-brady/springdrift

seamus-brady · 2026-04-26T16:52:04Z

Summary

PR 1 of the sub-agent-resilience plan. Lifts the cog-loop truncation guard pattern (PR #165) into the framework's react loop so specialist agents (writer, researcher, coder, etc.) recover from max_tokens hits the same way the cog loop already does.

When an agent's LLM returns stop_reason=MaxTokens with no tool calls, the framework now retries once with a scope-down nudge (without burning a turn). On the second hit it ships a deterministic admission that embeds the agent's accumulated partial work, so the orchestrator and operator can pick up what was produced even though the synthesis was capped.
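
The retry-then-admit flow described above can be sketched roughly as follows. This is an illustrative Python model, not the Gleam code in src/agent/framework.gleam: `Response`, `react`, `NUDGE`, and the message tuples are hypothetical names, tool-call handling is left out, and returning `False` for the flag after a successful retry is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# All names here are hypothetical stand-ins, not the framework's real API.
MAX_TOKENS = "max_tokens"
END_TURN = "end_turn"
NUDGE = "Your last reply hit the token limit. Answer again with a narrower scope."

Message = Tuple[str, str]  # (role, text)


@dataclass
class Response:
    text: str
    stop_reason: str
    tool_calls: int = 0


def react(provider: Callable[[List[Message]], Response],
          messages: List[Message],
          remaining: int,
          truncation_retried: bool = False) -> Tuple[str, bool]:
    """Return (result_text, truncated_flag). Tool-call turns are omitted."""
    if remaining <= 0:
        return ("[max_turns reached]", False)
    resp = provider(messages)
    capped = resp.stop_reason == MAX_TOKENS and resp.tool_calls == 0
    if capped and not truncation_retried:
        # First hit: keep the truncated reply in history, append the nudge,
        # and recurse with the SAME `remaining` so no turn is consumed.
        history = messages + [("assistant", resp.text), ("user", NUDGE)]
        return react(provider, history, remaining, truncation_retried=True)
    if capped:
        # Second hit: ship an admission that embeds the accumulated text.
        earlier = [text for role, text in messages if role == "assistant"]
        partial = "\n".join(earlier + [resp.text])
        return ("[truncation_guard:writer] " + partial, True)
    # Clean response (assumed here to clear the truncated flag).
    return (resp.text, False)
```

With `remaining=1`, a provider that returns one MaxTokens response followed by a clean one still yields the clean text, because the retry reuses the same turn budget.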

Also includes the docs/roadmap/planned/sub-agent-resilience.md planning doc which covers all five fixes; this PR implements Fixes 1 + 2.

What changed

src/agent/framework.gleam — new truncation_retried: Bool field on ReactStats. A new branch in do_react detects the failure mode, then either retries with the nudge or ships the admission. New helpers: build_truncation_admission (pure, public for tests) and collect_assistant_text (extracts the agent's partial work across all turns of the failing react loop).
test/agent/framework_truncation_guard_test.gleam — 10 new tests: 6 pure on the admission shape, 4 end-to-end driving the framework with mock providers.
test/agent/framework_test.gleam — updated agent_success_surfaces_truncation_test to match the new contract (admission shipped instead of raw truncated text).
docs/roadmap/planned/sub-agent-resilience.md — full plan covering Fixes 1-5 (this PR is Fixes 1 + 2; PRs 2 + 3 to follow).
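
As a rough illustration of what collect_assistant_text is described as doing (gathering the agent's text across every turn of the failing loop, not only the last response), here is a minimal Python sketch. The message shape, the parameters, and the separator are assumptions; the real helper is written in Gleam and its signature is not shown in this PR description.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text); hypothetical message shape


def collect_assistant_text(messages: List[Message], final_text: str) -> str:
    """Join every non-empty assistant text in the loop, oldest first."""
    parts = [text for role, text in messages
             if role == "assistant" and text.strip()]
    # The final (truncated) response is usually not in the history yet.
    if final_text.strip():
        parts.append(final_text)
    return "\n\n".join(parts)
```

User-role messages, including the scope-down nudge, are skipped so that only the agent's own work ends up in the admission.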

Design deviation from the plan

The plan called for writing the agent's partial work to a separate artifact via the existing artifacts subsystem. This PR instead embeds the partial output in the admission text (with size cap and head/tail elision when over 4KB). Reasons:

Keeps the framework decoupled from artifact infrastructure — no new AgentSpec fields, no threading Subject(LibrarianMessage) through start_agent. The 15+ AgentSpec(...) construction sites stay unchanged.
The orchestrator's LLM sees the admission as a tool-result and can decide whether to call store_result on the partial work itself — natural Springdrift pattern.
Operator can read the partial work directly in the chat without retrieving an artifact.

If future evidence shows embedding isn't enough (e.g. partial work routinely exceeds 4KB and operators want full persistence), a follow-up PR can add framework-side artifact writing on top of this.
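
The head/tail elision can be pictured with the following Python sketch. It assumes an even head/tail split and the 4000-character cap mentioned in the commit message; `elide_middle` is a hypothetical name, and the real split and marker placement in the Gleam code may differ.

```python
def elide_middle(text: str, cap: int = 4000) -> str:
    """Keep the head and tail of `text`, replacing the middle with a marker."""
    if cap < 2:
        raise ValueError("cap must be at least 2")
    if len(text) <= cap:
        return text  # short partials are embedded verbatim
    head_len = cap // 2          # assumed even split between head and tail
    tail_len = cap - head_len
    elided = len(text) - cap
    marker = f"\n[...truncation_guard: {elided} chars elided...]\n"
    return text[:head_len] + marker + text[-tail_len:]
```

Keeping both ends means the operator sees how the work started and where it was cut off, which is usually the most useful context for deciding how to re-scope the task.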

Test plan

gleam build clean, no warnings
gleam test — 2059 passed (10 new), no failures
Pure tests prove admission contract: [truncation_guard:<agent>] prefix, agent + model + tokens embedded, partial verbatim when short, elided when long, empty-partial fallback, recovery suggestions present
End-to-end retry-success test: provider returns MaxTokens then clean text. Agent returns AgentSuccess with the recovered text, NOT the truncated one and NOT the admission.
End-to-end second-hit admission test: provider returns MaxTokens twice. Agent returns AgentSuccess whose result starts with [truncation_guard:writer] and embeds the partial work.
truncated: True flag still surfaces on the admission so orchestrators that check the flag still know the cycle was capped.
max_turns=1 retry test: the truncation retry does NOT burn a turn — agent with max_turns=1 still produces output on the recovery turn after one MaxTokens hit.
Operator: rebuild + restart agent, dispatch a synthesis task that previously truncated, confirm either retry succeeds or [truncation_guard:writer] admission shows up cleanly in the chat instead of half a sentence.
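
To make the admission contract in the pure tests concrete, here is a hedged Python sketch of the shape they check, plus the prefix test an orchestrator might run. Only the `[truncation_guard:<agent>]` prefix comes from the PR; the parameter list, the wording, the empty-partial fallback text, and `is_truncation_admission` are illustrative assumptions.

```python
def build_truncation_admission(agent: str, model: str,
                               max_tokens: int, partial: str) -> str:
    """Assemble a deterministic admission (wording here is hypothetical)."""
    body = partial.strip() or "(no partial output was produced)"
    return "\n".join([
        f"[truncation_guard:{agent}] Output hit the token limit twice.",
        f"agent={agent} model={model} max_tokens={max_tokens}",
        "Partial work:",
        body,
        "Suggested recovery: narrow the task, split it into smaller "
        "delegations, or raise the token limit.",
    ])


def is_truncation_admission(result: str) -> bool:
    """Orchestrator-side check keyed on the load-bearing prefix."""
    return result.startswith("[truncation_guard:")
```

Because the output depends only on its arguments, the same inputs always produce the same admission, which is what allows the shape to be asserted in pure tests.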

What's next

Per the planning doc:

PR 2 — Fix 3 (referenced_artifacts parameter on agent_* tool calls so children inherit prior structural work) + Fix 4 (checkpoint tool + skill discipline updates for writer/researcher).
PR 3 — Fix 5 (codify Nemo's emergent strategies as orchestration skills + Strategy Registry seeding so every fresh instance has the floor strategies at boot).

🤖 Generated with Claude Code

When a specialist agent's react loop receives an LLM response with stop_reason=MaxTokens and no tool calls, the framework previously returned the truncated mid-sentence text to the orchestrator with just a `truncated: True` flag. The orchestrator had to figure out recovery on its own, and 14 of 14 sub-agent delegations in a 2026-04-26 Nemo session were capped this way without any agent-side recovery: partial work was returned half-finished and the orchestrator burned cycles trying alternative strategies.

This implements PR 1 of the sub-agent-resilience plan (docs/roadmap/planned/sub-agent-resilience.md), Fixes 1 + 2:

Fix 1: sub-agent truncation guard

- New `truncation_retried: Bool` field on ReactStats mirrors the cog-loop's `empty_retried` / `truncation_retried` pattern.
- First MaxTokens hit with no tool calls: append the previous truncated assistant response + a User-role scope-down nudge to the message history, recurse with the SAME `remaining` value so the retry does NOT consume one of the agent's allowed turns. Otherwise a single MaxTokens hit eats two turns and leaves the agent worse off than today.
- Second MaxTokens hit in the same react loop: ship a deterministic admission via `framework.build_truncation_admission` instead of returning the truncated text.

Fix 2: auto-save partial output, embedded in the admission

- The admission carries the agent's accumulated text (across all turns of the failing react loop, not just the final response) so the orchestrator and operator can see what was produced.
- Admissions over a configurable size cap (4000 chars) elide the middle and keep head + tail with a "[...truncation_guard: N chars elided...]" marker, so the admission itself stays manageable while still giving operators the bracketing context.
- Design deviation from the plan: the plan called for writing partial work to a separate artifact via the artifacts subsystem. Embedding in the admission text keeps the framework decoupled from artifact infrastructure (no new AgentSpec fields, no threading librarian Subjects through start_agent), and the orchestrator's LLM can decide to call store_result on the admission content if it wants persistence. Future PR can lift to artifact-write if/when evidence shows embedding isn't enough.

The `[truncation_guard:<agent>]` prefix on the admission is load-bearing: operators and orchestrators recognise the failure mode by it. Mirrors the cog-loop guard's `[truncation_guard]` convention from PR #165.

Tests:

- 6 pure tests on `build_truncation_admission` shape: prefix, agent/model/tokens, partial verbatim short, elided-when-long, empty-partial fallback, recovery-suggestions present.
- 4 end-to-end tests driving real agents via framework.start_agent with mock providers: retry-success, second-hit-admission, admission-still-flags-truncated, retry-doesn't-burn-max_turns (max_turns=1 with truncation retry still produces output on the recovery turn).
- Updated existing `agent_success_surfaces_truncation_test`, which now asserts the admission contract instead of the old pass-through truncated text.

2059 tests passing total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

seamus-brady merged commit 1d34ea4 into main Apr 26, 2026
1 check passed

This was referenced Apr 26, 2026

Real-coder layer: OpenCode via ACP — full rebuild + auto-wire #168

Merged

Codify Nemo's strategies as skills with Strategy Registry seeding #169

Merged

seamus-brady merged 1 commit into main from feat/sub-agent-truncation-guard
