Sub-agent truncation guard with embedded partial output #166

Merged: seamus-brady merged 1 commit into main from feat/sub-agent-truncation-guard on Apr 26, 2026

Summary

PR 1 of the sub-agent-resilience plan. Lifts the cog-loop truncation guard pattern (PR #165) into the framework's react loop so specialist agents (writer, researcher, coder, etc.) recover from max_tokens hits the same way the cog loop already does.

When an agent's LLM returns stop_reason=MaxTokens with no tool calls, the framework now retries once with a scope-down nudge (without burning a turn). On the second hit it ships a deterministic admission that embeds the agent's accumulated partial work, so the orchestrator and operator can pick up what was produced even though the synthesis was capped.

Also includes the docs/roadmap/planned/sub-agent-resilience.md planning doc which covers all five fixes; this PR implements Fixes 1 + 2.

What changed

  • src/agent/framework.gleam — truncation_retried: Bool field on ReactStats. A new branch in do_react detects the failure mode, then either retries with a nudge or ships the admission. New helpers build_truncation_admission (pure, public for tests) and collect_assistant_text (extracts the agent's partial work across all turns of the failing react loop).
  • test/agent/framework_truncation_guard_test.gleam — 10 new tests: 6 pure on the admission shape, 4 end-to-end driving the framework with mock providers.
  • test/agent/framework_test.gleam — updated agent_success_surfaces_truncation_test to match the new contract (admission shipped instead of raw truncated text).
  • docs/roadmap/planned/sub-agent-resilience.md — full plan covering Fixes 1-5 (this PR is Fixes 1 + 2; PRs 2 + 3 to follow).
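
To make the role of collect_assistant_text concrete, here is a minimal sketch of gathering partial work across turns. It is written in Python for brevity and is not the project's Gleam code; the `Message` type, the role names, and the blank-line joiner are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # hypothetical role names: "user", "assistant", "tool"
    text: str


def collect_assistant_text(messages: list[Message]) -> str:
    """Join the non-empty assistant text from every turn, oldest first."""
    parts = [
        m.text.strip()
        for m in messages
        if m.role == "assistant" and m.text.strip()
    ]
    # Assumption: turns are separated by a blank line in the joined output.
    return "\n\n".join(parts)
```

The point of walking the whole history rather than only the last response is that earlier turns may hold most of the useful work when the final synthesis is the part that was capped.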

Design deviation from the plan

The plan called for writing the agent's partial work to a separate artifact via the existing artifacts subsystem. This PR instead embeds the partial output in the admission text, with head/tail elision when it exceeds the 4000-character size cap (about 4KB). Reasons:

  1. Keeps the framework decoupled from artifact infrastructure — no new AgentSpec fields, no threading Subject(LibrarianMessage) through start_agent. The 15+ AgentSpec(...) construction sites stay unchanged.
  2. The orchestrator's LLM sees the admission as a tool-result and can decide whether to call store_result on the partial work itself — natural Springdrift pattern.
  3. Operator can read the partial work directly in the chat without retrieving an artifact.

If future evidence shows embedding isn't enough (e.g. partial work routinely exceeds 4KB and operators want full persistence), a follow-up PR can add framework-side artifact writing on top of this.

Test plan

  • gleam build clean, no warnings
  • gleam test — 2059 passed (10 new), no failures
  • Pure tests prove admission contract: [truncation_guard:<agent>] prefix, agent + model + tokens embedded, partial verbatim when short, elided when long, empty-partial fallback, recovery suggestions present
  • End-to-end retry-success test: provider returns MaxTokens then clean text. Agent returns AgentSuccess with the recovered text, NOT the truncated one and NOT the admission.
  • End-to-end second-hit admission test: provider returns MaxTokens twice. Agent returns AgentSuccess whose result starts with [truncation_guard:writer] and embeds the partial work.
  • truncated: True flag still surfaces on the admission so orchestrators that check the flag still know the cycle was capped.
  • max_turns=1 retry test: the truncation retry does NOT burn a turn — agent with max_turns=1 still produces output on the recovery turn after one MaxTokens hit.
  • Operator: rebuild + restart agent, dispatch a synthesis task that previously truncated, confirm either retry succeeds or [truncation_guard:writer] admission shows up cleanly in the chat instead of half a sentence.

What's next

Per the planning doc:

  • PR 2 — Fix 3 (referenced_artifacts parameter on agent_* tool calls so children inherit prior structural work) + Fix 4 (checkpoint tool + skill discipline updates for writer/researcher).
  • PR 3 — Fix 5 (codify Nemo's emergent strategies as orchestration skills + Strategy Registry seeding so every fresh instance has the floor strategies at boot).

🤖 Generated with Claude Code

When a specialist agent's react loop receives an LLM response with
stop_reason=MaxTokens and no tool calls, the framework previously
returned the truncated mid-sentence text to the orchestrator with
just a `truncated: True` flag. The orchestrator had to figure out
recovery on its own, and 14 of 14 sub-agent delegations in a
2026-04-26 Nemo session were capped this way without any agent-side
recovery — partial work was returned half-finished and the
orchestrator burned cycles trying alternative strategies.

This implements PR 1 of the sub-agent-resilience plan
(docs/roadmap/planned/sub-agent-resilience.md), Fixes 1 + 2:

Fix 1 — sub-agent truncation guard
- New `truncation_retried: Bool` field on ReactStats mirrors the
  cog-loop's `empty_retried` / `truncation_retried` pattern.
- First MaxTokens hit with no tool calls: append the previous
  truncated assistant response + a User-role scope-down nudge to
  the message history, recurse with the SAME `remaining` value so
  the retry does NOT consume one of the agent's allowed turns.
  Otherwise a single MaxTokens hit eats two turns and leaves the
  agent worse off than today.
- Second MaxTokens hit in the same react loop: ship a deterministic
  admission via `framework.build_truncation_admission` instead of
  returning the truncated text.
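
The control flow of Fix 1 can be sketched roughly as follows. This is a Python illustration, not the Gleam implementation in do_react: the `Response` shape, the stop-reason labels, the nudge wording, and the tuple-based message history are assumptions, and tool execution is elided.

```python
from dataclasses import dataclass, field


@dataclass
class Response:
    text: str
    stop_reason: str  # hypothetical labels: "MaxTokens" or "EndTurn"
    tool_calls: list = field(default_factory=list)


# Hypothetical wording; the real nudge text is not given in the PR.
NUDGE = "Your last reply hit the output limit. Narrow the scope and answer briefly."


def react(call_llm, messages, remaining, truncation_retried=False, agent="writer"):
    """Return (result_text, truncated_flag) for one react loop."""
    if remaining <= 0:
        return ("[max_turns reached]", False)
    resp = call_llm(messages)
    capped = resp.stop_reason == "MaxTokens" and not resp.tool_calls

    if capped and not truncation_retried:
        # First hit: keep the truncated reply, add a user-role nudge, and
        # recurse with the SAME `remaining` so the retry costs no turn.
        retry_history = messages + [("assistant", resp.text), ("user", NUDGE)]
        return react(call_llm, retry_history, remaining, True, agent)

    if capped:
        # Second hit: ship an admission that embeds the accumulated text.
        history = messages + [("assistant", resp.text)]
        partial = "\n\n".join(t for role, t in history if role == "assistant")
        return (f"[truncation_guard:{agent}] {partial}", True)

    if resp.tool_calls:
        # Tool execution is elided here; a normal turn consumes one `remaining`.
        next_history = messages + [("assistant", resp.text), ("tool", "<tool results>")]
        return react(call_llm, next_history, remaining - 1, truncation_retried, agent)

    return (resp.text, False)
```

With `remaining=1`, a provider that returns MaxTokens once and then a clean reply still yields the clean reply, which is the behaviour the max_turns=1 test checks.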

Fix 2 — auto-save partial output, embedded in the admission
- The admission carries the agent's accumulated text (across all
  turns of the failing react loop, not just the final response) so
  the orchestrator and operator can see what was produced.
- Admissions over a configurable size cap (4000 chars) elide the
  middle and keep head + tail with a "[...truncation_guard:
  N chars elided...]" marker — the admission itself stays
  manageable while still giving operators the bracketing context.
- Design deviation from the plan: the plan called for writing
  partial work to a separate artifact via the artifacts subsystem.
  Embedding in the admission text keeps the framework decoupled
  from artifact infrastructure (no new AgentSpec fields, no
  threading librarian Subjects through start_agent), and the
  orchestrator's LLM can decide to call store_result on the
  admission content if it wants persistence. Future PR can lift
  to artifact-write if/when evidence shows embedding isn't
  enough.
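
The head + tail elision described in Fix 2 could look something like the sketch below. It is written in Python rather than the project's Gleam, and the even head/tail split and the newline placement around the marker are assumptions; only the 4000-character default and the marker wording come from the description above.

```python
def elide_middle(text: str, cap: int = 4000) -> str:
    """Keep the head and tail of `text`, replacing the middle with a marker."""
    if len(text) <= cap:
        return text  # short partials are embedded verbatim
    head_len = cap // 2          # assumption: split the budget evenly
    tail_len = cap - head_len
    elided = len(text) - head_len - tail_len
    marker = f"[...truncation_guard: {elided} chars elided...]"
    head = text[:head_len]
    tail = text[len(text) - tail_len:]
    return head + "\n" + marker + "\n" + tail
```

Note that in this sketch the cap bounds the retained text, so the returned string is slightly longer than `cap` once the marker is added.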

The `[truncation_guard:<agent>]` prefix on the admission is
load-bearing — operators and orchestrators recognise the failure
mode by it. Mirrors the cog-loop guard's `[truncation_guard]`
convention from PR #165.
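
Because the prefix is what callers key on, an orchestrator-side check might be as small as the following Python sketch. The function name and the rule that agent names contain no whitespace or `]` are assumptions for illustration, not part of the framework.

```python
import re
from typing import Optional

# Matches "[truncation_guard:<agent>]" at the very start of a result.
_PREFIX = re.compile(r"^\[truncation_guard:([^\]\s]+)\]")


def truncation_guard_agent(result: str) -> Optional[str]:
    """Return the agent name if `result` is a sub-agent admission, else None."""
    match = _PREFIX.match(result)
    return match.group(1) if match else None
```

The bare cog-loop form `[truncation_guard]` deliberately does not match here, since it carries no agent name.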

Tests:
- 6 pure tests on `build_truncation_admission` shape: prefix,
  agent/model/tokens, partial verbatim short, elided-when-long,
  empty-partial fallback, recovery-suggestions present.
- 4 end-to-end tests driving real agents via framework.start_agent
  with mock providers: retry-success, second-hit-admission,
  admission-still-flags-truncated, retry-doesn't-burn-max_turns
  (max_turns=1 with truncation retry still produces output on
  the recovery turn).
- Updated existing `agent_success_surfaces_truncation_test` —
  now asserts the admission contract instead of the old
  pass-through truncated text.

2059 tests passing total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

seamus-brady merged commit 1d34ea4 into main on Apr 26, 2026
1 check passed