Agent framework Phase A: SearchAgentState + modular search components #324

Merged: lingduoduo merged 5 commits into main from feat/agent-framework-grpo-optimization on Jun 23, 2026

@lingduoduo (Owner)

What & why

First slice of the agent-framework GRPO optimization (spec-driven). Introduces the foundation the later GRPO action-policy work builds on: a clean six-field SearchAgentState and four explicit, single-responsibility components extracted from the implicit loop logic.

Spec: SPEC.md · Plan: docs/superpowers/plans/agent-framework-grpo-plan.md · Tasks: docs/superpowers/plans/agent-framework-grpo-tasks.md

Changes (all additive — the live loop is untouched)

  • SearchAgentState (src/agents/state.py): the six canonical fields — question, previous_queries, retrieved_docs, evidence_score, search_rounds, citations — plus Retriever enum (WEB/VECTOR_DB) and Citation. Named distinctly so it does not clobber the pre-existing orchestration AgentState; reuses the loop's native SearchResult doc type.
  • EvidenceJudge: wraps the heuristic SearchResultEvaluator and maps its verdict to a continuous evidence_score ∈ [0,1] (blends query sufficiency with squashed top scores; monotonic in quality). Boolean sufficiency preserved as the safety rail.
  • AnswerGenerator: resolves inline [RxQyDz] citation markers to structured Citations via AgentContext.
  • SearchTool / RerankerTool: dependency-injected wrappers (single retriever / reorder-in-place). Concrete web/vdb + cross-encoder backends are wired in Phase B where the new actions need them.
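
As a rough illustration of the EvidenceJudge mapping described above, the sketch below blends a sufficiency fraction with squashed top scores. The function names, the logistic squash, and the 0.5 blend weight are assumptions made for this example; the PR text does not state the actual formula.

```python
import math
from typing import Sequence


def squash(score: float) -> float:
    """Map an unbounded relevance score into [0, 1] with a logistic curve.

    Written via tanh so large negative scores cannot overflow.
    """
    return 0.5 * (1.0 + math.tanh(score / 2.0))


def evidence_score(
    sufficient_fraction: float,   # assumed: share of queries judged sufficient
    top_scores: Sequence[float],  # assumed: raw scores of the top documents
    weight: float = 0.5,          # assumed blend weight, not from the PR
) -> float:
    """Blend query sufficiency with squashed top scores into [0, 1]."""
    sufficiency = min(1.0, max(0.0, sufficient_fraction))
    if top_scores:
        quality = sum(squash(s) for s in top_scores) / len(top_scores)
    else:
        quality = 0.0
    blended = weight * sufficiency + (1.0 - weight) * quality
    # Final clamp guards against weights outside [0, 1].
    return min(1.0, max(0.0, blended))
```

Because both terms are non-decreasing in their inputs, a blend like this stays monotonic in quality, which is the property the bullet calls out. The boolean sufficiency verdict would remain a separate check alongside the continuous score.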

Scope decision

T-A.4 (wiring components into SearchAgentLoop) is deferred into Phase B: the loop's retrieval is batch/multi-query with caching + dedup, so a no-op rewrite now would be high-churn/zero-payoff. Phase B touches that path anyway for web/vdb routing — wire once, when it changes behavior.

Testing

  • TDD throughout (RED→GREEN). 22 new tests: tests/unit/test_agent_state.py (9), tests/unit/test_components.py (13).
  • 162 tests green across agents + training + new modules; no regression; ruff clean.

🤖 Generated with Claude Code

lingduoduo and others added 5 commits June 23, 2026 19:01
Spec-driven groundwork for optimizing the agent framework (modular
components + GRPO action policy):
- SPEC.md, plan, and task breakdown under docs/superpowers/
- SearchAgentState: six-field search-loop state (question, previous_queries,
  retrieved_docs, evidence_score, search_rounds, citations) + Retriever enum
  and Citation, added alongside the existing orchestration AgentState
- 9 unit tests (dedup, round counting, rerank, evidence clamp)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
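
For orientation, a minimal sketch of what a six-field state along these lines might look like. The field names come from the commit message; the field types, the `Citation` attributes, the enum values, and the `record_search` / `set_evidence` helper names are assumptions for illustration only, as is the choice to deduplicate on query text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, List


class Retriever(Enum):
    WEB = "web"              # enum values are assumed
    VECTOR_DB = "vector_db"


@dataclass(frozen=True)
class Citation:
    marker: str   # hypothetical attribute: the inline marker text
    doc_id: str   # hypothetical attribute: the cited document's id


@dataclass
class SearchAgentState:
    question: str
    previous_queries: List[str] = field(default_factory=list)
    retrieved_docs: List[Any] = field(default_factory=list)
    evidence_score: float = 0.0
    search_rounds: int = 0
    citations: List[Citation] = field(default_factory=list)

    def record_search(self, query: str, docs: List[Any]) -> None:
        """Count one search round; store the query only if it is new."""
        if query not in self.previous_queries:
            self.previous_queries.append(query)
        self.retrieved_docs.extend(docs)
        self.search_rounds += 1

    def set_evidence(self, score: float) -> None:
        """Clamp the score into [0, 1] before storing it."""
        self.evidence_score = min(1.0, max(0.0, score))
```

A short usage example under the same assumptions:

```python
state = SearchAgentState(question="What is GRPO?")
state.record_search("grpo definition", ["doc-1"])
state.set_evidence(1.7)  # stored as 1.0 after clamping
```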
….2, T-A.3)

Behavior-preserving, additive component modules under src/agents/components/,
each unit-tested in isolation with injected deps:
- EvidenceJudge: wraps SearchResultEvaluator -> continuous evidence_score in [0,1]
  (blends query sufficiency with squashed top scores; monotonic in quality)
- AnswerGenerator: resolves [RxQyDz] markers to structured Citations via AgentContext
- SearchTool: single-retriever wrapper that records the round into SearchAgentState
- RerankerTool: reorders retrieved_docs via an injected rerank fn (no round counted)
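
The dependency-injection pattern behind SearchTool and RerankerTool can be sketched as follows. `MiniState` is a cut-down stand-in for SearchAgentState, `Doc` stands in for SearchResult, and the `run` method names and callable signatures are assumptions; the real wrappers may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List

Doc = str  # stand-in for the loop's SearchResult type


@dataclass
class MiniState:
    """Only the state fields this sketch needs (hypothetical stand-in)."""
    previous_queries: List[str] = field(default_factory=list)
    retrieved_docs: List[Doc] = field(default_factory=list)
    search_rounds: int = 0


class SearchTool:
    """Wraps one injected retriever and records the round into the state."""

    def __init__(self, retrieve: Callable[[str], List[Doc]]) -> None:
        self._retrieve = retrieve

    def run(self, state: MiniState, query: str) -> List[Doc]:
        docs = self._retrieve(query)
        state.previous_queries.append(query)
        state.retrieved_docs.extend(docs)
        state.search_rounds += 1
        return docs


class RerankerTool:
    """Reorders retrieved_docs in place; does not count as a search round."""

    def __init__(self, rerank: Callable[[List[Doc]], List[Doc]]) -> None:
        self._rerank = rerank

    def run(self, state: MiniState) -> None:
        # Slice assignment keeps the same list object, so holders of a
        # reference to state.retrieved_docs see the new order.
        state.retrieved_docs[:] = self._rerank(list(state.retrieved_docs))
```

Injecting the retriever and rerank function as plain callables is what lets each tool be unit-tested with a lambda or a stub instead of a live backend.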

Also align SearchAgentState.retrieved_docs to the loop's native SearchResult type
(lazy TYPE_CHECKING annotation; no runtime import cycle). 13 component tests.
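
The lazy annotation mentioned here usually follows the standard `typing.TYPE_CHECKING` pattern, sketched below. The module path `agents.search_loop` and the class name `StateWithLazyDocs` are hypothetical placeholders, not names taken from the repository.

```python
from dataclasses import dataclass, field
from typing import TYPE_CHECKING, List

if TYPE_CHECKING:
    # Seen only by static type checkers; this import never runs,
    # so it cannot create a runtime import cycle.
    from agents.search_loop import SearchResult  # hypothetical module path


@dataclass
class StateWithLazyDocs:
    # The string annotation is not evaluated at runtime, so SearchResult
    # does not need to be importable when this module loads.
    retrieved_docs: "List[SearchResult]" = field(default_factory=list)
```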

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hase B)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ion)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lingduoduo lingduoduo merged commit 238f73d into main Jun 23, 2026
5 of 6 checks passed