Agent framework Phase A: SearchAgentState + modular search components by lingduoduo · Pull Request #324 · lingduoduo/Agentic-Search-GRPO

lingduoduo · 2026-06-23T23:20:22Z

What & why

First slice of the agent-framework GRPO optimization (spec-driven). Introduces the foundation the later GRPO action-policy work builds on: a clean six-field SearchAgentState and four explicit, single-responsibility components extracted from the implicit loop logic.

Spec: SPEC.md · Plan: docs/superpowers/plans/agent-framework-grpo-plan.md · Tasks: docs/superpowers/plans/agent-framework-grpo-tasks.md

Changes (all additive — the live loop is untouched)

SearchAgentState (src/agents/state.py): the six canonical fields — question, previous_queries, retrieved_docs, evidence_score, search_rounds, citations — plus Retriever enum (WEB/VECTOR_DB) and Citation. Named distinctly so it does not clobber the pre-existing orchestration AgentState; reuses the loop's native SearchResult doc type.
EvidenceJudge: wraps the heuristic SearchResultEvaluator and maps its verdict to a continuous evidence_score ∈ [0,1] (blends query sufficiency with squashed top scores; monotonic in quality). Boolean sufficiency preserved as the safety rail.
AnswerGenerator: resolves inline [RxQyDz] citation markers to structured Citations via AgentContext.
SearchTool / RerankerTool: dependency-injected wrappers (single retriever / reorder-in-place). Concrete web/vdb + cross-encoder backends are wired in Phase B where the new actions need them.
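For orientation, the state and the two tool wrappers listed above could be sketched roughly as follows. This is a minimal illustration, not the code in src/agents/state.py or src/agents/components/. The six field names, the Retriever members, and the "records the round" / "reorder in place, no round counted" behavior come from this PR; the local SearchResult stand-in, the Citation fields, the method names (record_round, set_evidence, run), and the exact dedup rule are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Retriever(Enum):
    WEB = "web"
    VECTOR_DB = "vector_db"


@dataclass(frozen=True)
class SearchResult:
    """Stand-in for the loop's native doc type (fields are assumptions)."""
    doc_id: str
    text: str
    score: float = 0.0


@dataclass(frozen=True)
class Citation:
    """Hypothetical shape: an inline marker and the doc it points to."""
    marker: str
    doc_id: str


@dataclass
class SearchAgentState:
    question: str
    previous_queries: list[str] = field(default_factory=list)
    retrieved_docs: list[SearchResult] = field(default_factory=list)
    evidence_score: float = 0.0
    search_rounds: int = 0
    citations: list[Citation] = field(default_factory=list)

    def record_round(self, query: str, docs: list[SearchResult]) -> None:
        """Count one round; skip queries and docs already seen (assumed rule)."""
        self.search_rounds += 1
        if query not in self.previous_queries:
            self.previous_queries.append(query)
        seen = {d.doc_id for d in self.retrieved_docs}
        for doc in docs:
            if doc.doc_id not in seen:
                seen.add(doc.doc_id)
                self.retrieved_docs.append(doc)

    def set_evidence(self, score: float) -> None:
        """Clamp the score into [0, 1] before storing it."""
        self.evidence_score = max(0.0, min(1.0, score))


class SearchTool:
    """Wraps one injected retriever and records the round into the state."""

    def __init__(self, retriever: Retriever,
                 search_fn: Callable[[str], list[SearchResult]]):
        self.retriever = retriever
        self._search_fn = search_fn

    def run(self, state: SearchAgentState, query: str) -> list[SearchResult]:
        docs = self._search_fn(query)
        state.record_round(query, docs)
        return docs


class RerankerTool:
    """Reorders retrieved_docs in place via an injected fn; counts no round."""

    def __init__(self, rerank_fn: Callable[[str, list[SearchResult]], list[SearchResult]]):
        self._rerank_fn = rerank_fn

    def run(self, state: SearchAgentState) -> None:
        # Slice assignment keeps the same list object, so callers holding a
        # reference to state.retrieved_docs see the new order.
        state.retrieved_docs[:] = self._rerank_fn(state.question, state.retrieved_docs)
```

Injecting search_fn and rerank_fn keeps the wrappers testable with plain lambdas, which matches the PR's note that concrete web/vdb and cross-encoder backends arrive in Phase B.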

Scope decision

T-A.4 (wiring components into SearchAgentLoop) is deferred into Phase B: the loop's retrieval is batch/multi-query with caching + dedup, so a no-op rewrite now would be high-churn/zero-payoff. Phase B touches that path anyway for web/vdb routing — wire once, when it changes behavior.

Testing

TDD throughout (RED→GREEN). 22 new tests: tests/unit/test_agent_state.py (9), tests/unit/test_components.py (13).
162 tests green across agents + training + new modules; no regression; ruff clean.

🤖 Generated with Claude Code

Spec-driven groundwork for optimizing the agent framework (modular components + GRPO action policy):
- SPEC.md, plan, and task breakdown under docs/superpowers/
- SearchAgentState: six-field search-loop state (question, previous_queries, retrieved_docs, evidence_score, search_rounds, citations) + Retriever enum and Citation, added alongside the existing orchestration AgentState
- 9 unit tests (dedup, round counting, rerank, evidence clamp)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

….2, T-A.3)

Behavior-preserving, additive component modules under src/agents/components/, each unit-tested in isolation with injected deps:
- EvidenceJudge: wraps SearchResultEvaluator -> continuous evidence_score in [0,1] (blends query sufficiency with squashed top scores; monotonic in quality)
- AnswerGenerator: resolves [RxQyDz] markers to structured Citations via AgentContext
- SearchTool: single-retriever wrapper that records the round into SearchAgentState
- RerankerTool: reorders retrieved_docs via an injected rerank fn (no round counted)

Also align SearchAgentState.retrieved_docs to the loop's native SearchResult type (lazy TYPE_CHECKING annotation; no runtime import cycle). 13 component tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
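
The EvidenceJudge scoring and the AnswerGenerator marker resolution described in this commit might look roughly like the sketch below. It is illustrative only. The exponential squash, the 0.5 blend weight, the sufficient_fraction input, the reading of [RxQyDz] as round/query/doc indices, and the plain dict standing in for AgentContext are all assumptions, not the PR's actual implementation.

```python
from __future__ import annotations

import math
import re
from dataclasses import dataclass


def squash(score: float) -> float:
    """Monotonically map a non-negative relevance score into [0, 1]."""
    return 1.0 - math.exp(-max(0.0, score))


def evidence_score(sufficient_fraction: float, top_scores: list[float],
                   weight: float = 0.5) -> float:
    """Blend query sufficiency with squashed top scores (assumed formula)."""
    sufficiency = max(0.0, min(1.0, sufficient_fraction))
    squashed = [squash(s) for s in top_scores]
    mean_top = sum(squashed) / len(squashed) if squashed else 0.0
    blended = weight * sufficiency + (1.0 - weight) * mean_top
    # Final clamp guards against weights outside [0, 1].
    return max(0.0, min(1.0, blended))


# Assumed marker grammar: [R<round>Q<query>D<doc>], all decimal integers.
MARKER = re.compile(r"\[R(\d+)Q(\d+)D(\d+)\]")


@dataclass(frozen=True)
class Citation:
    """Hypothetical shape: the inline marker and the doc it resolved to."""
    marker: str
    doc_id: str


def resolve_citations(answer: str,
                      context: dict[tuple[int, int, int], str]) -> list[Citation]:
    """Turn inline markers into Citations, skipping unknown or repeated ones."""
    citations: list[Citation] = []
    seen: set[str] = set()
    for match in MARKER.finditer(answer):
        marker = match.group(0)
        key = (int(match.group(1)), int(match.group(2)), int(match.group(3)))
        doc_id = context.get(key)
        if doc_id is None or marker in seen:
            continue
        seen.add(marker)
        citations.append(Citation(marker, doc_id))
    return citations
```

Because squash is increasing and the blend uses non-negative weights, a higher top score or a higher sufficiency fraction never lowers the result, which is the "monotonic in quality" property the commit mentions.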

lingduoduo and others added 5 commits June 23, 2026 19:01

Record T-A.4 wiring decision in task plan (recommend deferring into P…

d514e68

…hase B) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Mark Checkpoint 1 reached; revise PR slicing after T-A.4 deferral

412f54c

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Add spec copy under docs/superpowers/specs/ (per-PR spec+plan convent…

63d2fcb

…ion) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lingduoduo merged commit 238f73d into main Jun 23, 2026
5 of 6 checks passed

This was referenced Jun 23, 2026

Agent framework Phase B: web-vs-vector-DB retriever action (Planner + live loop) #325

Merged

Agent framework Phase A0: durable GRPO training loop (checkpoint/resume + bounded concurrency) #326

Merged

lingduoduo merged 5 commits into main from feat/agent-framework-grpo-optimization
