Agent framework Phase A: SearchAgentState + modular search components #324
Merged
Spec-driven groundwork for optimizing the agent framework (modular components + GRPO action policy):
- SPEC.md, plan, and task breakdown under docs/superpowers/
- SearchAgentState: six-field search-loop state (question, previous_queries, retrieved_docs, evidence_score, search_rounds, citations) + Retriever enum and Citation, added alongside the existing orchestration AgentState
- 9 unit tests (dedup, round counting, rerank, evidence clamp)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
….2, T-A.3)
Behavior-preserving, additive component modules under src/agents/components/, each unit-tested in isolation with injected deps:
- EvidenceJudge: wraps SearchResultEvaluator -> continuous evidence_score in [0,1] (blends query sufficiency with squashed top scores; monotonic in quality)
- AnswerGenerator: resolves [RxQyDz] markers to structured Citations via AgentContext
- SearchTool: single-retriever wrapper that records the round into SearchAgentState
- RerankerTool: reorders retrieved_docs via an injected rerank fn (no round counted)

Also align SearchAgentState.retrieved_docs to the loop's native SearchResult type (lazy TYPE_CHECKING annotation; no runtime import cycle). 13 component tests.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…hase B) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ion) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This was referenced Jun 23, 2026
What & why
First slice of the agent-framework GRPO optimization (spec-driven). Introduces the foundation the later GRPO action-policy work builds on: a clean six-field SearchAgentState and four explicit, single-responsibility components extracted from the implicit loop logic.

Spec: SPEC.md · Plan: docs/superpowers/plans/agent-framework-grpo-plan.md · Tasks: docs/superpowers/plans/agent-framework-grpo-tasks.md
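A minimal sketch of what a six-field state of this kind could look like, assuming plain Python dataclasses. The six field names, the Retriever members, and the Citation name come from this PR. The enum values, the Citation fields, the record_search and set_evidence helpers, and the use of plain strings in place of SearchResult are illustrative assumptions, not the code in src/agents/state.py.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Retriever(Enum):
    # Member names are from the PR; the string values are assumed.
    WEB = "web"
    VECTOR_DB = "vector_db"


@dataclass(frozen=True)
class Citation:
    # Hypothetical fields; the real Citation may carry different data.
    marker: str
    source: str


@dataclass
class SearchAgentState:
    question: str
    previous_queries: list[str] = field(default_factory=list)
    # Plain strings stand in for the loop's SearchResult type.
    retrieved_docs: list[str] = field(default_factory=list)
    evidence_score: float = 0.0
    search_rounds: int = 0
    citations: list[Citation] = field(default_factory=list)

    def record_search(self, query: str, docs: list[str]) -> None:
        """Count one round, keep each query once, append only unseen docs."""
        self.search_rounds += 1
        if query not in self.previous_queries:
            self.previous_queries.append(query)
        for doc in docs:
            if doc not in self.retrieved_docs:
                self.retrieved_docs.append(doc)

    def set_evidence(self, score: float) -> None:
        """Clamp the score into [0, 1] before storing it."""
        self.evidence_score = min(1.0, max(0.0, score))
```

In this sketch a repeated query still counts as a search round but is stored only once, and out-of-range scores are clamped rather than rejected; the actual semantics are whatever the 9 unit tests pin down.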
Changes (all additive — the live loop is untouched)
- SearchAgentState (src/agents/state.py): the six canonical fields (question, previous_queries, retrieved_docs, evidence_score, search_rounds, citations) plus Retriever enum (WEB / VECTOR_DB) and Citation. Named distinctly so it does not clobber the pre-existing orchestration AgentState; reuses the loop's native SearchResult doc type.
- EvidenceJudge: wraps the heuristic SearchResultEvaluator and maps its verdict to a continuous evidence_score ∈ [0,1] (blends query sufficiency with squashed top scores; monotonic in quality). Boolean sufficiency preserved as the safety rail.
- AnswerGenerator: resolves inline [RxQyDz] citation markers to structured Citations via AgentContext.
- SearchTool / RerankerTool: dependency-injected wrappers (single retriever / reorder-in-place). Concrete web/vdb + cross-encoder backends are wired in Phase B where the new actions need them.

Scope decision
T-A.4 (wiring components into SearchAgentLoop) is deferred into Phase B: the loop's retrieval is batch/multi-query with caching + dedup, so a no-op rewrite now would be high-churn/zero-payoff. Phase B touches that path anyway for web/vdb routing; wire once, when it changes behavior.

Testing
tests/unit/test_agent_state.py (9), tests/unit/test_components.py (13).

🤖 Generated with Claude Code
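For reviewers who want a concrete picture of two of the component behaviors described under Changes, here is a rough sketch. It is not the code in src/agents/components/. The evidence_score function, its weight parameter, and the choice of tanh as the squashing function are assumptions. So is reading [RxQyDz] as round, query, and document indices; the PR only says the markers resolve to Citations via AgentContext.

```python
from __future__ import annotations

import math
import re
from typing import NamedTuple, Sequence


def evidence_score(
    sufficient: bool, top_scores: Sequence[float], weight: float = 0.5
) -> float:
    """Blend a boolean sufficiency verdict with squashed top retrieval scores.

    Assumed formula: weight * sufficiency + (1 - weight) * mean(tanh(score)).
    """
    if top_scores:
        # tanh maps non-negative raw scores into [0, 1) and is monotonic,
        # so a better raw score never lowers the blended result.
        squashed = sum(math.tanh(max(0.0, s)) for s in top_scores) / len(top_scores)
    else:
        squashed = 0.0
    blended = weight * (1.0 if sufficient else 0.0) + (1.0 - weight) * squashed
    return min(1.0, max(0.0, blended))


class MarkerRef(NamedTuple):
    # Assumed meaning of the three indices in an [RxQyDz] marker.
    round_idx: int
    query_idx: int
    doc_idx: int


_MARKER = re.compile(r"\[R(\d+)Q(\d+)D(\d+)\]")


def find_markers(answer: str) -> list[MarkerRef]:
    """Extract every [RxQyDz] marker from an answer, in order of appearance."""
    return [MarkerRef(*map(int, m.groups())) for m in _MARKER.finditer(answer)]
```

Looking up each MarkerRef in the agent's context to build a Citation is left out, since that step depends on AgentContext, which this sketch does not model.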