Finish the Claude Agent SDK migration that was partially undone by merging upstream PR #23 (which expanded anthropic usage for rate-limit handling). End state: zero import anthropic anywhere in libs/openant-core, anthropic removed from pyproject.toml.
Steps 1–7: code complete. PR #38 is the final dep drop; once merged, the fork has zero anthropic usage. Step 7 also caught one straggler site in openant/cli.py (cmd_report_data's remediation-guidance LLM call) that the earlier ports did not touch — the dep-drift regression test from PR #30 surfaced it when the dep list shrank.
Fork PR #25 (2026-03-23) completed the SDK migration against the fork's then-current
code. Zero anthropic references remained in any of the four files immediately after
that commit.
Upstream then independently merged their own PR #23 (2026-04-14) which, among other
things, added GlobalRateLimiter + parallel execution and expanded anthropic
usage with typed rate-limit handling. When the fork merged upstream/master (fork PR #29,
2026-04-16), the merge absorbed upstream's anthropic-based rate-limit code without
re-porting to the SDK. That's the state we need to clean up now.
Feasibility answer (resolves the blocking decision from the previous draft)
The SDK does surface rate limits — via AssistantMessage.error: AssistantMessageError | None
where AssistantMessageError is a typed literal that includes "rate_limit"
(see claude_agent_sdk/types.py:767-774). The other values are "authentication_failed", "billing_error", "invalid_request", "server_error", "unknown".
Trade-off vs. the old anthropic.RateLimitError path:

| | anthropic.RateLimitError | SDK AssistantMessage.error == "rate_limit" |
|---|---|---|
| Detection | Typed exception | Inspect message field |
| retry-after header | Yes, exact value | No — not surfaced |
| request-id | Yes | No |
| Other API errors | Separate exception types | Other AssistantMessageError values |
The retry-after loss is tolerable: GlobalRateLimiter's default backoff is already
30s (configurable), and the header was only ever an upper-bound hint. Everything else
is either cleanly equivalent or finer-grained in the SDK (the SDK distinguishes "billing_error" and "authentication_failed" which anthropic bundles into APIStatusError).
Conclusion: full migration is feasible. No stderr parsing required. Earlier drafts
of this plan hedged on this; that was wrong.
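To make the backoff trade-off concrete, here is a minimal stand-in sketch of the contract the plan relies on — this is NOT upstream's GlobalRateLimiter implementation, only the report_rate_limit/pause shape it exposes, with report_rate_limit(0) falling back to the default backoff when no retry-after value is available:

```python
# Illustrative stand-in, not upstream's GlobalRateLimiter implementation.
# It mirrors only the contract this plan relies on: report_rate_limit(0)
# falls back to the configurable default backoff (30s upstream).
import threading
import time

class GlobalRateLimiter:
    def __init__(self, default_backoff: float = 30.0):
        self.default_backoff = default_backoff
        self._resume_at = 0.0  # monotonic timestamp workers may resume at
        self._lock = threading.Lock()

    def report_rate_limit(self, retry_after: float = 0) -> None:
        # No retry-after from the SDK -> use the default backoff.
        backoff = retry_after or self.default_backoff
        with self._lock:
            self._resume_at = max(self._resume_at, time.monotonic() + backoff)

    def wait_if_limited(self) -> float:
        """Workers call this before each request; returns how long they paused."""
        with self._lock:
            delay = self._resume_at - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        return max(delay, 0.0)
```

The point: once any worker reports a limit, every worker pauses until the shared resume timestamp passes, with or without an exact retry-after value.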
Success criteria
No import anthropic anywhere under libs/openant-core/.
anthropic removed from pyproject.toml dependencies.
tests/test_declared_dependencies.py (added in fork PR #30) still passes.
Rate limiting still works: a rate-limited call reaches GlobalRateLimiter.report_rate_limit() and all workers back off.
Token tracking (TokenTracker.total_cost_usd) still accurate per stage.
Call-site inventory (current fork state, post-merge)
utilities/finding_verifier.py
client: "anthropic.Anthropic | None" = None — constructor param.
Line 274: self.client = client or anthropic.Anthropic(max_retries=5) — primary client, live.
Line 336: response = self.client.messages.create(model=VERIFIER_MODEL, tools=VERIFICATION_TOOLS, messages=...) — main Stage 2 verification loop, using manual tool dispatch (VERIFICATION_TOOLS + ToolExecutor).
Line 343: except anthropic.RateLimitError as exc: get_rate_limiter().report_rate_limit(retry_after); raise.
Line 862: second rate-limit handler in another method.
utilities/agentic_enhancer/agent.py
Line 109: client: Optional[anthropic.Anthropic] = None — constructor param, marked as "Shared Anthropic client (reuse across workers to avoid FD exhaustion)".
Line 129: self.client = client or anthropic.Anthropic(max_retries=5).
Line 196: response = self.client.messages.create(model=AGENT_MODEL, tools=TOOL_DEFINITIONS, messages=...) — main enhancement loop, manual tool dispatch.
Line 203: except anthropic.RateLimitError — reports to rate limiter, attaches agent_state to exception, re-raises.
Line 382: another function with client: Optional[anthropic.Anthropic] = None param.
utilities/context_enhancer.py
Line 26: import anthropic.
Lines 65-77: exception classifier classify_error(exc) — returns a dict with type ∈ {connection, timeout, rate_limit, api_status} by isinstance checks, plus status_code, request_id, retry_after extracted from exc.response.headers. Used for diagnostic logging in error reports.
Line 573: shared_client = anthropic.Anthropic(max_retries=5) passed to parallel ContextAgent workers.
report/generator.py
Line 10: import anthropic.
Line 138: client = anthropic.Anthropic() then client.messages.create(model=MODEL, ...) in generate_summary_report.
Line 161: same pattern in generate_disclosure.
Migration blueprint (per call site)
The good news: utilities/llm_client.py already has every SDK primitive these sites
need. We're not designing new abstractions — we're routing existing call sites through
them.
client.messages.create(model=M, tools=VERIFICATION_TOOLS, ...) (finding_verifier.py) → run_native_verification(prompt, system, model, repo_path, json_schema) — multi-turn with SDK native tools (Read/Grep/Glob/Bash). Replaces the manual tool loop entirely.
client.messages.create(model=M, tools=TOOL_DEFINITIONS, ...) (agent.py) → _run_query_sync(prompt, options) with _build_options(model=M, allowed_tools=["Read","Grep","Glob","Bash"], add_dirs=[repo_path]).
client.messages.create(model=M, system=S, messages=[...]) (single-turn, no tools) → create_message(prompt, model=M, system=S) or AnthropicClient(model=M).analyze_sync(prompt) for tracked calls.
Step 1 — Error taxonomy
Create utilities/sdk_errors.py:
OpenAntLLMError(Exception) — base.
RateLimitError(OpenAntLLMError) — for AssistantMessage.error == "rate_limit".
BillingError(OpenAntLLMError), AuthError(OpenAntLLMError), APIStatusError(OpenAntLLMError) — one per AssistantMessageError value.
classify_error(exc: OpenAntLLMError) -> dict — returns the shape context_enhancer.py currently builds (type, exception_class, message, status_code) so diagnostic logging keeps working. Drops request_id and retry_after (SDK doesn't surface them).
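A sketch of what the Step 1 module could look like. Class and field names come from this plan; folding "invalid_request" and "server_error" both into APIStatusError is an assumption, since the plan names four classes for six literal values:

```python
# Sketch of utilities/sdk_errors.py per Step 1. Names come from this plan;
# mapping "invalid_request"/"server_error" to APIStatusError is an assumption.

class OpenAntLLMError(Exception):
    """Base class for errors surfaced from the Claude Agent SDK."""
    error_type = "unknown"

class RateLimitError(OpenAntLLMError):
    error_type = "rate_limit"

class BillingError(OpenAntLLMError):
    error_type = "billing_error"

class AuthError(OpenAntLLMError):
    error_type = "authentication_failed"

class APIStatusError(OpenAntLLMError):
    error_type = "api_status"

# AssistantMessageError literal -> exception class
ERROR_CLASSES = {
    "rate_limit": RateLimitError,
    "billing_error": BillingError,
    "authentication_failed": AuthError,
    "invalid_request": APIStatusError,
    "server_error": APIStatusError,
    "unknown": OpenAntLLMError,
}

def classify_error(exc: OpenAntLLMError) -> dict:
    """Keep the dict shape context_enhancer.py builds today, minus
    request_id/retry_after, which the SDK does not surface."""
    return {
        "type": exc.error_type,
        "exception_class": type(exc).__name__,
        "message": str(exc),
        "status_code": None,  # the SDK does not expose HTTP status codes
    }
```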
Step 2 — Wire SDK error surfacing
In utilities/llm_client.py:
Inside _run_query message loop (around line 128), when receiving an AssistantMessage, check message.error.
If error == "rate_limit": call get_rate_limiter().report_rate_limit(0) (no retry-after signal — let default backoff apply) and raise sdk_errors.RateLimitError.
Other error values raise the corresponding sdk_errors.* class.
This centralises rate-limit detection in one place — individual callers no longer need
their own except anthropic.RateLimitError blocks.
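The detection point might look like this — stand-in types only; the real AssistantMessage comes from claude_agent_sdk and the exceptions from Step 1's sdk_errors module:

```python
# Stand-in sketch of Step 2's check inside _run_query's message loop.
# AssistantMessage here is a local stand-in for claude_agent_sdk's type.
from dataclasses import dataclass
from typing import Optional

class OpenAntLLMError(Exception): ...
class RateLimitError(OpenAntLLMError): ...

@dataclass
class AssistantMessage:
    content: str
    error: Optional[str] = None  # "rate_limit", "billing_error", ...

def check_assistant_error(message: AssistantMessage) -> None:
    """Called on each AssistantMessage; raises so individual callers never
    need their own except anthropic.RateLimitError blocks."""
    if message.error is None:
        return
    if message.error == "rate_limit":
        # get_rate_limiter().report_rate_limit(0)  # no retry-after; default backoff
        raise RateLimitError("SDK reported rate_limit")
    raise OpenAntLLMError(f"SDK reported {message.error}")
```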
Step 3 — Port finding_verifier.py
Delete self.client, client constructor param, and the whole while iterations < MAX_ITERATIONS: self.client.messages.create(tools=VERIFICATION_TOOLS, ...) loop.
Replace with a single run_native_verification(prompt, system_prompt, VERIFIER_MODEL, repo_path, json_schema=VERIFICATION_SCHEMA) call.
Remove except anthropic.RateLimitError at 343 and 862 — rate-limit handling is now centralised in _run_query (step 2). If the caller still wants to re-raise with state attached, catch sdk_errors.RateLimitError instead.
Reference: PR #25 ("feat: migrate all LLM calls to Claude Agent SDK") — its 73a01a0 diff already did this work. It's deleted from current code but recoverable from git show 73a01a0 -- libs/openant-core/utilities/finding_verifier.py.
Step 4 — Port agentic_enhancer/agent.py
Same shape as step 3: delete self.client and its constructor param, replace the while iterations: self.client.messages.create(tools=TOOL_DEFINITIONS, ...) loop with _run_query_sync(prompt, options) where options = _build_options(model=AGENT_MODEL, allowed_tools=["Read","Grep","Glob","Bash"], add_dirs=[repo_path]).
Update the raise at line 203 to catch sdk_errors.RateLimitError and attach agent_state before re-raising (same pattern, different exception class).
Re-apply PR #25's 73a01a0 diff, accounting for upstream's added entry-point filtering (entry_points, reachability params) that didn't exist when PR #25 was written — preserve those.
Step 5 — Port context_enhancer.py
classify_error (lines 65-77): swap the isinstance(exc, anthropic.*) chain for checks against sdk_errors.* classes. Keep the returned dict shape so callers (diagnostic logging) don't change.
shared_client at line 573: delete. With _run_query_sync there's no shared client to pre-construct — each call spins up a fresh ClaudeSDKClient context manager. If there's a real FD-exhaustion concern under high concurrency, it needs to be re-proven post-migration (the fear behind the comment came from the anthropic SDK's connection pool).
Step 6 — Port report/generator.py
Trivial. Replace client = anthropic.Anthropic(); response = client.messages.create(...) at lines 138 and 161 with text = create_message(prompt, model=MODEL, system=system_prompt).
Usage dict extraction (_extract_usage(response)) currently relies on response.usage shape. Need to return the SDK ResultMessage's usage dict instead — create_message today returns only text; it needs a sibling that returns (text, usage) or the report generator needs to use AnthropicClient which already tracks usage.
Step 7 — Drop anthropic from pyproject.toml
Once steps 3-6 land, grep confirms no import anthropic remains, and the smoke test
passes (clean venv install → import openant, core, utilities, parsers, prompts, context, report).
Then remove "anthropic>=0.40.0" from dependencies.
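The Step 7 gate can be scripted; a hedged sketch, demonstrated on a throwaway tree (point it at libs/openant-core/ in the real repo):

```shell
# Sketch of the Step 7 gate: refuse to drop the dep while any `import anthropic`
# survives. A throwaway tree stands in for libs/openant-core/ here.
set -e
tree=$(mktemp -d)
mkdir -p "$tree/utilities"
printf 'from utilities import llm_client\n' > "$tree/utilities/clean.py"

if grep -rn "import anthropic" "$tree"; then
    echo "anthropic still imported — keep it declared in pyproject.toml" >&2
    exit 1
fi
echo "clean: safe to remove anthropic from dependencies"
```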
Step 8 — End-to-end verification (do not skip)
The manual smoke tests from PR #25's original test plan (which went unchecked):
openant enhance --fresh --workers 2 against a small test repo with a real API key.
openant verify on a real finding with a real API key.
openant analyze with a real API key.
openant report --format html against a completed scan.
openant generate-context <repo> against a real repo.
All of the above with OPENANT_LOCAL_CLAUDE=true (local session auth path).
Force a rate limit: run enhance at --workers 20 briefly and confirm GlobalRateLimiter.report_rate_limit fires, all workers pause, execution resumes.
Step 9 — Submit upstream
Open an upstream PR with a conventional-commits subject. Do not bundle with the GlobalRateLimiter work — that's already upstream; we're just porting its integration. Keep the dep-drift regression test (tests/test_declared_dependencies.py) — this is the mechanism that prevents a future upstream merge from silently re-introducing anthropic a second time.
Open questions to resolve while implementing
VERIFICATION_TOOLS / TOOL_DEFINITIONS were custom tool schemas for anthropic's tool-use API. SDK native tools (Read/Grep/Glob/Bash) cover the same surface but with different ergonomics. Re-read PR #25's prompt updates in prompts/verification_prompts.py and utilities/agentic_enhancer/prompts.py — they coach the model on the native tools. Those prompt changes must come along with the code changes or the model will fumble tool calls.
VERIFICATION_SCHEMA JSON schema for structured output: PR #25 added one in prompts/verification_prompts.py. Check that it still exists in current code, or re-add it.
restore_from on TokenTracker: PR #25 added this for checkpoint resume. Still there (llm_client.py:259). Good — no regression.
Upstream added parameters to ContextAgent.__init__ that PR #25 didn't know about. Preserve them when rewriting.
Risks
Prompt drift: the model's tool-calling behaviour changes between the custom tool schema and SDK native tools. Expected: PR #25's prompt-update patterns work. Unexpected: native tools produce different verdicts on some units. Mitigation: run a before/after comparison on a reference dataset (the fork has geospatial_vuln12, flowise_vuln4, object_browser).
Lost retry-after: GlobalRateLimiter will use default backoff instead of API-provided. In practice the API's values are usually ≤30s so default backoff is already conservative. If this becomes a problem, the SDK would need to surface the header (file an issue upstream at claude_agent_sdk).
FD exhaustion concern in ContextEnhancer: the shared_client comment implies this was observed under parallel load with anthropic. The SDK's per-call ClaudeSDKClient context manager is a different mechanism (subprocess spawn, not HTTP connection pool). Needs load-testing — not a reason to keep anthropic, but a reason to test at high --workers.
AssistantMessage.error timing: untested assumption that the SDK delivers an AssistantMessage with error="rate_limit" before the ResultMessage. If it only appears inside ResultMessage.result as an error string, the detection point moves. Needs a live test against a real rate limit.
Rough sizing (per step)
Step 2 (wire _run_query): 0.5 day.
Step 3 (finding_verifier.py): 1 day — largest file, mostly re-apply PR #25's deleted diff with minor upstream-delta adjustments.
Step 4 (agent.py): 1 day — same, plus preserving upstream's new entry-point/reachability params.
Step 5 (context_enhancer.py): 0.5 day (5a: classify_error rewrite; 5b: delete shared_client — only after step 4).
Step 6 (report/generator.py): 0.5 day.
Serial total: 5-6 days. Parallel critical path: ~3.5 days (see below).
Dependency graph
Critical path: Step 1 → Step 2 → (Step 3 ∥ Step 4) → Step 7 → Step 8 → Step 9.
Parallelisation waves
Wave 1 — day 1, no blockers, run concurrently:
Step 0a — spike: confirm AssistantMessage.error == "rate_limit" actually fires under load. Informs Step 2's implementation.
Step 0b — recover PR #25's prompt updates via git show 73a01a0 -- libs/openant-core/prompts/verification_prompts.py libs/openant-core/utilities/agentic_enhancer/prompts.py. Stage on a prep branch. Needed by Steps 3 and 4.
Step 1 — write utilities/sdk_errors.py. Pure new module, touches nothing existing.
Step 6 — port report/generator.py. Uses create_message which already exists; doesn't need the new taxonomy (single-turn, no rate-limit loop). Can land before Step 1.
Wave 2 — after Step 1 merges:
Step 2 — wire AssistantMessage.error detection into _run_query. Blocks Steps 3 and 4.
Step 5a — rewrite classify_error against the new taxonomy. Independent of Step 2. Runs in parallel.
Wave 3 — after Steps 1 + 2 merge:
Step 3 — port finding_verifier.py.
Step 4 — port agentic_enhancer/agent.py.
Touch disjoint files, no cross-dependency, merge in either order.
Wave 4 — after Step 4 merges:
Step 5b — delete shared_client in context_enhancer.py. Requires ContextAgent already off self.client.
Wave 5 — after Steps 3, 4, 5, 6 all in:
Step 7 — drop anthropic from pyproject.toml.
Wave 6:
Step 8 (end-to-end verification), then Step 9 (upstream submission).
Team allocation
Three workers: ~3.5 days. Worker C takes Step 5a + the reference-dataset before/after comparison (prompt-drift risk mitigation) + reviews.
Four+ workers: diminishing returns — merge conflicts on llm_client.py and pyproject.toml eat the savings.
Implementation via git worktrees
Per CLAUDE.md, feature branches live under .worktrees/. This maps cleanly onto the
wave plan: each PR-sized unit of work gets its own branch + worktree, so multiple tracks
can run concurrently without stashing or branch-switching. Worktrees share the underlying .git metadata, so all worktrees see the same commit graph — merging a wave-1 branch
makes it immediately available as a base for wave-2 branches.
Branch ↔ worktree mapping
chore/sdk-ratelimit-spike → .worktrees/chore/sdk-ratelimit-spike, off master
chore/recover-pr25-prompts → .worktrees/chore/recover-pr25-prompts, off master
feat/sdk-errors-taxonomy → .worktrees/feat/sdk-errors-taxonomy, off master
refactor/report-generator-sdk → .worktrees/refactor/report-generator-sdk, off master
feat/sdk-error-surfacing → .worktrees/feat/sdk-error-surfacing, off master (after Step 1 lands)
refactor/classify-error-sdk → .worktrees/refactor/classify-error-sdk, off master (after Step 1 lands)
refactor/verifier-sdk-native → .worktrees/refactor/verifier-sdk-native, off master (after Step 2 lands)
refactor/enhancer-agent-sdk → .worktrees/refactor/enhancer-agent-sdk, off master (after Step 2 lands)
refactor/drop-shared-client → .worktrees/refactor/drop-shared-client, off master (after Step 4 lands)
chore/drop-anthropic-dep → .worktrees/chore/drop-anthropic-dep, off master (after Steps 3-6 land)
Each agent/developer then works from their worktree's root directory. When a branch
merges, run git worktree remove .worktrees/<branch> to clean up. .worktrees/ is
expected to be gitignored (per CLAUDE.md); add it if missing.
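A bootstrap sketch for wave 1, using branch names from the wave plan. It is demonstrated in a throwaway repo so it runs anywhere; in the real fork, run the worktree commands from the repo root:

```shell
# Wave-1 bootstrap sketch: one branch + worktree per unit of work.
# A throwaway repo stands in for the fork so this is runnable anywhere.
set -e
repo=$(mktemp -d)
git -C "$repo" init -q -b master
git -C "$repo" -c user.email=dev@example.com -c user.name=dev \
    commit -q --allow-empty -m "root"
cd "$repo"
echo ".worktrees/" >> .gitignore   # keep worktrees out of the index

# Wave 1: both branch from master and can proceed concurrently.
git worktree add -q -b feat/sdk-errors-taxonomy \
    .worktrees/feat/sdk-errors-taxonomy master
git worktree add -q -b refactor/report-generator-sdk \
    .worktrees/refactor/report-generator-sdk master

git worktree list   # all worktrees share the same .git metadata
```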
Spawning wave 2/3/4/5 worktrees
Each subsequent wave branches off master after its prerequisite has landed there.
Run git fetch origin && git worktree add -b <new-branch> .worktrees/<new-branch> origin/master
at the point the wave is ready to start.
Branching wave-2 work off an unmerged wave-1 branch is possible but risky: if review
feedback amends the wave-1 branch, the wave-2 branch needs rebasing. Prefer waiting for
the merge.
Coordination rules
Only one branch touches llm_client.py at a time. Steps 1 and 2 both modify it
serially; Step 6 also imports from it but doesn't modify it. Any other branch that
needs to touch it must rebase onto the current latest.
Only one branch touches pyproject.toml at a time. Only Step 7 modifies it. Other
branches must not add or change dependencies in their PRs; if a port genuinely needs
a new dep, that's a separate PR first.
tests/test_declared_dependencies.py (from fork PR #30) should already be on master before any port starts. It will fail CI on any branch that introduces an undeclared import.
Agent sessions: when running concurrent Claude Code sessions, give each session
a working directory inside its own worktree. Sessions sharing .git metadata is safe;
sessions sharing a working tree is not.
Sequencing rules
Keep anthropic declared in pyproject.toml throughout. Step 7 is the only PR that removes it, and only after grep confirms zero import anthropic and the smoke test passes.
Review bandwidth is the real bottleneck. If PRs stack up waiting for review, the parallelism buys nothing. Confirm reviewer availability before opening Wave 1.