Complete the Claude Agent SDK migration #35

@joshbouncesecurity

Description

Goal

Finish the Claude Agent SDK migration that was partially undone by merging upstream PR #23 (which expanded anthropic usage for rate-limit handling). End state: zero import anthropic anywhere in libs/openant-core, anthropic removed from pyproject.toml.

See SDK_MIGRATION_COMPLETION_PLAN.md for the full plan (also duplicated in the collapsed section below).

Status

| Step | Description | PR | Status |
|------|-------------|----|--------|
| 0a | Live rate-limit spike (confirm AssistantMessage.error == "rate_limit" under load) | — | Deferred, user-run (~30 min) |
| 0b | Recover PR #25 prompt deltas | — | No-op — prompts already on master |
| 1 | sdk_errors taxonomy | #31 | Merged |
| 2 | Wire SDK error surfacing into _run_query | #33 | Merged |
| 5a | Port context_enhancer classify_error | #34 | Merged |
| 6 | Port report/generator.py | #32 | Merged |
| 3 | Port finding_verifier.py | #37 | Merged |
| 4 | Port agentic_enhancer/agent.py (absorbed 5b) | #36 | Merged |
| 5b | Delete shared_client in context_enhancer | #36 | Absorbed into PR #36 |
| 7 | Drop anthropic from pyproject.toml (+ final cleanup) | #38 | Open |
| 8 | End-to-end verification (real API, ~$30-50) | — | Deferred, user-run |
| 9 | Open upstream PR to knostic/OpenAnt | — | After Step 8 |

Steps 1–7: code complete. PR #38 is the final dep drop; once merged, the fork has zero anthropic usage. Step 7 also caught one straggler site in openant/cli.py (cmd_report_data's remediation-guidance LLM call) that the earlier ports did not touch — the dep-drift regression test from PR #30 surfaced it when the dep list shrank.

Why this exists (short version)

  1. Fork PR feat: migrate all LLM calls to Claude Agent SDK #25 completed the SDK migration on 2026-03-23.
  2. Upstream knostic/OpenAnt PR feat: auto-detect dependency changes and reinstall openant #23 (2026-04-14) added GlobalRateLimiter and parallel execution, which re-introduced anthropic.*Error handling on upstream's version of the four files.
  3. Fork PR merge: incorporate upstream parallelization, Zig parser, and report overhaul #29 (2026-04-16) merged upstream/master and silently absorbed that regression.
  4. Fork PR fix: declare anthropic and tree-sitter-zig, guard against dep drift #30 re-declared anthropic in pyproject.toml as a minimal bugfix, plus added tests/test_declared_dependencies.py to prevent the same drift in the future.

This issue tracks finishing what PR #30 deferred.

Full plan from SDK_MIGRATION_COMPLETION_PLAN.md

Plan: Complete the Claude Agent SDK migration

Status update (why this plan exists now)

Fork PR #25 (2026-03-23) completed the SDK migration against the fork's then-current
code. Zero anthropic references remained in any of the four files immediately after
that commit.

Upstream then independently merged their own PR #23 (2026-04-14) which, among other
things, added GlobalRateLimiter + parallel execution and expanded anthropic
usage with typed rate-limit handling. When the fork merged upstream/master (fork PR #29,
2026-04-16), the merge absorbed upstream's anthropic-based rate-limit code without
re-porting to the SDK. That's the state we need to clean up now.

Feasibility answer (resolves the blocking decision from the previous draft)

The SDK does surface rate limits — via AssistantMessage.error: AssistantMessageError | None
where AssistantMessageError is a typed literal that includes "rate_limit"
(see claude_agent_sdk/types.py:767-774). The other values are "authentication_failed",
"billing_error", "invalid_request", "server_error", "unknown".

Trade-off vs. the old anthropic.RateLimitError path:

| | anthropic.RateLimitError | SDK AssistantMessage.error == "rate_limit" |
|---|---|---|
| Detection | Typed exception | Inspect message field |
| retry-after header | Yes, exact value | No — not surfaced |
| request-id | Yes | No |
| Other API errors | Separate exception types | Other AssistantMessageError values |

The retry-after loss is tolerable: GlobalRateLimiter's default backoff is already
30s (configurable), and the header was only ever an upper-bound hint. Everything else
is either cleanly equivalent or finer-grained in the SDK (the SDK distinguishes
"billing_error" and "authentication_failed" which anthropic bundles into
APIStatusError).

Conclusion: full migration is feasible. No stderr parsing required. Earlier drafts
of this plan hedged on this; that was wrong.

Success criteria

  1. No import anthropic anywhere under libs/openant-core/.
  2. anthropic removed from pyproject.toml dependencies.
  3. tests/test_declared_dependencies.py (added in fork PR fix: declare anthropic and tree-sitter-zig, guard against dep drift #30) still passes.
  4. All LLM codepaths work end-to-end in a clean venv: scan, enhance, analyze, verify, generate-context, report.
  5. Rate-limit behaviour: a synthetic 429 triggers GlobalRateLimiter.report_rate_limit() and all workers back off.
  6. Cost tracking (TokenTracker.total_cost_usd) still accurate per stage.
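The backoff behaviour in criterion 5 can be illustrated with a minimal sketch. This GlobalRateLimiter is a hypothetical stand-in for the fork's real class (whose API may differ); it only shows the shared-deadline pattern in which one reported 429 pauses every worker until the backoff expires.

```python
import threading
import time

class GlobalRateLimiter:
    """Hypothetical sketch: one reported 429 pauses every worker."""

    def __init__(self, default_backoff: float = 30.0):
        self._default_backoff = default_backoff
        self._resume_at = 0.0
        self._lock = threading.Lock()

    def report_rate_limit(self, retry_after: float = 0) -> None:
        # The SDK surfaces no retry-after, so fall back to the default backoff.
        backoff = retry_after or self._default_backoff
        with self._lock:
            self._resume_at = max(self._resume_at, time.monotonic() + backoff)

    def wait_if_limited(self) -> float:
        """Called by each worker before a request; returns how long it slept."""
        with self._lock:
            delay = max(0.0, self._resume_at - time.monotonic())
        if delay:
            time.sleep(delay)
        return delay

limiter = GlobalRateLimiter(default_backoff=0.05)
limiter.report_rate_limit()        # synthetic 429, no retry-after signal
slept = limiter.wait_if_limited()  # every worker would pause here
```

Once the deadline passes, wait_if_limited() returns 0.0 and workers proceed unthrottled; that is the "execution resumes" half of criterion 5.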

Call-site inventory (current fork state, post-merge)

utilities/finding_verifier.py

  • Line 267: client: "anthropic.Anthropic | None" = None — constructor param.
  • Line 274: self.client = client or anthropic.Anthropic(max_retries=5) — primary client, live.
  • Line 336: response = self.client.messages.create(model=VERIFIER_MODEL, tools=VERIFICATION_TOOLS, messages=...) — main Stage 2 verification loop, using manual tool dispatch (VERIFICATION_TOOLS + ToolExecutor).
  • Line 343: except anthropic.RateLimitError as exc: get_rate_limiter().report_rate_limit(retry_after); raise.
  • Line 862: second rate-limit handler in another method.

utilities/agentic_enhancer/agent.py

  • Line 109: client: Optional[anthropic.Anthropic] = None — constructor param, marked as "Shared Anthropic client (reuse across workers to avoid FD exhaustion)".
  • Line 129: self.client = client or anthropic.Anthropic(max_retries=5).
  • Line 196: response = self.client.messages.create(model=AGENT_MODEL, tools=TOOL_DEFINITIONS, messages=...) — main enhancement loop, manual tool dispatch.
  • Line 203: except anthropic.RateLimitError — reports to rate limiter, attaches agent_state to exception, re-raises.
  • Line 382: another function with client: Optional[anthropic.Anthropic] = None param.

utilities/context_enhancer.py

  • Line 26: import anthropic.
  • Lines 65-77: exception classifier classify_error(exc) — returns a dict with type ∈ {connection, timeout, rate_limit, api_status} by isinstance checks, plus status_code, request_id, retry_after extracted from exc.response.headers. Used for diagnostic logging in error reports.
  • Line 573: shared_client = anthropic.Anthropic(max_retries=5) passed to parallel ContextAgent workers.

report/generator.py

  • Line 10: import anthropic.
  • Line 138: client = anthropic.Anthropic() then client.messages.create(model=MODEL, ...) in generate_summary_report.
  • Line 161: same pattern in generate_disclosure.

Migration blueprint (per call site)

The good news: utilities/llm_client.py already has every SDK primitive these sites
need. We're not designing new abstractions — we're routing existing call sites through
them.

| Current call | SDK replacement in llm_client.py |
|---|---|
| client.messages.create(model=M, tools=VERIFICATION_TOOLS, ...) | run_native_verification(prompt, system, model, repo_path, json_schema) — multi-turn with SDK native tools (Read/Grep/Glob/Bash); replaces the manual tool loop entirely |
| client.messages.create(model=M, tools=TOOL_DEFINITIONS, ...) (agent.py) | _run_query_sync(prompt, options) with _build_options(model=M, allowed_tools=["Read","Grep","Glob","Bash"], add_dirs=[repo_path]) |
| client.messages.create(model=M, system=S, messages=[...]) (single-turn, no tools) | create_message(prompt, model=M, system=S), or AnthropicClient(model=M).analyze_sync(prompt) for tracked calls |

Step 1 — Error taxonomy

Create utilities/sdk_errors.py:

  • OpenAntLLMError(Exception) — base.
  • RateLimitError(OpenAntLLMError) — for AssistantMessage.error == "rate_limit".
  • BillingError(OpenAntLLMError), AuthError(OpenAntLLMError), APIStatusError(OpenAntLLMError) — one per AssistantMessageError value.
  • classify_error(exc: OpenAntLLMError) -> dict — returns the shape context_enhancer.py currently builds (type, exception_class, message, status_code) so diagnostic logging keeps working. Drops request_id and retry_after (SDK doesn't surface them).

Step 2 — Wire SDK error surfacing

In utilities/llm_client.py:

  • Inside _run_query message loop (around line 128), when receiving an AssistantMessage, check message.error.
  • If error == "rate_limit": call get_rate_limiter().report_rate_limit(0) (no retry-after signal — let default backoff apply) and raise sdk_errors.RateLimitError.
  • Other error values raise the corresponding sdk_errors.* class.

This centralises rate-limit detection in one place — individual callers no longer need
their own except anthropic.RateLimitError blocks.

Step 3 — Port finding_verifier.py

  • Delete self.client, client constructor param, and the whole while iterations < MAX_ITERATIONS: self.client.messages.create(tools=VERIFICATION_TOOLS, ...) loop.
  • Replace with a single run_native_verification(prompt, system_prompt, VERIFIER_MODEL, repo_path, json_schema=VERIFICATION_SCHEMA) call.
  • Remove except anthropic.RateLimitError at 343 and 862 — rate-limit handling is now centralised in _run_query (step 2). If the caller still wants to re-raise with state attached, catch sdk_errors.RateLimitError instead.
  • Reference: PR feat: migrate all LLM calls to Claude Agent SDK #25's 73a01a0 diff already did this work. It's deleted from current code but recoverable from git show 73a01a0 -- libs/openant-core/utilities/finding_verifier.py.
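In sketch form, the Step 3 change collapses the loop into one call. run_native_verification's signature is taken from the blueprint table above; its body here is a stub, and the "before" loop shown in comments is abridged:

```python
# Before (abridged): manual tool-dispatch loop via the anthropic client.
#
#     while iterations < MAX_ITERATIONS:
#         response = self.client.messages.create(
#             model=VERIFIER_MODEL, tools=VERIFICATION_TOOLS, messages=history)
#         history += dispatch_tool_calls(response)  # ToolExecutor round-trip
#
# After: one call; the SDK drives Read/Grep/Glob/Bash itself.

def run_native_verification(prompt, system_prompt, model, repo_path, json_schema=None):
    # Stub standing in for the real llm_client.py helper.
    return {"verdict": "confirmed", "model": model}

def verify_finding(prompt, system_prompt, repo_path):
    VERIFIER_MODEL = "placeholder-model-id"  # illustrative, not the real constant
    return run_native_verification(
        prompt, system_prompt, VERIFIER_MODEL, repo_path, json_schema=None)
```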

Step 4 — Port agentic_enhancer/agent.py

  • Same shape as step 3: delete self.client and its constructor param, replace the while iterations: self.client.messages.create(tools=TOOL_DEFINITIONS, ...) loop with _run_query_sync(prompt, options) where options = _build_options(model=AGENT_MODEL, allowed_tools=["Read","Grep","Glob","Bash"], add_dirs=[repo_path]).
  • Update the raise at line 203 to catch sdk_errors.RateLimitError and attach agent_state before re-raising (same pattern, different exception class).
  • Reference: PR feat: migrate all LLM calls to Claude Agent SDK #25's 73a01a0 diff. Must account for upstream's added entry-point filtering (entry_points, reachability params) that didn't exist when feat: migrate all LLM calls to Claude Agent SDK #25 was written — preserve those.

Step 5 — Port context_enhancer.py

  • classify_error (lines 65-77): swap the isinstance(exc, anthropic.*) chain for checks against sdk_errors.* classes. Keep the returned dict shape so callers (diagnostic logging) don't change.
  • shared_client at line 573: delete. With _run_query_sync there's no shared client to pre-construct — each call spins up a fresh ClaudeSDKClient context manager. If there's a real FD-exhaustion concern under high concurrency, it needs to be re-proven post-migration (the fear behind the comment came from the anthropic SDK's connection pool).

Step 6 — Port report/generator.py

  • Trivial. Replace client = anthropic.Anthropic(); response = client.messages.create(...) at lines 138 and 161 with text = create_message(prompt, model=MODEL, system=system_prompt).
  • Usage dict extraction (_extract_usage(response)) currently relies on response.usage shape. Need to return the SDK ResultMessage's usage dict instead — create_message today returns only text; it needs a sibling that returns (text, usage) or the report generator needs to use AnthropicClient which already tracks usage.

Step 7 — Drop anthropic from pyproject.toml

Once steps 3-6 land, grep confirms no import anthropic remains, and the smoke test
passes (clean venv install → import openant, core, utilities, parsers, prompts, context, report).
Then remove "anthropic>=0.40.0" from dependencies.
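The pre-removal grep might look like this. It is a sketch run against a scratch tree so it is self-contained; in the repo the path would be libs/openant-core:

```shell
# Sketch of the Step 7 pre-removal check. A scratch tree stands in for
# libs/openant-core so the example runs anywhere.
root=$(mktemp -d)
mkdir -p "$root/utilities"
printf 'import os\n' > "$root/utilities/clean.py"

# grep exits non-zero when nothing matches, which is the "clean" case here.
if grep -rn --include='*.py' -E '^(import anthropic|from anthropic )' "$root"; then
    echo "anthropic still referenced; do not drop the dep"
else
    echo "clean: safe to remove anthropic from pyproject.toml"
fi
```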

Step 8 — End-to-end verification (do not skip)

The manual smoke tests, including the three from PR #25's original test plan (which went unchecked):

  • openant enhance --fresh --workers 2 against a small test repo with a real API key.
  • openant verify on a real finding with a real API key.
  • openant analyze with a real API key.
  • openant report --format html against a completed scan.
  • openant generate-context <repo> against a real repo.
  • All of the above with OPENANT_LOCAL_CLAUDE=true (local session auth path).
  • Force a rate limit: run enhance at --workers 20 briefly and confirm GlobalRateLimiter.report_rate_limit fires, all workers pause, execution resumes.

Step 9 — Submit upstream

  • Open an upstream PR with a conventional-commits subject. Do not bundle with the
    GlobalRateLimiter work — that's already upstream; we're just porting its integration.
  • Include the regression-guard test from fork PR fix: declare anthropic and tree-sitter-zig, guard against dep drift #30 (tests/test_declared_dependencies.py) — this is the mechanism that prevents a future upstream merge from silently re-introducing anthropic a second time.

Open questions to resolve while implementing

  1. VERIFICATION_TOOLS / TOOL_DEFINITIONS were custom tool schemas for anthropic's tool-use API. SDK native tools (Read/Grep/Glob/Bash) cover the same surface but with different ergonomics. Re-read PR feat: migrate all LLM calls to Claude Agent SDK #25's prompt updates in prompts/verification_prompts.py and utilities/agentic_enhancer/prompts.py — they coach the model on the native tools. Those prompt changes must come along with the code changes or the model will fumble tool calls.
  2. VERIFICATION_SCHEMA JSON schema for structured output: PR feat: migrate all LLM calls to Claude Agent SDK #25 added one in prompts/verification_prompts.py. Check that it still exists in current code, or re-add it.
  3. restore_from on TokenTracker: PR feat: migrate all LLM calls to Claude Agent SDK #25 added this for checkpoint resume. Still there (llm_client.py:259). Good — no regression.
  4. Entry-point filtering + reachability (added upstream in PR feat: auto-detect dependency changes and reinstall openant #23): these are new params on ContextAgent.__init__ that PR feat: migrate all LLM calls to Claude Agent SDK #25 didn't know about. Preserve them when rewriting.

Risks

  • Prompt drift: the model's tool-calling behaviour changes between the custom tool schema and SDK native tools. Expected: PR feat: migrate all LLM calls to Claude Agent SDK #25's prompt-update patterns work. Unexpected: native tools produce different verdicts on some units. Mitigation: run a before/after comparison on a reference dataset (the fork has geospatial_vuln12, flowise_vuln4, object_browser).
  • Lost retry-after: GlobalRateLimiter will use default backoff instead of API-provided. In practice the API's values are usually ≤30s so default backoff is already conservative. If this becomes a problem, the SDK would need to surface the header (file an issue upstream at claude_agent_sdk).
  • FD exhaustion concern in ContextEnhancer: the shared_client comment implies this was observed under parallel load with anthropic. The SDK's per-call ClaudeSDKClient context manager is a different mechanism (subprocess spawn, not HTTP connection pool). Needs load-testing — not a reason to keep anthropic, but a reason to test at high --workers.
  • AssistantMessage.error timing: untested assumption that the SDK delivers an AssistantMessage with error="rate_limit" before the ResultMessage. If it only appears inside ResultMessage.result as an error string, the detection point moves. Needs a live test against a real rate limit.

Rough sizing (per step)

  • Step 0a (live rate-limit spike): 30 min.
  • Step 0b (recover PR feat: migrate all LLM calls to Claude Agent SDK #25 prompt deltas): 30 min.
  • Step 1 (taxonomy): 0.5 day.
  • Step 2 (SDK error surfacing in _run_query): 0.5 day.
  • Step 3 (finding_verifier.py): 1 day — largest file, mostly re-apply PR feat: migrate all LLM calls to Claude Agent SDK #25's deleted diff with minor upstream-delta adjustments.
  • Step 4 (agent.py): 1 day — same, plus preserving upstream's new entry-point/reachability params.
  • Step 5 (context_enhancer.py): 0.5 day (5a: classify_error rewrite; 5b: delete shared_client — only after step 4).
  • Step 6 (report/generator.py): 0.5 day.
  • Step 7: trivial (one line in pyproject.toml).
  • Step 8 (E2E with real API): 1 day, ~$30-50 API spend.
  • Step 9 (upstream PR): 0.5 day.

Serial total: 5-6 days. Parallel critical path: ~3.5 days (see below).

Dependency graph

Step 0a: Live rate-limit test  ──┐
Step 0b: Recover PR #25 prompts ─┤   (prep work, zero deps)
                                 │
Step 1: sdk_errors taxonomy ─────┼──→ Step 2: wire into _run_query ──┐
                                 │                                   │
Step 6: report/generator.py ─────┘   (only needs create_message,     ├──→ Step 7: drop dep ──→ Step 8: E2E ──→ Step 9: upstream PR
                                      already exists)                │
                                                                     │
                Step 3: finding_verifier.py ────────────────────────┤
                Step 4: agentic_enhancer/agent.py ──────────────────┤
                Step 5a: context_enhancer classify_error ───────────┤
                Step 5b: delete shared_client ←── depends on Step 4 ┘

Critical path: Step 1 → Step 2 → (Step 3 ∥ Step 4) → Step 7 → Step 8 → Step 9.

Parallelisation waves

Wave 1 — day 1, no blockers, run concurrently:

  • Step 0a — 30-min spike. Confirm AssistantMessage.error == "rate_limit" actually fires under load. Informs Step 2's implementation.
  • Step 0b — recover PR feat: migrate all LLM calls to Claude Agent SDK #25's prompt updates via git show 73a01a0 -- libs/openant-core/prompts/verification_prompts.py libs/openant-core/utilities/agentic_enhancer/prompts.py. Stage on a prep branch. Needed by Steps 3 and 4.
  • Step 1 — write utilities/sdk_errors.py. Pure new module, touches nothing existing.
  • Step 6 — port report/generator.py. Uses create_message which already exists; doesn't need the new taxonomy (single-turn, no rate-limit loop). Can land before Step 1.

Wave 2 — after Step 1 merges:

  • Step 2 — wire AssistantMessage.error detection into _run_query. Blocks Steps 3 and 4.
  • Step 5a — rewrite classify_error against the new taxonomy. Independent of Step 2. Runs in parallel.

Wave 3 — after Steps 1 + 2 merge:

  • Step 3 — port finding_verifier.py.
  • Step 4 — port agentic_enhancer/agent.py.
  • Touch disjoint files, no cross-dependency, merge in either order.

Wave 4 — after Step 4 merges:

  • Step 5b — delete shared_client in context_enhancer.py. Requires ContextAgent already off self.client.

Wave 5 — after Steps 3, 4, 5, 6 all in:

  • Step 7 — remove anthropic from pyproject.toml.
  • Step 8 — E2E with real API. ~1 day, ~$50.

Wave 6:

  • Step 9 — upstream PR.

Team allocation

  • Solo: 5-6 days fully sequential.
  • Two workers: ~4 days. Worker A: critical path (Steps 0a/1/2/3/7/8/9). Worker B: parallel track (Steps 0b/6/4/5), merges asynchronously.
  • Three workers: ~3.5 days. Worker C takes Step 5a + the reference-dataset before/after comparison (prompt-drift risk mitigation) + reviews.
  • Four+ workers: diminishing returns — merge conflicts on llm_client.py and pyproject.toml eat the savings.

Implementation via git worktrees

Per CLAUDE.md, feature branches live under .worktrees/. This maps cleanly onto the
wave plan: each PR-sized unit of work gets its own branch + worktree, so multiple tracks
can run concurrently without stashing or branch-switching. Worktrees share the underlying
.git metadata, so all worktrees see the same commit graph — merging a wave-1 branch
makes it immediately available as a base for wave-2 branches.

Branch ↔ worktree mapping

| Wave | Step | Branch | Worktree path | Base |
|---|---|---|---|---|
| 1 | 0a | chore/sdk-ratelimit-spike | .worktrees/chore/sdk-ratelimit-spike | master |
| 1 | 0b | chore/recover-pr25-prompts | .worktrees/chore/recover-pr25-prompts | master |
| 1 | 1 | feat/sdk-errors-taxonomy | .worktrees/feat/sdk-errors-taxonomy | master |
| 1 | 6 | refactor/report-generator-sdk | .worktrees/refactor/report-generator-sdk | master |
| 2 | 2 | feat/sdk-error-surfacing | .worktrees/feat/sdk-error-surfacing | master (after Step 1 lands) |
| 2 | 5a | refactor/classify-error-sdk | .worktrees/refactor/classify-error-sdk | master (after Step 1 lands) |
| 3 | 3 | refactor/verifier-sdk-native | .worktrees/refactor/verifier-sdk-native | master (after Step 2 lands) |
| 3 | 4 | refactor/enhancer-agent-sdk | .worktrees/refactor/enhancer-agent-sdk | master (after Step 2 lands) |
| 4 | 5b | refactor/drop-shared-client | .worktrees/refactor/drop-shared-client | master (after Step 4 lands) |
| 5 | 7 | chore/drop-anthropic-dep | .worktrees/chore/drop-anthropic-dep | master (after Steps 3-6 land) |

Bootstrap (wave 1)

From the repo root (no cd, no git -C):

git fetch origin
git worktree add -b chore/sdk-ratelimit-spike    .worktrees/chore/sdk-ratelimit-spike    master
git worktree add -b chore/recover-pr25-prompts   .worktrees/chore/recover-pr25-prompts   master
git worktree add -b feat/sdk-errors-taxonomy     .worktrees/feat/sdk-errors-taxonomy     master
git worktree add -b refactor/report-generator-sdk .worktrees/refactor/report-generator-sdk master

Each agent/developer then works from their worktree's root directory. When a branch
merges, run git worktree remove .worktrees/<branch> to clean up. .worktrees/ is
expected to be gitignored (per CLAUDE.md); add it if missing.

Spawning wave 2/3/4/5 worktrees

Each subsequent wave branches off master after its prerequisite has landed there.
Run git fetch origin && git worktree add -b <new-branch> .worktrees/<new-branch> origin/master
at the point the wave is ready to start.

Branching wave-2 work off an unmerged wave-1 branch is possible but risky: if review
feedback amends the wave-1 branch, the wave-2 branch needs rebasing. Prefer waiting for
the merge.

Coordination rules

  • Only one branch touches llm_client.py at a time. Steps 1 and 2 both modify it
    serially; Step 6 also imports from it but doesn't modify it. Any other branch that
    needs to touch it must rebase onto the current latest.
  • Only one branch touches pyproject.toml at a time. Only Step 7 modifies it. Other
    branches must not add or change dependencies in their PRs; if a port genuinely needs
    a new dep, that's a separate PR first.
  • tests/test_declared_dependencies.py (from fork PR fix: declare anthropic and tree-sitter-zig, guard against dep drift #30) should already be on
    master before any port starts. It will fail CI on any branch that introduces an
    undeclared import.
  • Agent sessions: when running concurrent Claude Code sessions, give each session
    a working directory inside its own worktree. Sessions sharing .git metadata is safe;
    sessions sharing a working tree is not.

Sequencing rules

  • One PR per step. Do not bundle. The fork's history shows "one big migration PR" is how we got into this mess twice (feat: migrate all LLM calls to Claude Agent SDK #25 was one big PR that left residual issues; merge: incorporate upstream parallelization, Zig parser, and report overhaul #29's upstream merge was one big PR that undid the parts that did land).
  • Keep anthropic declared in pyproject.toml throughout. Step 7 is the only PR that removes it, and only after grep confirms zero import anthropic and the smoke test passes.
  • Review bandwidth is the real bottleneck. If PRs stack up waiting for review, the parallelism buys nothing. Confirm reviewer availability before opening Wave 1.
