Complete the Claude Agent SDK migration #35

@joshbouncesecurity

Description

Goal

Finish the Claude Agent SDK migration that was partially undone by merging upstream PR #23 (which expanded anthropic usage for rate-limit handling). End state: zero import anthropic anywhere in libs/openant-core, anthropic removed from pyproject.toml.

See SDK_MIGRATION_COMPLETION_PLAN.md for the full plan (also duplicated in the collapsed section below).

Status

| Step | Description | PR | Status |
|------|-------------|----|--------|
| 0a | Live rate-limit spike (confirm AssistantMessage.error == "rate_limit" under load) | — | Deferred, user-run (~30 min) |
| 0b | Recover PR #25 prompt deltas | — | No-op — prompts already on master |
| 1 | sdk_errors taxonomy | #31 | Merged |
| 2 | Wire SDK error surfacing into _run_query | #33 | Merged |
| 5a | Port context_enhancer classify_error | #34 | Merged |
| 6 | Port report/generator.py | #32 | Merged |
| 3 | Port finding_verifier.py | #37 | Merged |
| 4 | Port agentic_enhancer/agent.py (absorbed 5b) | #36 | Merged |
| 5b | Delete shared_client in context_enhancer | #36 | Absorbed into PR #36 |
| 7 | Drop anthropic from pyproject.toml (+ final cleanup) | #38 | Open |
| 8 | End-to-end verification (real API, ~$30-50) | — | Deferred, user-run |
| 9 | Open upstream PR to knostic/OpenAnt | — | After Step 8 |

Steps 1–7: code complete. PR #38 is the final dep drop; once merged, the fork has zero anthropic usage. Step 7 also caught one straggler site in openant/cli.py (cmd_report_data's remediation-guidance LLM call) that the earlier ports did not touch — the dep-drift regression test from PR #30 surfaced it when the dep list shrank.

Why this exists (short version)

  1. Fork PR feat: migrate all LLM calls to Claude Agent SDK #25 completed the SDK migration on 2026-03-23.
  2. Upstream knostic/OpenAnt PR feat: auto-detect dependency changes and reinstall openant #23 (2026-04-14) added GlobalRateLimiter and parallel execution, which re-introduced anthropic.*Error handling on upstream's version of the four files.
  3. Fork PR merge: incorporate upstream parallelization, Zig parser, and report overhaul #29 (2026-04-16) merged upstream/master and silently absorbed that regression.
  4. Fork PR fix: declare anthropic and tree-sitter-zig, guard against dep drift #30 re-declared anthropic in pyproject.toml as a minimal bugfix, plus added tests/test_declared_dependencies.py to prevent the same drift in the future.

This issue tracks finishing what PR #30 deferred.

Full plan from SDK_MIGRATION_COMPLETION_PLAN.md

Plan: Complete the Claude Agent SDK migration

Status update (why this plan exists now)

Fork PR #25 (2026-03-23) completed the SDK migration against the fork's then-current
code. Zero anthropic references remained in any of the four files immediately after
that commit.

Upstream then independently merged their own PR #23 (2026-04-14) which, among other
things, added GlobalRateLimiter + parallel execution and expanded anthropic
usage with typed rate-limit handling. When the fork merged upstream/master (fork PR #29,
2026-04-16), the merge absorbed upstream's anthropic-based rate-limit code without
re-porting to the SDK. That's the state we need to clean up now.

Feasibility answer (resolves the blocking decision from the previous draft)

The SDK does surface rate limits — via AssistantMessage.error: AssistantMessageError | None
where AssistantMessageError is a typed literal that includes "rate_limit"
(see claude_agent_sdk/types.py:767-774). The other values are "authentication_failed",
"billing_error", "invalid_request", "server_error", "unknown".

Trade-off vs. the old anthropic.RateLimitError path:

| | anthropic.RateLimitError | SDK AssistantMessage.error == "rate_limit" |
|---|---|---|
| Detection | Typed exception | Inspect message field |
| retry-after header | Yes, exact value | No — not surfaced |
| request-id | Yes | No |
| Other API errors | Separate exception types | Other AssistantMessageError values |

The retry-after loss is tolerable: GlobalRateLimiter's default backoff is already
30s (configurable), and the header was only ever an upper-bound hint. Everything else
is either cleanly equivalent or finer-grained in the SDK (the SDK distinguishes
"billing_error" and "authentication_failed" which anthropic bundles into
APIStatusError).

Conclusion: full migration is feasible. No stderr parsing required. Earlier drafts
of this plan hedged on this; that was wrong.

Success criteria

  1. No import anthropic anywhere under libs/openant-core/.
  2. anthropic removed from pyproject.toml dependencies.
  3. tests/test_declared_dependencies.py (added in fork PR fix: declare anthropic and tree-sitter-zig, guard against dep drift #30) still passes.
  4. All LLM codepaths work end-to-end in a clean venv: scan, enhance, analyze, verify, generate-context, report.
  5. Rate-limit behaviour: a synthetic 429 triggers GlobalRateLimiter.report_rate_limit() and all workers back off.
  6. Cost tracking (TokenTracker.total_cost_usd) still accurate per stage.
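The backoff behaviour in criterion 5 can be illustrated with a minimal sketch. This GlobalRateLimiter is a hypothetical stand-in for the fork's real class (whose API may differ); it only shows the shared-deadline pattern in which one reported 429 pauses every worker until the backoff expires.

```python
import threading
import time

class GlobalRateLimiter:
    """Hypothetical sketch: one reported 429 pauses every worker."""

    def __init__(self, default_backoff: float = 30.0):
        self._default_backoff = default_backoff
        self._resume_at = 0.0
        self._lock = threading.Lock()

    def report_rate_limit(self, retry_after: float = 0) -> None:
        # The SDK surfaces no retry-after, so fall back to the default backoff.
        backoff = retry_after or self._default_backoff
        with self._lock:
            self._resume_at = max(self._resume_at, time.monotonic() + backoff)

    def wait_if_limited(self) -> float:
        """Called by each worker before a request; returns how long it slept."""
        with self._lock:
            delay = max(0.0, self._resume_at - time.monotonic())
        if delay:
            time.sleep(delay)
        return delay

limiter = GlobalRateLimiter(default_backoff=0.05)
limiter.report_rate_limit()        # synthetic 429, no retry-after signal
slept = limiter.wait_if_limited()  # every worker would pause here
```

Once the deadline passes, wait_if_limited() returns 0.0 and workers proceed unthrottled; that is the "execution resumes" half of criterion 5.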

Call-site inventory (current fork state, post-merge)

utilities/finding_verifier.py

  • Line 267: client: "anthropic.Anthropic | None" = None — constructor param.
  • Line 274: self.client = client or anthropic.Anthropic(max_retries=5) — primary client, live.
  • Line 336: response = self.client.messages.create(model=VERIFIER_MODEL, tools=VERIFICATION_TOOLS, messages=...) — main Stage 2 verification loop, using manual tool dispatch (VERIFICATION_TOOLS + ToolExecutor).
  • Line 343: except anthropic.RateLimitError as exc: get_rate_limiter().report_rate_limit(retry_after); raise.
  • Line 862: second rate-limit handler in another method.

utilities/agentic_enhancer/agent.py

  • Line 109: client: Optional[anthropic.Anthropic] = None — constructor param, marked as "Shared Anthropic client (reuse across workers to avoid FD exhaustion)".
  • Line 129: self.client = client or anthropic.Anthropic(max_retries=5).
  • Line 196: response = self.client.messages.create(model=AGENT_MODEL, tools=TOOL_DEFINITIONS, messages=...) — main enhancement loop, manual tool dispatch.
  • Line 203: except anthropic.RateLimitError — reports to rate limiter, attaches agent_state to exception, re-raises.
  • Line 382: another function with client: Optional[anthropic.Anthropic] = None param.

utilities/context_enhancer.py

  • Line 26: import anthropic.
  • Lines 65-77: exception classifier classify_error(exc) — returns a dict with type ∈ {connection, timeout, rate_limit, api_status} by isinstance checks, plus status_code, request_id, retry_after extracted from exc.response.headers. Used for diagnostic logging in error reports.
  • Line 573: shared_client = anthropic.Anthropic(max_retries=5) passed to parallel ContextAgent workers.

report/generator.py

  • Line 10: import anthropic.
  • Line 138: client = anthropic.Anthropic() then client.messages.create(model=MODEL, ...) in generate_summary_report.
  • Line 161: same pattern in generate_disclosure.

Migration blueprint (per call site)

The good news: utilities/llm_client.py already has every SDK primitive these sites
need. We're not designing new abstractions — we're routing existing call sites through
them.

| Current call | SDK replacement in llm_client.py |
|---|---|
| client.messages.create(model=M, tools=VERIFICATION_TOOLS, ...) | run_native_verification(prompt, system, model, repo_path, json_schema) — multi-turn with SDK native tools (Read/Grep/Glob/Bash); replaces the manual tool loop entirely |
| client.messages.create(model=M, tools=TOOL_DEFINITIONS, ...) (agent.py) | _run_query_sync(prompt, options) with _build_options(model=M, allowed_tools=["Read","Grep","Glob","Bash"], add_dirs=[repo_path]) |
| client.messages.create(model=M, system=S, messages=[...]) (single-turn, no tools) | create_message(prompt, model=M, system=S), or AnthropicClient(model=M).analyze_sync(prompt) for tracked calls |

Step 1 — Error taxonomy

Create utilities/sdk_errors.py:

  • OpenAntLLMError(Exception) — base.
  • RateLimitError(OpenAntLLMError) — for AssistantMessage.error == "rate_limit".
  • BillingError(OpenAntLLMError), AuthError(OpenAntLLMError), APIStatusError(OpenAntLLMError) — one per AssistantMessageError value.
  • classify_error(exc: OpenAntLLMError) -> dict — returns the shape context_enhancer.py currently builds (type, exception_class, message, status_code) so diagnostic logging keeps working. Drops request_id and retry_after (SDK doesn't surface them).

Step 2 — Wire SDK error surfacing

In utilities/llm_client.py:

  • Inside _run_query message loop (around line 128), when receiving an AssistantMessage, check message.error.
  • If error == "rate_limit": call get_rate_limiter().report_rate_limit(0) (no retry-after signal — let default backoff apply) and raise sdk_errors.RateLimitError.
  • Other error values raise the corresponding sdk_errors.* class.

This centralises rate-limit detection in one place — individual callers no longer need
their own except anthropic.RateLimitError blocks.

Step 3 — Port finding_verifier.py

  • Delete self.client, client constructor param, and the whole while iterations < MAX_ITERATIONS: self.client.messages.create(tools=VERIFICATION_TOOLS, ...) loop.
  • Replace with a single run_native_verification(prompt, system_prompt, VERIFIER_MODEL, repo_path, json_schema=VERIFICATION_SCHEMA) call.
  • Remove except anthropic.RateLimitError at 343 and 862 — rate-limit handling is now centralised in _run_query (step 2). If the caller still wants to re-raise with state attached, catch sdk_errors.RateLimitError instead.
  • Reference: PR feat: migrate all LLM calls to Claude Agent SDK #25's 73a01a0 diff already did this work. It's deleted from current code but recoverable from git show 73a01a0 -- libs/openant-core/utilities/finding_verifier.py.
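In sketch form, the Step 3 change collapses the loop into one call. run_native_verification's signature is taken from the blueprint table above; its body here is a stub, and the "before" loop shown in comments is abridged:

```python
# Before (abridged): manual tool-dispatch loop via the anthropic client.
#
#     while iterations < MAX_ITERATIONS:
#         response = self.client.messages.create(
#             model=VERIFIER_MODEL, tools=VERIFICATION_TOOLS, messages=history)
#         history += dispatch_tool_calls(response)  # ToolExecutor round-trip
#
# After: one call; the SDK drives Read/Grep/Glob/Bash itself.

def run_native_verification(prompt, system_prompt, model, repo_path, json_schema=None):
    # Stub standing in for the real llm_client.py helper.
    return {"verdict": "confirmed", "model": model}

def verify_finding(prompt, system_prompt, repo_path):
    VERIFIER_MODEL = "placeholder-model-id"  # illustrative, not the real constant
    return run_native_verification(
        prompt, system_prompt, VERIFIER_MODEL, repo_path, json_schema=None)
```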

Step 4 — Port agentic_enhancer/agent.py

  • Same shape as step 3: delete self.client and its constructor param, replace the while iterations: self.client.messages.create(tools=TOOL_DEFINITIONS, ...) loop with _run_query_sync(prompt, options) where options = _build_options(model=AGENT_MODEL, allowed_tools=["Read","Grep","Glob","Bash"], add_dirs=[repo_path]).
  • Update the raise at line 203 to catch sdk_errors.RateLimitError and attach agent_state before re-raising (same pattern, different exception class).
  • Reference: PR feat: migrate all LLM calls to Claude Agent SDK #25's 73a01a0 diff. Must account for upstream's added entry-point filtering (entry_points, reachability params) that didn't exist when feat: migrate all LLM calls to Claude Agent SDK #25 was written — preserve those.

Step 5 — Port context_enhancer.py

  • classify_error (lines 65-77): swap the isinstance(exc, anthropic.*) chain for checks against sdk_errors.* classes. Keep the returned dict shape so callers (diagnostic logging) don't change.
  • shared_client at line 573: delete. With _run_query_sync there's no shared client to pre-construct — each call spins up a fresh ClaudeSDKClient context manager. If there's a real FD-exhaustion concern under high concurrency, it needs to be re-proven post-migration (the fear behind the comment came from the anthropic SDK's connection pool).

Step 6 — Port report/generator.py

  • Trivial. Replace client = anthropic.Anthropic(); response = client.messages.create(...) at lines 138 and 161 with text = create_message(prompt, model=MODEL, system=system_prompt).
  • Usage dict extraction (_extract_usage(response)) currently relies on response.usage shape. Need to return the SDK ResultMessage's usage dict instead — create_message today returns only text; it needs a sibling that returns (text, usage) or the report generator needs to use AnthropicClient which already tracks usage.

Step 7 — Drop anthropic from pyproject.toml

Once steps 3-6 land, grep confirms no import anthropic remains, and the smoke test
passes (clean venv install → import openant, core, utilities, parsers, prompts, context, report).
Then remove "anthropic>=0.40.0" from dependencies.
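The pre-removal grep might look like this. It is a sketch run against a scratch tree so it is self-contained; in the repo the path would be libs/openant-core:

```shell
# Sketch of the Step 7 pre-removal check. A scratch tree stands in for
# libs/openant-core so the example runs anywhere.
root=$(mktemp -d)
mkdir -p "$root/utilities"
printf 'import os\n' > "$root/utilities/clean.py"

# grep exits non-zero when nothing matches, which is the "clean" case here.
if grep -rn --include='*.py' -E '^(import anthropic|from anthropic )' "$root"; then
    echo "anthropic still referenced; do not drop the dep"
else
    echo "clean: safe to remove anthropic from pyproject.toml"
fi
```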

Step 8 — End-to-end verification (do not skip)

The manual smoke tests, including the three from PR #25's original test plan (which went unchecked):

  • openant enhance --fresh --workers 2 against a small test repo with a real API key.
  • openant verify on a real finding with a real API key.
  • openant analyze with a real API key.
  • openant report --format html against a completed scan.
  • openant generate-context <repo> against a real repo.
  • All of the above with OPENANT_LOCAL_CLAUDE=true (local session auth path).
  • Force a rate limit: run enhance at --workers 20 briefly and confirm GlobalRateLimiter.report_rate_limit fires, all workers pause, execution resumes.

Step 9 — Submit upstream

  • Open an upstream PR with a conventional-commits subject. Do not bundle with the
    GlobalRateLimiter work — that's already upstream; we're just porting its integration.
  • Include the regression-guard test from fork PR fix: declare anthropic and tree-sitter-zig, guard against dep drift #30 (tests/test_declared_dependencies.py) — this is the mechanism that prevents a future upstream merge from silently re-introducing anthropic a second time.

Open questions to resolve while implementing

  1. VERIFICATION_TOOLS / TOOL_DEFINITIONS were custom tool schemas for anthropic's tool-use API. SDK native tools (Read/Grep/Glob/Bash) cover the same surface but with different ergonomics. Re-read PR feat: migrate all LLM calls to Claude Agent SDK #25's prompt updates in prompts/verification_prompts.py and utilities/agentic_enhancer/prompts.py — they coach the model on the native tools. Those prompt changes must come along with the code changes or the model will fumble tool calls.
  2. VERIFICATION_SCHEMA JSON schema for structured output: PR feat: migrate all LLM calls to Claude Agent SDK #25 added one in prompts/verification_prompts.py. Check that it still exists in current code, or re-add it.
  3. restore_from on TokenTracker: PR feat: migrate all LLM calls to Claude Agent SDK #25 added this for checkpoint resume. Still there (llm_client.py:259). Good — no regression.
  4. Entry-point filtering + reachability (added upstream in PR feat: auto-detect dependency changes and reinstall openant #23): these are new params on ContextAgent.__init__ that PR feat: migrate all LLM calls to Claude Agent SDK #25 didn't know about. Preserve them when rewriting.

Risks

  • Prompt drift: the model's tool-calling behaviour changes between the custom tool schema and SDK native tools. Expected: PR feat: migrate all LLM calls to Claude Agent SDK #25's prompt-update patterns work. Unexpected: native tools produce different verdicts on some units. Mitigation: run a before/after comparison on a reference dataset (the fork has geospatial_vuln12, flowise_vuln4, object_browser).
  • Lost retry-after: GlobalRateLimiter will use default backoff instead of API-provided. In practice the API's values are usually ≤30s so default backoff is already conservative. If this becomes a problem, the SDK would need to surface the header (file an issue upstream at claude_agent_sdk).
  • FD exhaustion concern in ContextEnhancer: the shared_client comment implies this was observed under parallel load with anthropic. The SDK's per-call ClaudeSDKClient context manager is a different mechanism (subprocess spawn, not HTTP connection pool). Needs load-testing — not a reason to keep anthropic, but a reason to test at high --workers.
  • AssistantMessage.error timing: untested assumption that the SDK delivers an AssistantMessage with error="rate_limit" before the ResultMessage. If it only appears inside ResultMessage.result as an error string, the detection point moves. Needs a live test against a real rate limit.

Rough sizing (per step)

  • Step 0a (live rate-limit spike): 30 min.
  • Step 0b (recover PR feat: migrate all LLM calls to Claude Agent SDK #25 prompt deltas): 30 min.
  • Step 1 (taxonomy): 0.5 day.
  • Step 2 (SDK error surfacing in _run_query): 0.5 day.
  • Step 3 (finding_verifier.py): 1 day — largest file, mostly re-apply PR feat: migrate all LLM calls to Claude Agent SDK #25's deleted diff with minor upstream-delta adjustments.
  • Step 4 (agent.py): 1 day — same, plus preserving upstream's new entry-point/reachability params.
  • Step 5 (context_enhancer.py): 0.5 day (5a: classify_error rewrite; 5b: delete shared_client — only after step 4).
  • Step 6 (report/generator.py): 0.5 day.
  • Step 7: trivial (one line in pyproject.toml).
  • Step 8 (E2E with real API): 1 day, ~$30-50 API spend.
  • Step 9 (upstream PR): 0.5 day.

Serial total: 5-6 days. Parallel critical path: ~3.5 days (see below).

Dependency graph

Step 0a: Live rate-limit test  ──┐
Step 0b: Recover PR #25 prompts ─┤   (prep work, zero deps)
                                 │
Step 1: sdk_errors taxonomy ─────┼──→ Step 2: wire into _run_query ──┐
                                 │                                   │
Step 6: report/generator.py ─────┘   (only needs create_message,     ├──→ Step 7: drop dep ──→ Step 8: E2E ──→ Step 9: upstream PR
                                      already exists)                │
                                                                     │
                Step 3: finding_verifier.py ────────────────────────┤
                Step 4: agentic_enhancer/agent.py ──────────────────┤
                Step 5a: context_enhancer classify_error ───────────┤
                Step 5b: delete shared_client ←── depends on Step 4 ┘

Critical path: Step 1 → Step 2 → (Step 3 ∥ Step 4) → Step 7 → Step 8 → Step 9.

Parallelisation waves

Wave 1 — day 1, no blockers, run concurrently:

  • Step 0a — 30-min spike. Confirm AssistantMessage.error == "rate_limit" actually fires under load. Informs Step 2's implementation.
  • Step 0b — recover PR feat: migrate all LLM calls to Claude Agent SDK #25's prompt updates via git show 73a01a0 -- libs/openant-core/prompts/verification_prompts.py libs/openant-core/utilities/agentic_enhancer/prompts.py. Stage on a prep branch. Needed by Steps 3 and 4.
  • Step 1 — write utilities/sdk_errors.py. Pure new module, touches nothing existing.
  • Step 6 — port report/generator.py. Uses create_message which already exists; doesn't need the new taxonomy (single-turn, no rate-limit loop). Can land before Step 1.

Wave 2 — after Step 1 merges:

  • Step 2 — wire AssistantMessage.error detection into _run_query. Blocks Steps 3 and 4.
  • Step 5a — rewrite classify_error against the new taxonomy. Independent of Step 2. Runs in parallel.

Wave 3 — after Steps 1 + 2 merge:

  • Step 3 — port finding_verifier.py.
  • Step 4 — port agentic_enhancer/agent.py.
  • Touch disjoint files, no cross-dependency, merge in either order.

Wave 4 — after Step 4 merges:

  • Step 5b — delete shared_client in context_enhancer.py. Requires ContextAgent already off self.client.

Wave 5 — after Steps 3, 4, 5, 6 all in:

  • Step 7 — remove anthropic from pyproject.toml.
  • Step 8 — E2E with real API. ~1 day, ~$50.

Wave 6:

  • Step 9 — upstream PR.

Team allocation

  • Solo: 5-6 days fully sequential.
  • Two workers: ~4 days. Worker A: critical path (Steps 0a/1/2/3/7/8/9). Worker B: parallel track (Steps 0b/6/4/5), merges asynchronously.
  • Three workers: ~3.5 days. Worker C takes Step 5a + the reference-dataset before/after comparison (prompt-drift risk mitigation) + reviews.
  • Four+ workers: diminishing returns — merge conflicts on llm_client.py and pyproject.toml eat the savings.

Implementation via git worktrees

Per CLAUDE.md, feature branches live under .worktrees/. This maps cleanly onto the
wave plan: each PR-sized unit of work gets its own branch + worktree, so multiple tracks
can run concurrently without stashing or branch-switching. Worktrees share the underlying
.git metadata, so all worktrees see the same commit graph — merging a wave-1 branch
makes it immediately available as a base for wave-2 branches.

Branch ↔ worktree mapping

| Wave | Step | Branch | Worktree path | Base |
|---|---|---|---|---|
| 1 | 0a | chore/sdk-ratelimit-spike | .worktrees/chore/sdk-ratelimit-spike | master |
| 1 | 0b | chore/recover-pr25-prompts | .worktrees/chore/recover-pr25-prompts | master |
| 1 | 1 | feat/sdk-errors-taxonomy | .worktrees/feat/sdk-errors-taxonomy | master |
| 1 | 6 | refactor/report-generator-sdk | .worktrees/refactor/report-generator-sdk | master |
| 2 | 2 | feat/sdk-error-surfacing | .worktrees/feat/sdk-error-surfacing | master (after Step 1 lands) |
| 2 | 5a | refactor/classify-error-sdk | .worktrees/refactor/classify-error-sdk | master (after Step 1 lands) |
| 3 | 3 | refactor/verifier-sdk-native | .worktrees/refactor/verifier-sdk-native | master (after Step 2 lands) |
| 3 | 4 | refactor/enhancer-agent-sdk | .worktrees/refactor/enhancer-agent-sdk | master (after Step 2 lands) |
| 4 | 5b | refactor/drop-shared-client | .worktrees/refactor/drop-shared-client | master (after Step 4 lands) |
| 5 | 7 | chore/drop-anthropic-dep | .worktrees/chore/drop-anthropic-dep | master (after Steps 3-6 land) |

Bootstrap (wave 1)

From the repo root (no cd, no git -C):

git fetch origin
git worktree add -b chore/sdk-ratelimit-spike    .worktrees/chore/sdk-ratelimit-spike    master
git worktree add -b chore/recover-pr25-prompts   .worktrees/chore/recover-pr25-prompts   master
git worktree add -b feat/sdk-errors-taxonomy     .worktrees/feat/sdk-errors-taxonomy     master
git worktree add -b refactor/report-generator-sdk .worktrees/refactor/report-generator-sdk master

Each agent/developer then works from their worktree's root directory. When a branch
merges, run git worktree remove .worktrees/<branch> to clean up. .worktrees/ is
expected to be gitignored (per CLAUDE.md); add it if missing.

Spawning wave 2/3/4/5 worktrees

Each subsequent wave branches off master after its prerequisite has landed there.
Run git fetch origin && git worktree add -b <new-branch> .worktrees/<new-branch> origin/master
at the point the wave is ready to start.

Branching wave-2 work off an unmerged wave-1 branch is possible but risky: if review
feedback amends the wave-1 branch, the wave-2 branch needs rebasing. Prefer waiting for
the merge.

Coordination rules

  • Only one branch touches llm_client.py at a time. Steps 1 and 2 both modify it
    serially; Step 6 also imports from it but doesn't modify it. Any other branch that
    needs to touch it must rebase onto the current latest.
  • Only one branch touches pyproject.toml at a time. Only Step 7 modifies it. Other
    branches must not add or change dependencies in their PRs; if a port genuinely needs
    a new dep, that's a separate PR first.
  • tests/test_declared_dependencies.py (from fork PR fix: declare anthropic and tree-sitter-zig, guard against dep drift #30) should already be on
    master before any port starts. It will fail CI on any branch that introduces an
    undeclared import.
  • Agent sessions: when running concurrent Claude Code sessions, give each session
    a working directory inside its own worktree. Sessions sharing .git metadata is safe;
    sessions sharing a working tree is not.

Sequencing rules

  • One PR per step. Do not bundle. The fork's history shows "one big migration PR" is how we got into this mess twice (feat: migrate all LLM calls to Claude Agent SDK #25 was one big PR that left residual issues; merge: incorporate upstream parallelization, Zig parser, and report overhaul #29's upstream merge was one big PR that undid the parts that did land).
  • Keep anthropic declared in pyproject.toml throughout. Step 7 is the only PR that removes it, and only after grep confirms zero import anthropic and the smoke test passes.
  • Review bandwidth is the real bottleneck. If PRs stack up waiting for review, the parallelism buys nothing. Confirm reviewer availability before opening Wave 1.
