Goal
Match Claude Code's own /usage numbers exactly (tokens + cost) in any auth mode, and gain real-time, pre-request spend/usage enforcement — by optionally reading model traffic on the wire instead of (only) the transcript. This is the architectural fix for the accuracy ceiling in #156, and complements (does not replace) the hooks/transcript path.
Reference implementation
Datadog lapdog (open source: DataDog/dd-apm-test-agent, dir lapdog/ + ddapm_test_agent/claude_proxy.py, claude_cost_tracker.py; local clone at ~/work-dir/dd-apm-test-agent) does exactly this. It runs a localhost proxy, reads the usage receipt off each Anthropic response (incl. background calls the transcript omits), and prices it from the same table Claude Code uses — its $0.29 matched /usage $0.2908 to the cent. Note: lapdog also pulls the tool/agent span tree from Claude Code hooks, not the transcript (it never reads the Claude JSONL).
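The pricing step can be sketched as follows. This is a minimal illustration, not lapdog's actual code: the model name and the per-million-token rates in PRICES_PER_MTOK are hypothetical placeholders, and price_usage is a name made up for this example. The usage field names are the ones the Anthropic Messages API reports.

```python
# Hypothetical USD rates per million tokens. Placeholders only; a real
# tracker would load the same price table Claude Code uses.
PRICES_PER_MTOK = {
    "example-model": {
        "input_tokens": 3.00,
        "output_tokens": 15.00,
        "cache_creation_input_tokens": 3.75,
        "cache_read_input_tokens": 0.30,
    },
}


def price_usage(model: str, usage: dict) -> float:
    """Return the USD cost of one response's usage block."""
    rates = PRICES_PER_MTOK[model]
    # Missing or null counters are treated as zero tokens.
    return sum(
        (usage.get(field) or 0) * rate / 1_000_000
        for field, rate in rates.items()
    )
```

Summing price_usage over every response seen on the wire, background calls included, is what lets the total line up with /usage.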
How it works
- Set ANTHROPIC_BASE_URL to a local aflock listener (documented Claude Code feature; same mechanism LiteLLM / Bedrock gateways use). Chain to any existing value; forward upstream unchanged.
- Stream pass-through: proxy the SSE bytes live, tee the message_start / message_delta usage events to update the running budget (no buffering → preserves streaming UX).
- Pre-request gate: before forwarding the next /v1/messages, check budget; if over and enforcing, return a clean 429-style error so the agent stops gracefully (never kill mid-stream).
- No CA cert / no TLS MITM — localhost hop is plaintext to the proxy; the proxy does its own HTTPS upstream.
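The stream tee and the pre-request gate described above could look roughly like this. It is a sketch under assumptions: UsageTee, passthrough, and gate are hypothetical names, the budget is counted in tokens for simplicity, and the 429 body copies the Anthropic error envelope as an assumption about what the client handles cleanly.

```python
import json
from typing import Iterable, Iterator, Optional


class UsageTee:
    """Collects usage from one streamed response without delaying any bytes."""

    def __init__(self) -> None:
        self._pending = b""  # partial line carried over between chunks
        self.model: Optional[str] = None
        self.usage: dict[str, int] = {}

    def feed(self, chunk: bytes) -> None:
        self._pending += chunk
        *lines, self._pending = self._pending.split(b"\n")
        for line in lines:
            line = line.strip()
            if not line.startswith(b"data:"):
                continue
            try:
                event = json.loads(line[len(b"data:"):])
            except ValueError:
                continue  # not JSON; ignore
            if not isinstance(event, dict):
                continue
            if event.get("type") == "message_start":
                message = event.get("message") or {}
                self.model = message.get("model")
                self._merge(message.get("usage"))
            elif event.get("type") == "message_delta":
                self._merge(event.get("usage"))

    def _merge(self, usage: Optional[dict]) -> None:
        # message_delta counts are cumulative, so overwrite instead of adding.
        for key, value in (usage or {}).items():
            if isinstance(value, int):
                self.usage[key] = value


def passthrough(chunks: Iterable[bytes], tee: UsageTee) -> Iterator[bytes]:
    """Yield upstream chunks unchanged while the tee inspects a copy."""
    for chunk in chunks:
        tee.feed(chunk)
        yield chunk


def gate(spent: int, limit: int, enforcing: bool) -> Optional[tuple[int, bytes]]:
    """Return None to forward the request, or (status, body) to refuse it."""
    if enforcing and spent >= limit:
        body = {
            "type": "error",
            "error": {"type": "rate_limit_error",
                      "message": "aflock: token budget exhausted"},
        }
        return 429, json.dumps(body).encode()
    return None
```

The gate runs only between requests, so an in-flight stream is never cut. Clients may retry a 429 automatically, so the exact status and body would need checking against Claude Code's behaviour before relying on it.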
Why it's worth it
Risks / caveats (why it's opt-in, not default)
- In-band SPOF: aflock now sits in the critical path. fail-open = bypassable / no enforcement on failure; fail-closed = an aflock bug can stall the agent (a prod outage). Default fail-open (safe vs the agent, our real adversary); fail-closed opt-in.
- Credential + full prompt/response transit aflock → large trust-boundary expansion. Localhost-only by default; never log auth headers; never persist bodies (extract only usage + model); no default forwarding. PII/compliance implications in prod.
- Off-localhost needs TLS; transparent (no-env-var) interception would require a CA cert = MITM = avoid.
- Coupling/fragility to Anthropic API shape (SSE events, /v1/messages, usage) and, for the fetch-patch variant, to Claude Code's Bun runtime.
- Backend conflicts: must chain to existing ANTHROPIC_BASE_URL; Bedrock/Vertex use a different protocol/auth (e.g. SigV4 on Bedrock) → Anthropic usage parsing won't apply, so detect & disable there.
- $ enforcement stays notional under flat-rate subscription — the proxy fixes accuracy, not the meaning of a dollar cap; token/turn caps remain the honest control there.
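Two of the caveats above, chaining to an existing base URL and never logging auth headers, can be sketched as below. plan_proxy, loggable_headers, and the listener URL are hypothetical names for illustration. The sketch assumes Bedrock/Vertex mode is signalled by the CLAUDE_CODE_USE_BEDROCK and CLAUDE_CODE_USE_VERTEX environment variables and that https://api.anthropic.com is the default upstream.

```python
from typing import Mapping

DEFAULT_UPSTREAM = "https://api.anthropic.com"  # assumed default upstream
SENSITIVE_HEADERS = {"authorization", "x-api-key", "cookie", "proxy-authorization"}


def _truthy(value) -> bool:
    return (value or "").strip().lower() in {"1", "true", "yes"}


def plan_proxy(env: Mapping[str, str], listen_url: str) -> dict:
    """Decide where to forward and whether Anthropic usage parsing applies."""
    if _truthy(env.get("CLAUDE_CODE_USE_BEDROCK")) or _truthy(env.get("CLAUDE_CODE_USE_VERTEX")):
        # Different protocol/auth: stay out of the path entirely.
        return {"enabled": False, "upstream": None}
    existing = env.get("ANTHROPIC_BASE_URL", "").rstrip("/")
    # Chain to a pre-existing gateway, but never to ourselves (that would loop).
    if existing and existing != listen_url.rstrip("/"):
        return {"enabled": True, "upstream": existing}
    return {"enabled": True, "upstream": DEFAULT_UPSTREAM}


def loggable_headers(headers: Mapping[str, str]) -> dict:
    """Copy headers for logging with credentials replaced by a marker."""
    return {
        name: "<redacted>" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

Resolving the plan once at startup keeps the disable decision explicit rather than discovering a protocol mismatch on the first failed parse.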
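Phased plan
- … /usage to the cent, incl. background calls. No enforcement.
- … maxSpendUSD / token caps from wire data in subscription mode.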
Threat-model note
Enforces against the agent (won't bypass a localhost proxy it doesn't know about), fail-open by default. Not a control against a malicious human operator (who can kill the proxy / unset the env var) — don't market it as tamper-proof.
Related: #111, #96, #153, #100, #87. Supersedes the accuracy ceiling in #156; Phase 0 is #155.