Is Claude Code's rate limiting enforced client-side or server-side?
A comprehensive security analysis with static code review, live API testing, and architectural deep-dive.
On March 31, 2026, the Claude Code source code became publicly accessible via its npm distribution. After reviewing it, I identified a potential architectural concern around how token counting is handled in the client-side policy enforcement path and reported it responsibly through HackerOne's Vulnerability Disclosure Program (Report #3642470).
The core question: Is the authoritative token count used for rate limit enforcement coming from the server's returned usage data, or from the client's local estimate? A small architectural detail — but potentially significant for Enterprise users on shared org-level limits.
This repository contains the full investigation, static source analysis, and live API test results that answer that question definitively.
No exploitable vulnerability exists. Rate limit enforcement is 100% server-side. The client never sends token counts that the server trusts, and the server actively rejects any attempt to inject them (HTTP 400).
| File | Description |
|---|---|
SECURITY_ANALYSIS_REPORT.md |
Full technical report — static code analysis, API architecture review, live test results |
security_tests.py |
Reproducible Python test suite (7 tests) — runs against the live Anthropic API |
test_results.json |
Structured JSON output from the test run on 2026-04-01 |
README.md |
This file |
The following diagrams illustrate how token counting and rate limiting actually work in the Claude Code + Anthropic API system. These are the key to understanding why the concern, while reasonable, does not constitute a vulnerability.
CLIENT (Claude Code) SERVER (Anthropic API)
==================== ======================
User sends prompt
|
v
+------------------+
| Build HTTP |
| Request Payload | POST /v1/messages
| { | {model, messages, max_tokens}
| model: "...", | ─────────────────────────────────────────────────> +-------------------------+
| messages: [...],| | 1. Receive raw request |
| max_tokens: N | NO token counts sent | 2. Authenticate (OAuth/ |
| } | NO usage fields sent | API key) |
+------------------+ NO rate limit claims | 3. Tokenize input |
| (server-side) |
| 4. Check rate limits |
| against token bucket |
| 5. Process request OR |
| return 429 |
| 6. Generate response |
+------------------+ | 7. Count output tokens |
| Read response | HTTP 200 + Headers + Body | 8. Return response + |
| headers & body | <─────────────────────────────────────────────────── | usage + rate headers |
+------------------+ +-------------------------+
|
v RESPONSE CONTAINS:
+------------------+ +-------------------------------------------------+
| Update LOCAL | | Body: |
| display counters | | usage.input_tokens: 142 (server-counted)|
| (totalCostUSD, | | usage.output_tokens: 87 (server-counted)|
| modelUsage) | | usage.cache_read_input_tokens: 0 |
| | | |
| These are for | | Headers: |
| UI display ONLY | | anthropic-ratelimit-requests-remaining: 3998 |
+------------------+ | anthropic-ratelimit-input-tokens-remaining: 1M |
| | anthropic-ratelimit-output-tokens-remaining: 4K|
v | x-should-retry: false |
Show cost/tokens +-------------------------------------------------+
in status bar ^ All values computed SERVER-SIDE
CLIENT SERVER
====== ======
+------------------+
| Send request | POST /v1/messages
| (no local gate | ──────────────────────────────────> +---------------------------+
| checks anything | | 1. Tokenize input |
| before sending) | | 2. Check token bucket: |
+------------------+ | remaining = 0 |
| RATE LIMIT EXCEEDED |
| 3. Return 429 |
+------------------+ HTTP 429 +---------------------------+
| Receive 429 | <──────────────────────────────────
| Read retry-after | Headers:
| Back off & retry | x-should-retry: true
+------------------+ retry-after: 30
| error.type: "rate_limit_error"
v
Wait, then retry SERVER decided. CLIENT obeyed.
ATTACKER SERVER
======== ======
+--------------------+
| Craft malicious | POST /v1/messages
| request with fake | {
| usage fields: | "model": "claude-sonnet-4-...",
| | "messages": [...],
| "usage": { | "max_tokens": 100,
| "input_tokens": 0 | ────────────────────────────────> +---------------------------+
| "output_tokens":0 | "usage": { | 1. Parse request body |
| }, | "input_tokens": 0, | 2. Strict schema |
| "token_count": 1 | "output_tokens": 0 | validation |
+--------------------+ }, | 3. "usage" is NOT a valid |
"token_count": 1 | request field |
} | 4. REJECT with 400 |
+---------------------------+
+--------------------+ HTTP 400
| Receive rejection | <────────────────────────────────
+--------------------+ {
"type": "error",
BLOCKED "error": {
"type": "invalid_request_error",
"message": "usage: Extra inputs
are not permitted"
}
}
Server does NOT read, trust, or
use any client-supplied token counts.
The "usage" field exists ONLY in
responses, never in requests.
+============================================================================+
| CLAUDE CODE CLIENT (Local) |
| |
| +-------------------+ +-------------------+ +--------------------+ |
| | User Interface | | Anthropic SDK | | Session State | |
| | | | | | | |
| | Shows: | | countTokens() | | totalCostUSD: 0 | |
| | - Token counts | | -> POST /v1/ | | modelUsage: {} | |
| | - Cost estimate | | messages/ | | tokenCounter: null| |
| | - Model usage | | count_tokens | | | |
| | | | (SERVER call) | | Display-only. | |
| | All values come | | | | Never used as | |
| | FROM the server | | create() | | gates to block | |
| | response, not | | -> POST /v1/ | | API requests. | |
| | local estimation | | messages | | | |
| +-------------------+ +-------------------+ +--------------------+ |
| ^ | ^ |
| | | HTTP requests | |
| | v | |
+============+=======================|========================+==============+
| | |
Read from response Raw request Read from response
| | |
=============|=======================|========================|==============
| | v | |
| +---------+-------------------+---+------------------------+-----------+ |
| | ANTHROPIC API SERVER | |
| | | |
| | +-------------------+ +-------------------+ +-------------------+ | |
| | | Tokenizer | | Rate Limiter | | Billing System | | |
| | | | | | | | | |
| | | Server-side | | Token bucket | | Tracks actual | | |
| | | tokenization of | | algorithm. | | usage from | | |
| | | every request. | | | | server-side | | |
| | | | | 3 dimensions: | | counts, not | | |
| | | Client CANNOT | | - RPM (requests) | | client claims. | | |
| | | influence this. | | - ITPM (input) | | | | |
| | | | | - OTPM (output) | | Immune to client | | |
| | | Returns: | | | | manipulation. | | |
| | | - input_tokens | | Returns 429 + | | | | |
| | | - output_tokens | | retry-after when | | | | |
| | | - cache_* tokens | | limit exceeded. | | | | |
| | +-------------------+ +-------------------+ +-------------------+ | |
| | | |
| +-----------------------------------------------------------------------+ |
| |
| SERVER-SIDE (Authoritative) |
+==============================================================================+
CAN the client... Answer
================================================ ========================
...send fake token counts to the server? NO - Server returns 400
"Extra inputs are not
permitted"
...skip rate limit checks? NO - There are no client-
side rate limit checks to
skip. Server enforces.
...manipulate local counters to NO - Local counters are
bypass server limits? display-only. Server
doesn't read them.
...use countTokens() to trick NO - countTokens() calls
the server into using a fake count? the SERVER. The server
computes the count.
...modify maxBudgetUsd to get NO - maxBudgetUsd is a
unlimited access? local spending cap (user
safety feature). Server
has its own limits.
...pre-compute tokens locally NO - There is no local
and skip the server call? tokenizer in the client.
The SDK calls /v1/messages
/count_tokens on the
server.
Executed on 2026-04-01 against https://api.anthropic.com using Claude Code v2.1.89 OAuth credentials.
| # | Test | HTTP | Result | What It Proves |
|---|---|---|---|---|
| 1 | Token Counting API | 200 |
PASS | Server returned input_tokens: 14 — computed server-side, not locally |
| 2 | Rate Limit Headers | 429 |
PASS* | Server returned 429 + x-should-retry: true — enforcement is server-side |
| 3 | Injected Token Counts | 400 |
PASS | Server rejected fake usage field: "Extra inputs are not permitted" |
| 4 | Estimate vs Actual | 200/429 |
PASS* | count_tokens returned 16 (server-computed); messages was rate-limited |
| 5 | 429 Capture | 429 |
PASS | rate_limit_error type — server is the enforcer |
| 6 | Large Request No Gate | 429 |
PASS | ~1000 token payload reached server — no client pre-flight blocked it |
| 7 | Invalid Model | 404 |
PASS | Server returned not_found_error — all validation is server-side |
* Tests 2 and 4 received 429 rate limit responses, which prevented capturing full usage data — but the 429 itself is proof of server-side enforcement.
REQUEST:
POST /v1/messages
Body: { ..., "usage": {"input_tokens": 0, "output_tokens": 0}, "token_count": 1 }
RESPONSE:
HTTP 400
{"type": "error", "error": {"type": "invalid_request_error",
"message": "usage: Extra inputs are not permitted"}}
The API does not accept a usage field in requests. It only exists in responses. There is literally no mechanism to send fake token counts to the server.
- Python 3.x
- An Anthropic API key OR Claude Code installed and authenticated
export ANTHROPIC_API_KEY="sk-ant-api03-..."
python security_tests.pyIf you're logged into Claude Code, the script automatically reads ~/.claude/.credentials.json:
python security_tests.py[PASS] TEST 1: Token Counting API is server-side
[PASS] TEST 2: Server returns rate limit headers and usage data
[PASS] TEST 3: Server ignores/rejects client-injected token counts
[PASS] TEST 4: Token count estimate vs actual usage
[PASS] TEST 5: Server enforces rate limits via 429 response
[PASS] TEST 6: Large request reaches server without client-side blocking
[PASS] TEST 7: Server validates request parameters (not client)
Note: Tests 2, 4, 5, 6 may return 429 if you're rate-limited. This is expected and actually proves the point — the server is enforcing limits, not the client.
All code references are from Claude Code v2.1.89 (anthropic.claude-code-2.1.89-win32-x64/extension.js).
countTokens(K, V) {
return this._client.post("/v1/messages/count_tokens", { body: K, ...V });
}case "message_delta":
H.usage.output_tokens = V.usage.output_tokens; // FROM SERVER
if (V.usage.input_tokens != null)
H.usage.input_tokens = V.usage.input_tokens; // FROM SERVER
if (V.usage.cache_creation_input_tokens != null)
H.usage.cache_creation_input_tokens = V.usage.cache_creation_input_tokens;
if (V.usage.cache_read_input_tokens != null)
H.usage.cache_read_input_tokens = V.usage.cache_read_input_tokens;if (K === 429) return new Cj(K, j, H, B); // Creates RateLimitError
// ...
async retryRequest(K, V, H, B) {
let j, G = B?.get("retry-after-ms"); // Read from SERVER header
let N = B?.get("retry-after"); // Read from SERVER header
}{
totalCostUSD: 0, // Display only — never gates requests
modelUsage: {}, // Display only — never gates requests
tokenCounter: null, // OpenTelemetry metrics — never gates requests
costCounter: null, // OpenTelemetry metrics — never gates requests
}There is a legitimate usability concern (not a vulnerability):
The Token Counting API (/v1/messages/count_tokens) returns worst-case estimates assuming 0% cache hits. But server-side ITPM enforcement excludes cached tokens for most models. This means:
- Cost estimates can be inflated by up to 90% when heavy caching is in effect
- A user seeing "200K tokens" might actually only consume 20K against their rate limit
This is documented in anthropics/claude-code#18726 (Closed: Not Planned).
Why this isn't a vulnerability: The mismatch is conservative — it overestimates, not underestimates. An attacker wanting to bypass rate limits would need the opposite: a way to make the server undercount tokens. The Token Counting API can't do that because the server counts independently.
This concern was reported through HackerOne's Vulnerability Disclosure Program before any testing was performed.
| Detail | Value |
|---|---|
| Platform | HackerOne |
| Report ID | #3642470 |
| Researcher | Idrissa Maiga |
| Status | Submitted and under review by Anthropic's security team |
| Severity Assessed | No exploitable vulnerability (see analysis above) |
Disclosure timeline:
- 2026-03-31 — Claude Code source became publicly accessible via npm distribution
- 2026-03-31 — Static code analysis performed (no systems tested, no exploitation attempted)
- 2026-04-01 — Report #3642470 submitted to Anthropic via HackerOne VDP
- 2026-04-01 — Live API tests conducted using researcher's own authenticated account
- 2026-04-01 — Full analysis and test results published to this repository
The analysis was conducted entirely through:
- Static code analysis of publicly distributed Claude Code bundles
- Reviewing official Anthropic API documentation
- Standard API calls using the researcher's own authenticated credentials
- No systems were tested without authorization
- No exploitation was attempted
| Source | Link |
|---|---|
| HackerOne Report #3642470 | hackerone.com/reports/3642470 |
| Anthropic VDP (HackerOne) | hackerone.com/anthropic |
| Anthropic Rate Limits | platform.claude.com/docs/en/api/rate-limits |
| Token Counting API | platform.claude.com/docs/en/build-with-claude/token-counting |
| Rate Limits Support Article | support.claude.com/en/articles/8243635 |
| GitHub Issue #18726 | anthropics/claude-code#18726 |
| Token Bucket Algorithm | Wikipedia |
Idrissa Maiga
Software Developer | Security Researcher
GitHub | HackerOne | LinkedIn
This research is published for educational and security research purposes. All testing was performed on the researcher's own account with proper authorization.
Analysis conducted with assistance from Claude Opus 4.6.