Claude Code Security Analysis: Token Counting & Rate Limit Enforcement

Is Claude Code's rate limiting enforced client-side or server-side?
A comprehensive security analysis with static code review, live API testing, and architectural deep-dive.

Background

On March 31, 2026, the Claude Code source code became publicly accessible via its npm distribution. After reviewing it, I identified a potential architectural concern around how token counting is handled in the client-side policy enforcement path and reported it responsibly through HackerOne's Vulnerability Disclosure Program (Report #3642470).

The core question: Is the authoritative token count used for rate limit enforcement coming from the server's returned usage data, or from the client's local estimate? A small architectural detail — but potentially significant for Enterprise users on shared org-level limits.

This repository contains the full investigation, static source analysis, and live API test results that answer that question definitively.

TL;DR — Verdict

No exploitable vulnerability exists. Rate limit enforcement is 100% server-side. The client never sends token counts that the server trusts, and the server actively rejects any attempt to inject them (HTTP 400).

Repository Contents

File	Description
`SECURITY_ANALYSIS_REPORT.md`	Full technical report — static code analysis, API architecture review, live test results
`security_tests.py`	Reproducible Python test suite (7 tests) — runs against the live Anthropic API
`test_results.json`	Structured JSON output from the test run on 2026-04-01
`README.md`	This file

Logical Architecture Diagram

The following diagrams illustrate how token counting and rate limiting actually work in the Claude Code + Anthropic API system. These are the key to understanding why the concern, while reasonable, does not constitute a vulnerability.

1. Normal Request Flow (What Actually Happens)

                    CLIENT (Claude Code)                                     SERVER (Anthropic API)
                    ====================                                     ======================

 User sends prompt
        |
        v
 +------------------+
 | Build HTTP        |
 | Request Payload   |      POST /v1/messages
 | {                 |      {model, messages, max_tokens}
 |   model: "...",   | ─────────────────────────────────────────────────>  +-------------------------+
 |   messages: [...],|                                                     | 1. Receive raw request  |
 |   max_tokens: N   |         NO token counts sent                        | 2. Authenticate (OAuth/ |
 | }                 |         NO usage fields sent                        |    API key)             |
 +------------------+          NO rate limit claims                        | 3. Tokenize input       |
                                                                           |    (server-side)        |
                                                                           | 4. Check rate limits    |
                                                                           |    against token bucket |
                                                                           | 5. Process request OR   |
                                                                           |    return 429           |
                                                                           | 6. Generate response    |
 +------------------+                                                      | 7. Count output tokens  |
 | Read response     |      HTTP 200 + Headers + Body                      | 8. Return response +    |
 | headers & body    | <─────────────────────────────────────────────────── |    usage + rate headers |
 +------------------+                                                      +-------------------------+
        |
        v                         RESPONSE CONTAINS:
 +------------------+             +-------------------------------------------------+
 | Update LOCAL      |             | Body:                                           |
 | display counters  |             |   usage.input_tokens: 142       (server-counted)|
 | (totalCostUSD,    |             |   usage.output_tokens: 87       (server-counted)|
 |  modelUsage)      |             |   usage.cache_read_input_tokens: 0              |
 |                   |             |                                                 |
 | These are for     |             | Headers:                                        |
 | UI display ONLY   |             |   anthropic-ratelimit-requests-remaining: 3998  |
 +------------------+             |   anthropic-ratelimit-input-tokens-remaining: 1M |
        |                         |   anthropic-ratelimit-output-tokens-remaining: 4K|
        v                         |   x-should-retry: false                          |
 Show cost/tokens                 +-------------------------------------------------+
 in status bar                     ^ All values computed SERVER-SIDE

2. Rate Limit Enforcement Flow (429 Rejection)

 CLIENT                                                    SERVER
 ======                                                    ======

 +------------------+
 | Send request      |       POST /v1/messages
 | (no local gate    | ──────────────────────────────────>  +---------------------------+
 |  checks anything  |                                      | 1. Tokenize input         |
 |  before sending)  |                                      | 2. Check token bucket:    |
 +------------------+                                      |    remaining = 0          |
                                                            |    RATE LIMIT EXCEEDED    |
                                                            | 3. Return 429             |
 +------------------+       HTTP 429                        +---------------------------+
 | Receive 429       | <────────────────────────────────── 
 | Read retry-after  |       Headers:
 | Back off & retry  |         x-should-retry: true
 +------------------+         retry-after: 30
        |                      error.type: "rate_limit_error"
        v
 Wait, then retry            SERVER decided. CLIENT obeyed.

3. Token Count Injection Attempt (What Happens If You Try)

 ATTACKER                                                  SERVER
 ========                                                  ======

 +--------------------+
 | Craft malicious     |     POST /v1/messages
 | request with fake   |     {
 | usage fields:       |       "model": "claude-sonnet-4-...",
 |                     |       "messages": [...],
 | "usage": {          |       "max_tokens": 100,
 |   "input_tokens": 0 | ────────────────────────────────>  +---------------------------+
 |   "output_tokens":0 |       "usage": {                   | 1. Parse request body     |
 | },                  |         "input_tokens": 0,          | 2. Strict schema          |
 | "token_count": 1    |         "output_tokens": 0          |    validation             |
 +--------------------+       },                            | 3. "usage" is NOT a valid |
                               "token_count": 1              |    request field          |
                             }                               | 4. REJECT with 400        |
                                                             +---------------------------+
 +--------------------+      HTTP 400
 | Receive rejection   | <──────────────────────────────── 
 +--------------------+      {
                               "type": "error",
        BLOCKED               "error": {
                                 "type": "invalid_request_error",
                                 "message": "usage: Extra inputs
                                  are not permitted"
                               }
                             }

                             Server does NOT read, trust, or
                             use any client-supplied token counts.
                             The "usage" field exists ONLY in
                             responses, never in requests.

4. Where Token Counting Happens (System Overview)

+============================================================================+
|                        CLAUDE CODE CLIENT (Local)                          |
|                                                                            |
|  +-------------------+    +-------------------+    +--------------------+  |
|  | User Interface    |    | Anthropic SDK     |    | Session State      |  |
|  |                   |    |                   |    |                    |  |
|  | Shows:            |    | countTokens()     |    | totalCostUSD: 0   |  |
|  | - Token counts    |    |   -> POST /v1/    |    | modelUsage: {}    |  |
|  | - Cost estimate   |    |   messages/       |    | tokenCounter: null|  |
|  | - Model usage     |    |   count_tokens    |    |                   |  |
|  |                   |    |   (SERVER call)   |    | Display-only.     |  |
|  | All values come   |    |                   |    | Never used as     |  |
|  | FROM the server   |    | create()          |    | gates to block    |  |
|  | response, not     |    |   -> POST /v1/    |    | API requests.     |  |
|  | local estimation  |    |   messages        |    |                   |  |
|  +-------------------+    +-------------------+    +--------------------+  |
|            ^                       |                        ^              |
|            |                       | HTTP requests          |              |
|            |                       v                        |              |
+============+=======================|========================+==============+
             |                       |                        |
    Read from response          Raw request             Read from response
             |                       |                        |
=============|=======================|========================|==============
|            |                       v                        |              |
|  +---------+-------------------+---+------------------------+-----------+  |
|  |                    ANTHROPIC API SERVER                               |  |
|  |                                                                       |  |
|  |  +-------------------+  +-------------------+  +-------------------+  |  |
|  |  | Tokenizer         |  | Rate Limiter      |  | Billing System   |  |  |
|  |  |                   |  |                   |  |                   |  |  |
|  |  | Server-side       |  | Token bucket      |  | Tracks actual    |  |  |
|  |  | tokenization of   |  | algorithm.        |  | usage from       |  |  |
|  |  | every request.    |  |                   |  | server-side      |  |  |
|  |  |                   |  | 3 dimensions:     |  | counts, not      |  |  |
|  |  | Client CANNOT     |  | - RPM (requests)  |  | client claims.   |  |  |
|  |  | influence this.   |  | - ITPM (input)    |  |                   |  |  |
|  |  |                   |  | - OTPM (output)   |  | Immune to client |  |  |
|  |  | Returns:          |  |                   |  | manipulation.    |  |  |
|  |  | - input_tokens    |  | Returns 429 +     |  |                   |  |  |
|  |  | - output_tokens   |  | retry-after when  |  |                   |  |  |
|  |  | - cache_* tokens  |  | limit exceeded.   |  |                   |  |  |
|  |  +-------------------+  +-------------------+  +-------------------+  |  |
|  |                                                                       |  |
|  +-----------------------------------------------------------------------+  |
|                                                                              |
|                        SERVER-SIDE (Authoritative)                           |
+==============================================================================+

5. What the Client CANNOT Do

  CAN the client...                                    Answer
  ================================================     ========================

  ...send fake token counts to the server?             NO - Server returns 400
                                                       "Extra inputs are not
                                                       permitted"

  ...skip rate limit checks?                           NO - There are no client-
                                                       side rate limit checks to
                                                       skip. Server enforces.

  ...manipulate local counters to                      NO - Local counters are
    bypass server limits?                              display-only. Server
                                                       doesn't read them.

  ...use countTokens() to trick                        NO - countTokens() calls
    the server into using a fake count?                the SERVER. The server
                                                       computes the count.

  ...modify maxBudgetUsd to get                        NO - maxBudgetUsd is a
    unlimited access?                                  local spending cap (user
                                                       safety feature). Server
                                                       has its own limits.

  ...pre-compute tokens locally                        NO - There is no local
    and skip the server call?                          tokenizer in the client.
                                                       The SDK calls /v1/messages
                                                       /count_tokens on the
                                                       server.

Live Test Results

Executed on 2026-04-01 against https://api.anthropic.com using Claude Code v2.1.89 OAuth credentials.

#	Test	HTTP	Result	What It Proves
1	Token Counting API	`200`	PASS	Server returned `input_tokens: 14` — computed server-side, not locally
2	Rate Limit Headers	`429`	PASS*	Server returned `429` + `x-should-retry: true` — enforcement is server-side
3	Injected Token Counts	`400`	PASS	Server rejected fake `usage` field: `"Extra inputs are not permitted"`
4	Estimate vs Actual	`200`/`429`	PASS*	count_tokens returned 16 (server-computed); messages was rate-limited
5	429 Capture	`429`	PASS	`rate_limit_error` type — server is the enforcer
6	Large Request No Gate	`429`	PASS	~1000 token payload reached server — no client pre-flight blocked it
7	Invalid Model	`404`	PASS	Server returned `not_found_error` — all validation is server-side

* Tests 2 and 4 received 429 rate limit responses, which prevented capturing full usage data — but the 429 itself is proof of server-side enforcement.

The Most Critical Finding (Test 3)

REQUEST:
  POST /v1/messages
  Body: { ..., "usage": {"input_tokens": 0, "output_tokens": 0}, "token_count": 1 }

RESPONSE:
  HTTP 400
  {"type": "error", "error": {"type": "invalid_request_error",
   "message": "usage: Extra inputs are not permitted"}}

The API does not accept a usage field in requests. It only exists in responses. There is literally no mechanism to send fake token counts to the server.

How to Run the Tests Yourself

Prerequisites

Python 3.x
An Anthropic API key OR Claude Code installed and authenticated

Option A: With an API Key

export ANTHROPIC_API_KEY="sk-ant-api03-..."
python security_tests.py

Option B: With Claude Code OAuth Credentials

If you're logged into Claude Code, the script automatically reads ~/.claude/.credentials.json:

python security_tests.py

Expected Output

[PASS] TEST 1: Token Counting API is server-side
[PASS] TEST 2: Server returns rate limit headers and usage data
[PASS] TEST 3: Server ignores/rejects client-injected token counts
[PASS] TEST 4: Token count estimate vs actual usage
[PASS] TEST 5: Server enforces rate limits via 429 response
[PASS] TEST 6: Large request reaches server without client-side blocking
[PASS] TEST 7: Server validates request parameters (not client)

Note: Tests 2, 4, 5, 6 may return 429 if you're rate-limited. This is expected and actually proves the point — the server is enforcing limits, not the client.

Source Code Evidence

All code references are from Claude Code v2.1.89 (anthropic.claude-code-2.1.89-win32-x64/extension.js).

1. `countTokens()` — Server API Call, Not Local

countTokens(K, V) {
  return this._client.post("/v1/messages/count_tokens", { body: K, ...V });
}

2. Usage Data — Populated From Server Response

case "message_delta":
  H.usage.output_tokens = V.usage.output_tokens;           // FROM SERVER
  if (V.usage.input_tokens != null)
    H.usage.input_tokens = V.usage.input_tokens;            // FROM SERVER
  if (V.usage.cache_creation_input_tokens != null)
    H.usage.cache_creation_input_tokens = V.usage.cache_creation_input_tokens;
  if (V.usage.cache_read_input_tokens != null)
    H.usage.cache_read_input_tokens = V.usage.cache_read_input_tokens;

3. 429 Handling — Reactive, Not Pre-Emptive

if (K === 429) return new Cj(K, j, H, B);  // Creates RateLimitError
// ...
async retryRequest(K, V, H, B) {
  let j, G = B?.get("retry-after-ms");    // Read from SERVER header
  let N = B?.get("retry-after");            // Read from SERVER header
}

4. Session State — Display Counters Only

{
  totalCostUSD: 0,           // Display only — never gates requests
  modelUsage: {},            // Display only — never gates requests  
  tokenCounter: null,        // OpenTelemetry metrics — never gates requests
  costCounter: null,         // OpenTelemetry metrics — never gates requests
}

The Real (Non-Security) Issue

There is a legitimate usability concern (not a vulnerability):

The Token Counting API (/v1/messages/count_tokens) returns worst-case estimates assuming 0% cache hits. But server-side ITPM enforcement excludes cached tokens for most models. This means:

Cost estimates can be inflated by up to 90% when heavy caching is in effect
A user seeing "200K tokens" might actually only consume 20K against their rate limit

This is documented in anthropics/claude-code#18726 (Closed: Not Planned).

Why this isn't a vulnerability: The mismatch is conservative — it overestimates, not underestimates. An attacker wanting to bypass rate limits would need the opposite: a way to make the server undercount tokens. The Token Counting API can't do that because the server counts independently.

Responsible Disclosure

This concern was reported through HackerOne's Vulnerability Disclosure Program before any testing was performed.

Detail	Value
Platform	HackerOne
Report ID	#3642470
Researcher	Idrissa Maiga
Status	Submitted and under review by Anthropic's security team
Severity Assessed	No exploitable vulnerability (see analysis above)

Disclosure timeline:

2026-03-31 — Claude Code source became publicly accessible via npm distribution
2026-03-31 — Static code analysis performed (no systems tested, no exploitation attempted)
2026-04-01 — Report #3642470 submitted to Anthropic via HackerOne VDP
2026-04-01 — Live API tests conducted using researcher's own authenticated account
2026-04-01 — Full analysis and test results published to this repository

The analysis was conducted entirely through:

Static code analysis of publicly distributed Claude Code bundles
Reviewing official Anthropic API documentation
Standard API calls using the researcher's own authenticated credentials
No systems were tested without authorization
No exploitation was attempted

References

Source	Link
HackerOne Report #3642470	hackerone.com/reports/3642470
Anthropic VDP (HackerOne)	hackerone.com/anthropic
Anthropic Rate Limits	platform.claude.com/docs/en/api/rate-limits
Token Counting API	platform.claude.com/docs/en/build-with-claude/token-counting
Rate Limits Support Article	support.claude.com/en/articles/8243635
GitHub Issue #18726	anthropics/claude-code#18726
Token Bucket Algorithm	Wikipedia

Author

Idrissa Maiga
Software Developer | Security Researcher
GitHub | HackerOne | LinkedIn

License

This research is published for educational and security research purposes. All testing was performed on the researcher's own account with proper authorization.

Analysis conducted with assistance from Claude Opus 4.6.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
SECURITY_ANALYSIS_REPORT.md		SECURITY_ANALYSIS_REPORT.md
security_tests.py		security_tests.py
test_results.json		test_results.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Claude Code Security Analysis: Token Counting & Rate Limit Enforcement

Background

TL;DR — Verdict

Repository Contents

Logical Architecture Diagram

1. Normal Request Flow (What Actually Happens)

2. Rate Limit Enforcement Flow (429 Rejection)

3. Token Count Injection Attempt (What Happens If You Try)

4. Where Token Counting Happens (System Overview)

5. What the Client CANNOT Do

Live Test Results

The Most Critical Finding (Test 3)

How to Run the Tests Yourself

Prerequisites

Option A: With an API Key

Option B: With Claude Code OAuth Credentials

Expected Output

Source Code Evidence

1. `countTokens()` — Server API Call, Not Local

2. Usage Data — Populated From Server Response

3. 429 Handling — Reactive, Not Pre-Emptive

4. Session State — Display Counters Only

The Real (Non-Security) Issue

Responsible Disclosure

References

Author

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Claude Code Security Analysis: Token Counting & Rate Limit Enforcement

Background

TL;DR — Verdict

Repository Contents

Logical Architecture Diagram

1. Normal Request Flow (What Actually Happens)

2. Rate Limit Enforcement Flow (429 Rejection)

3. Token Count Injection Attempt (What Happens If You Try)

4. Where Token Counting Happens (System Overview)

5. What the Client CANNOT Do

Live Test Results

The Most Critical Finding (Test 3)

How to Run the Tests Yourself

Prerequisites

Option A: With an API Key

Option B: With Claude Code OAuth Credentials

Expected Output

Source Code Evidence

1. countTokens() — Server API Call, Not Local

2. Usage Data — Populated From Server Response

3. 429 Handling — Reactive, Not Pre-Emptive

4. Session State — Display Counters Only

The Real (Non-Security) Issue

Responsible Disclosure

References

Author

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

1. `countTokens()` — Server API Call, Not Local

Packages