Skip to content

IdrissaMaiga/Claude-Code-Security-Analysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Claude Code Security Analysis: Token Counting & Rate Limit Enforcement

Python Security Tests

Is Claude Code's rate limiting enforced client-side or server-side?
A comprehensive security analysis with static code review, live API testing, and architectural deep-dive.


Background

On March 31, 2026, the Claude Code source code became publicly accessible via its npm distribution. After reviewing it, I identified a potential architectural concern around how token counting is handled in the client-side policy enforcement path and reported it responsibly through HackerOne's Vulnerability Disclosure Program (Report #3642470).

The core question: Is the authoritative token count used for rate limit enforcement coming from the server's returned usage data, or from the client's local estimate? A small architectural detail — but potentially significant for Enterprise users on shared org-level limits.

This repository contains the full investigation, static source analysis, and live API test results that answer that question definitively.


TL;DR — Verdict

No exploitable vulnerability exists. Rate limit enforcement is 100% server-side. The client never sends token counts that the server trusts, and the server actively rejects any attempt to inject them (HTTP 400).


Repository Contents

File Description
SECURITY_ANALYSIS_REPORT.md Full technical report — static code analysis, API architecture review, live test results
security_tests.py Reproducible Python test suite (7 tests) — runs against the live Anthropic API
test_results.json Structured JSON output from the test run on 2026-04-01
README.md This file

Logical Architecture Diagram

The following diagrams illustrate how token counting and rate limiting actually work in the Claude Code + Anthropic API system. These are the key to understanding why the concern, while reasonable, does not constitute a vulnerability.

1. Normal Request Flow (What Actually Happens)

                    CLIENT (Claude Code)                                     SERVER (Anthropic API)
                    ====================                                     ======================

 User sends prompt
        |
        v
 +------------------+
 | Build HTTP        |
 | Request Payload   |      POST /v1/messages
 | {                 |      {model, messages, max_tokens}
 |   model: "...",   | ─────────────────────────────────────────────────>  +-------------------------+
 |   messages: [...],|                                                     | 1. Receive raw request  |
 |   max_tokens: N   |         NO token counts sent                        | 2. Authenticate (OAuth/ |
 | }                 |         NO usage fields sent                        |    API key)             |
 +------------------+          NO rate limit claims                        | 3. Tokenize input       |
                                                                           |    (server-side)        |
                                                                           | 4. Check rate limits    |
                                                                           |    against token bucket |
                                                                           | 5. Process request OR   |
                                                                           |    return 429           |
                                                                           | 6. Generate response    |
 +------------------+                                                      | 7. Count output tokens  |
 | Read response     |      HTTP 200 + Headers + Body                      | 8. Return response +    |
 | headers & body    | <─────────────────────────────────────────────────── |    usage + rate headers |
 +------------------+                                                      +-------------------------+
        |
        v                         RESPONSE CONTAINS:
 +------------------+             +-------------------------------------------------+
 | Update LOCAL      |             | Body:                                           |
 | display counters  |             |   usage.input_tokens: 142       (server-counted)|
 | (totalCostUSD,    |             |   usage.output_tokens: 87       (server-counted)|
 |  modelUsage)      |             |   usage.cache_read_input_tokens: 0              |
 |                   |             |                                                 |
 | These are for     |             | Headers:                                        |
 | UI display ONLY   |             |   anthropic-ratelimit-requests-remaining: 3998  |
 +------------------+             |   anthropic-ratelimit-input-tokens-remaining: 1M |
        |                         |   anthropic-ratelimit-output-tokens-remaining: 4K|
        v                         |   x-should-retry: false                          |
 Show cost/tokens                 +-------------------------------------------------+
 in status bar                     ^ All values computed SERVER-SIDE

2. Rate Limit Enforcement Flow (429 Rejection)

 CLIENT                                                    SERVER
 ======                                                    ======

 +------------------+
 | Send request      |       POST /v1/messages
 | (no local gate    | ──────────────────────────────────>  +---------------------------+
 |  checks anything  |                                      | 1. Tokenize input         |
 |  before sending)  |                                      | 2. Check token bucket:    |
 +------------------+                                      |    remaining = 0          |
                                                            |    RATE LIMIT EXCEEDED    |
                                                            | 3. Return 429             |
 +------------------+       HTTP 429                        +---------------------------+
 | Receive 429       | <────────────────────────────────── 
 | Read retry-after  |       Headers:
 | Back off & retry  |         x-should-retry: true
 +------------------+         retry-after: 30
        |                      error.type: "rate_limit_error"
        v
 Wait, then retry            SERVER decided. CLIENT obeyed.

3. Token Count Injection Attempt (What Happens If You Try)

 ATTACKER                                                  SERVER
 ========                                                  ======

 +--------------------+
 | Craft malicious     |     POST /v1/messages
 | request with fake   |     {
 | usage fields:       |       "model": "claude-sonnet-4-...",
 |                     |       "messages": [...],
 | "usage": {          |       "max_tokens": 100,
 |   "input_tokens": 0 | ────────────────────────────────>  +---------------------------+
 |   "output_tokens":0 |       "usage": {                   | 1. Parse request body     |
 | },                  |         "input_tokens": 0,          | 2. Strict schema          |
 | "token_count": 1    |         "output_tokens": 0          |    validation             |
 +--------------------+       },                            | 3. "usage" is NOT a valid |
                               "token_count": 1              |    request field          |
                             }                               | 4. REJECT with 400        |
                                                             +---------------------------+
 +--------------------+      HTTP 400
 | Receive rejection   | <──────────────────────────────── 
 +--------------------+      {
                               "type": "error",
        BLOCKED               "error": {
                                 "type": "invalid_request_error",
                                 "message": "usage: Extra inputs
                                  are not permitted"
                               }
                             }

                             Server does NOT read, trust, or
                             use any client-supplied token counts.
                             The "usage" field exists ONLY in
                             responses, never in requests.

4. Where Token Counting Happens (System Overview)

+============================================================================+
|                        CLAUDE CODE CLIENT (Local)                          |
|                                                                            |
|  +-------------------+    +-------------------+    +--------------------+  |
|  | User Interface    |    | Anthropic SDK     |    | Session State      |  |
|  |                   |    |                   |    |                    |  |
|  | Shows:            |    | countTokens()     |    | totalCostUSD: 0   |  |
|  | - Token counts    |    |   -> POST /v1/    |    | modelUsage: {}    |  |
|  | - Cost estimate   |    |   messages/       |    | tokenCounter: null|  |
|  | - Model usage     |    |   count_tokens    |    |                   |  |
|  |                   |    |   (SERVER call)   |    | Display-only.     |  |
|  | All values come   |    |                   |    | Never used as     |  |
|  | FROM the server   |    | create()          |    | gates to block    |  |
|  | response, not     |    |   -> POST /v1/    |    | API requests.     |  |
|  | local estimation  |    |   messages        |    |                   |  |
|  +-------------------+    +-------------------+    +--------------------+  |
|            ^                       |                        ^              |
|            |                       | HTTP requests          |              |
|            |                       v                        |              |
+============+=======================|========================+==============+
             |                       |                        |
    Read from response          Raw request             Read from response
             |                       |                        |
=============|=======================|========================|==============
|            |                       v                        |              |
|  +---------+-------------------+---+------------------------+-----------+  |
|  |                    ANTHROPIC API SERVER                               |  |
|  |                                                                       |  |
|  |  +-------------------+  +-------------------+  +-------------------+  |  |
|  |  | Tokenizer         |  | Rate Limiter      |  | Billing System   |  |  |
|  |  |                   |  |                   |  |                   |  |  |
|  |  | Server-side       |  | Token bucket      |  | Tracks actual    |  |  |
|  |  | tokenization of   |  | algorithm.        |  | usage from       |  |  |
|  |  | every request.    |  |                   |  | server-side      |  |  |
|  |  |                   |  | 3 dimensions:     |  | counts, not      |  |  |
|  |  | Client CANNOT     |  | - RPM (requests)  |  | client claims.   |  |  |
|  |  | influence this.   |  | - ITPM (input)    |  |                   |  |  |
|  |  |                   |  | - OTPM (output)   |  | Immune to client |  |  |
|  |  | Returns:          |  |                   |  | manipulation.    |  |  |
|  |  | - input_tokens    |  | Returns 429 +     |  |                   |  |  |
|  |  | - output_tokens   |  | retry-after when  |  |                   |  |  |
|  |  | - cache_* tokens  |  | limit exceeded.   |  |                   |  |  |
|  |  +-------------------+  +-------------------+  +-------------------+  |  |
|  |                                                                       |  |
|  +-----------------------------------------------------------------------+  |
|                                                                              |
|                        SERVER-SIDE (Authoritative)                           |
+==============================================================================+

5. What the Client CANNOT Do

  CAN the client...                                    Answer
  ================================================     ========================

  ...send fake token counts to the server?             NO - Server returns 400
                                                       "Extra inputs are not
                                                       permitted"

  ...skip rate limit checks?                           NO - There are no client-
                                                       side rate limit checks to
                                                       skip. Server enforces.

  ...manipulate local counters to                      NO - Local counters are
    bypass server limits?                              display-only. Server
                                                       doesn't read them.

  ...use countTokens() to trick                        NO - countTokens() calls
    the server into using a fake count?                the SERVER. The server
                                                       computes the count.

  ...modify maxBudgetUsd to get                        NO - maxBudgetUsd is a
    unlimited access?                                  local spending cap (user
                                                       safety feature). Server
                                                       has its own limits.

  ...pre-compute tokens locally                        NO - There is no local
    and skip the server call?                          tokenizer in the client.
                                                       The SDK calls /v1/messages
                                                       /count_tokens on the
                                                       server.

Live Test Results

Executed on 2026-04-01 against https://api.anthropic.com using Claude Code v2.1.89 OAuth credentials.

# Test HTTP Result What It Proves
1 Token Counting API 200 PASS Server returned input_tokens: 14 — computed server-side, not locally
2 Rate Limit Headers 429 PASS* Server returned 429 + x-should-retry: true — enforcement is server-side
3 Injected Token Counts 400 PASS Server rejected fake usage field: "Extra inputs are not permitted"
4 Estimate vs Actual 200/429 PASS* count_tokens returned 16 (server-computed); messages was rate-limited
5 429 Capture 429 PASS rate_limit_error type — server is the enforcer
6 Large Request No Gate 429 PASS ~1000 token payload reached server — no client pre-flight blocked it
7 Invalid Model 404 PASS Server returned not_found_error — all validation is server-side

* Tests 2 and 4 received 429 rate limit responses, which prevented capturing full usage data — but the 429 itself is proof of server-side enforcement.

The Most Critical Finding (Test 3)

REQUEST:
  POST /v1/messages
  Body: { ..., "usage": {"input_tokens": 0, "output_tokens": 0}, "token_count": 1 }

RESPONSE:
  HTTP 400
  {"type": "error", "error": {"type": "invalid_request_error",
   "message": "usage: Extra inputs are not permitted"}}

The API does not accept a usage field in requests. It only exists in responses. There is literally no mechanism to send fake token counts to the server.


How to Run the Tests Yourself

Prerequisites

  • Python 3.x
  • An Anthropic API key OR Claude Code installed and authenticated

Option A: With an API Key

export ANTHROPIC_API_KEY="sk-ant-api03-..."
python security_tests.py

Option B: With Claude Code OAuth Credentials

If you're logged into Claude Code, the script automatically reads ~/.claude/.credentials.json:

python security_tests.py

Expected Output

[PASS] TEST 1: Token Counting API is server-side
[PASS] TEST 2: Server returns rate limit headers and usage data
[PASS] TEST 3: Server ignores/rejects client-injected token counts
[PASS] TEST 4: Token count estimate vs actual usage
[PASS] TEST 5: Server enforces rate limits via 429 response
[PASS] TEST 6: Large request reaches server without client-side blocking
[PASS] TEST 7: Server validates request parameters (not client)

Note: Tests 2, 4, 5, 6 may return 429 if you're rate-limited. This is expected and actually proves the point — the server is enforcing limits, not the client.


Source Code Evidence

All code references are from Claude Code v2.1.89 (anthropic.claude-code-2.1.89-win32-x64/extension.js).

1. countTokens() — Server API Call, Not Local

countTokens(K, V) {
  return this._client.post("/v1/messages/count_tokens", { body: K, ...V });
}

2. Usage Data — Populated From Server Response

case "message_delta":
  H.usage.output_tokens = V.usage.output_tokens;           // FROM SERVER
  if (V.usage.input_tokens != null)
    H.usage.input_tokens = V.usage.input_tokens;            // FROM SERVER
  if (V.usage.cache_creation_input_tokens != null)
    H.usage.cache_creation_input_tokens = V.usage.cache_creation_input_tokens;
  if (V.usage.cache_read_input_tokens != null)
    H.usage.cache_read_input_tokens = V.usage.cache_read_input_tokens;

3. 429 Handling — Reactive, Not Pre-Emptive

if (K === 429) return new Cj(K, j, H, B);  // Creates RateLimitError
// ...
async retryRequest(K, V, H, B) {
  let j, G = B?.get("retry-after-ms");    // Read from SERVER header
  let N = B?.get("retry-after");            // Read from SERVER header
}

4. Session State — Display Counters Only

{
  totalCostUSD: 0,           // Display only — never gates requests
  modelUsage: {},            // Display only — never gates requests  
  tokenCounter: null,        // OpenTelemetry metrics — never gates requests
  costCounter: null,         // OpenTelemetry metrics — never gates requests
}

The Real (Non-Security) Issue

There is a legitimate usability concern (not a vulnerability):

The Token Counting API (/v1/messages/count_tokens) returns worst-case estimates assuming 0% cache hits. But server-side ITPM enforcement excludes cached tokens for most models. This means:

  • Cost estimates can be inflated by up to 90% when heavy caching is in effect
  • A user seeing "200K tokens" might actually only consume 20K against their rate limit

This is documented in anthropics/claude-code#18726 (Closed: Not Planned).

Why this isn't a vulnerability: The mismatch is conservative — it overestimates, not underestimates. An attacker wanting to bypass rate limits would need the opposite: a way to make the server undercount tokens. The Token Counting API can't do that because the server counts independently.


Responsible Disclosure

This concern was reported through HackerOne's Vulnerability Disclosure Program before any testing was performed.

Detail Value
Platform HackerOne
Report ID #3642470
Researcher Idrissa Maiga
Status Submitted and under review by Anthropic's security team
Severity Assessed No exploitable vulnerability (see analysis above)

Disclosure timeline:

  1. 2026-03-31 — Claude Code source became publicly accessible via npm distribution
  2. 2026-03-31 — Static code analysis performed (no systems tested, no exploitation attempted)
  3. 2026-04-01 — Report #3642470 submitted to Anthropic via HackerOne VDP
  4. 2026-04-01 — Live API tests conducted using researcher's own authenticated account
  5. 2026-04-01 — Full analysis and test results published to this repository

The analysis was conducted entirely through:

  1. Static code analysis of publicly distributed Claude Code bundles
  2. Reviewing official Anthropic API documentation
  3. Standard API calls using the researcher's own authenticated credentials
  4. No systems were tested without authorization
  5. No exploitation was attempted

References

Source Link
HackerOne Report #3642470 hackerone.com/reports/3642470
Anthropic VDP (HackerOne) hackerone.com/anthropic
Anthropic Rate Limits platform.claude.com/docs/en/api/rate-limits
Token Counting API platform.claude.com/docs/en/build-with-claude/token-counting
Rate Limits Support Article support.claude.com/en/articles/8243635
GitHub Issue #18726 anthropics/claude-code#18726
Token Bucket Algorithm Wikipedia

Author

Idrissa Maiga
Software Developer | Security Researcher
GitHub | HackerOne | LinkedIn


License

This research is published for educational and security research purposes. All testing was performed on the researcher's own account with proper authorization.


Analysis conducted with assistance from Claude Opus 4.6.

About

Security research on Claude Code rate limiting -- responsible disclosure via HackerOne, 7/7 tests passing

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages