Skip to content

fix: use hashed token as rate-limit client_id instead of raw header#560

Open
Oxygen56 wants to merge 2 commits into
raullenchai:mainfrom
Oxygen56:fix/issue-192-rate-limiter-client-id
Open

fix: use hashed token as rate-limit client_id instead of raw header#560
Oxygen56 wants to merge 2 commits into
raullenchai:mainfrom
Oxygen56:fix/issue-192-rate-limiter-client-id

Conversation

@Oxygen56

@Oxygen56 Oxygen56 commented Jun 12, 2026

Copy link
Copy Markdown

Summary

  • Rate limiter previously used the raw Authorization header as client_id
  • Fixed by deriving a unique client identifier via HMAC-SHA256 of the token
  • Added proper IPv4 /24 and IPv6 /64 subnet grouping for unauthenticated traffic
  • This provides forward-compat for multi-tenant tokens, raw-secret hygiene, and NAT-friendly IP grouping

Fixes #192

Changes

  • _bucket_id(): HMAC-SHA256 with per-process random key for token hashing
  • _subnet_bucket(): ipaddress-based subnet masking (IPv4 /24, IPv6 /64)
  • 3 new tests pinning the fix behavior

Test Plan

  • test_rate_limit_distinguishes_clients_by_hashed_bearer — two tokens → different buckets, raw token not in bucket key
  • test_rate_limit_groups_unauth_clients_by_subnet — same /24 → same bucket
  • test_rate_limit_same_token_via_bearer_and_x_api_key_share_bucket — Bearer and x-api-key parity

Fixes raullenchai#192

Previously all authenticated users shared a single rate-limit bucket
because the raw Authorization header was used as client_id. Now a
SHA256 hash of the token ensures per-user rate limiting.

Also masks unauthenticated client IPs to /24 subnets to avoid
grouping all NAT-traversal users into a single bucket.

@raullenchai raullenchai left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking up #192. The supply-chain shape is clean (added hashlib only — stdlib), and the existing rate-limit tests at tests/test_anthropic_route_auth.py:286-355 still pass because both paths now hash the same input to the same digest. Two blockers and two nits before this can land.

Blockers

1. No tests for the new behavior

Issue #192 ships with a clean curl reproduction (5 successful requests under user1, 6th request from user2 triggers a 429 even with a distinct identity). The fix should pin that reproduction so the regression can't silently come back. The PR's "Manual verification" checklist isn't enforceable in CI.

Concretely I'd want:

# tests/test_config_and_middleware.py — add to the rate-limit section

def test_rate_limit_distinguishes_clients_by_hashed_bearer():
    """Two distinct bearer tokens get separate buckets.
    Pins #192: raw header values used to conflate everyone into one bucket."""
    from vllm_mlx.middleware.auth import _rate_limit_client_id
    # Build two requests with different bearer tokens, assert
    # _rate_limit_client_id() returns different values AND that
    # neither value contains the raw token (proves hashing).
    ...

def test_rate_limit_groups_unauth_clients_by_subnet():
    """IPv4 clients in the same /24 share a bucket."""
    # Hit _rate_limit_client_id with two requests from
    # 192.0.2.10 and 192.0.2.99 (no Authorization) — assert same client_id.
    ...

def test_rate_limit_same_token_via_bearer_and_x_api_key_share_bucket():
    """Already pinned for the /v1/messages path at test_anthropic_route_auth.py:312.
    Add the same assertion for _rate_limit_client_id() in isolation so the
    invariant isn't only enforced via the high-level Anthropic test."""
    ...

Three small tests; should fit in ~40 LOC.

2. /24 subnet mask via rsplit(".", 1)[0] silently no-ops on IPv6

if request.client and request.client.host:
    return request.client.host.rsplit(".", 1)[0]

For an IPv6 client (e.g. 2001:db8::1), rsplit(".", 1)[0] finds no . and returns the whole address unchanged. Result: IPv6 hosts get per-/128 buckets, not the intended NAT-friendly grouping. Worse for IPv4-mapped IPv6 like ::ffff:192.0.2.1 — you get "::ffff:192.0.2", which is structurally weird and groups a different prefix than the IPv4-path comment implies.

Fix with ipaddress from stdlib:

import ipaddress

if request.client and request.client.host:
    try:
        addr = ipaddress.ip_address(request.client.host)
        if isinstance(addr, ipaddress.IPv4Address):
            network = ipaddress.ip_network(f"{addr}/24", strict=False)
        else:
            network = ipaddress.ip_network(f"{addr}/64", strict=False)
        return str(network.network_address)
    except ValueError:
        return request.client.host  # malformed — bucket as-is
return "unknown"

/24 for IPv4 (matches issue spec); /64 for IPv6 is the conventional ISP-allocation boundary and analogous in coarseness. Move the helper to module scope so it isn't reconstructed per request.

Nits

3. PR description overstates what this fixes

The summary says:

"causing all authenticated users to share a single rate-limit bucket when API key auth is enabled" → "This ensures per-user (per-token) rate limiting as intended"

In rapid-mlx's current --api-key X mode, every authenticated client uses the same shared X. Hashing X produces the same digest for everyone, so they all still share one bucket — the "per-user rate limiting as intended" claim doesn't materialize in the typical deployment. The real wins of this PR are:

  • Token hygiene — raw secret no longer used as in-memory dict key (small but real)
  • Forward compat — when we add multi-tenant tokens later, the limiter is already shaped right
  • /24 grouping for unauth traffic — helps NAT'd users (once the IPv6 bug above is fixed)

Could you tone the PR description down to match? "Forward-compat for multi-tenant tokens + raw-secret hygiene + NAT-friendly IP grouping" is more accurate than "ensures per-user rate limiting."

4. Unsalted SHA256 of the token

Not a strict blocker (issue #192's recommended fix is also unsalted), but for defense-in-depth in a future multi-tenant world, an HMAC keyed on a per-process random salt would mean rate-limiter internals leaking the digest doesn't help an attacker confirm token guesses:

_RATE_LIMIT_HMAC_KEY = secrets.token_bytes(32)  # module-level, regenerated per process

def _bucket_for(raw: str) -> str:
    return hmac.new(_RATE_LIMIT_HMAC_KEY, raw.encode(), hashlib.sha256).hexdigest()[:16]

Worth doing but I won't block on it.

Summary

  • Step 0 (does this solve a real product problem): ⚠ Partial. Token hygiene is universally good and the structure is right for future multi-tenant support; the headline "per-user rate limiting" claim doesn't apply to current single-tenant --api-key deployments. Worth landing once Blockers fixed.
  • Supply-chain audit: ✅ Clean — single stdlib import (hashlib).
  • Tests: ❌ None added — Blocker 1 above.
  • Action: please address Blockers 1 + 2, retitle the PR summary per Nit 3, and I'll re-review.

Thanks again — this is one of the older HIGH-severity issues and great to see it picked up.

- Add 3 tests: client distinction, /24 subnet grouping, Bearer/x-api-key parity
- Use ipaddress stdlib for proper IPv4 /24 and IPv6 /64 subnet masking
- Switch to HMAC-SHA256 with per-process random key for token hashing
- Update PR description to accurately reflect scope

Refs raullenchai#192
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

HIGH: Rate limiter conflates all authenticated users into a single bucket via Authorization header

2 participants