fix: use hashed token as rate-limit client_id instead of raw header by Oxygen56 · Pull Request #560 · raullenchai/Rapid-MLX

Oxygen56 · 2026-06-12T10:51:48Z

Summary

Rate limiter previously used the raw Authorization header as client_id
Fixed by deriving a unique client identifier via HMAC-SHA256 of the token
Added proper IPv4 /24 and IPv6 /64 subnet grouping for unauthenticated traffic
This provides forward-compat for multi-tenant tokens, raw-secret hygiene, and NAT-friendly IP grouping

Fixes #192

Changes

_bucket_id(): HMAC-SHA256 with per-process random key for token hashing
_subnet_bucket(): ipaddress-based subnet masking (IPv4 /24, IPv6 /64)
3 new tests pinning the fix behavior

Test Plan

test_rate_limit_distinguishes_clients_by_hashed_bearer — two tokens → different buckets, raw token not in bucket key
test_rate_limit_groups_unauth_clients_by_subnet — same /24 → same bucket
test_rate_limit_same_token_via_bearer_and_x_api_key_share_bucket — Bearer and x-api-key parity

Fixes raullenchai#192 Previously all authenticated users shared a single rate-limit bucket because the raw Authorization header was used as client_id. Now a SHA256 hash of the token ensures per-user rate limiting. Also masks unauthenticated client IPs to /24 subnets to avoid grouping all NAT-traversal users into a single bucket.

raullenchai

Thanks for picking up #192. The supply-chain shape is clean (added hashlib only — stdlib), and the existing rate-limit tests at tests/test_anthropic_route_auth.py:286-355 still pass because both paths now hash the same input to the same digest. Two blockers and two nits before this can land.

Blockers

1. No tests for the new behavior

Issue #192 ships with a clean curl reproduction (5 successful requests under user1, 6th request from user2 triggers a 429 even with a distinct identity). The fix should pin that reproduction so the regression can't silently come back. The PR's "Manual verification" checklist isn't enforceable in CI.

Concretely I'd want:

# tests/test_config_and_middleware.py — add to the rate-limit section

def test_rate_limit_distinguishes_clients_by_hashed_bearer():
    """Two distinct bearer tokens get separate buckets.
    Pins #192: raw header values used to conflate everyone into one bucket."""
    from vllm_mlx.middleware.auth import _rate_limit_client_id
    # Build two requests with different bearer tokens, assert
    # _rate_limit_client_id() returns different values AND that
    # neither value contains the raw token (proves hashing).
    ...

def test_rate_limit_groups_unauth_clients_by_subnet():
    """IPv4 clients in the same /24 share a bucket."""
    # Hit _rate_limit_client_id with two requests from
    # 192.0.2.10 and 192.0.2.99 (no Authorization) — assert same client_id.
    ...

def test_rate_limit_same_token_via_bearer_and_x_api_key_share_bucket():
    """Already pinned for the /v1/messages path at test_anthropic_route_auth.py:312.
    Add the same assertion for _rate_limit_client_id() in isolation so the
    invariant isn't only enforced via the high-level Anthropic test."""
    ...

Three small tests; should fit in ~40 LOC.

2. `/24` subnet mask via `rsplit(".", 1)[0]` silently no-ops on IPv6

if request.client and request.client.host:
    return request.client.host.rsplit(".", 1)[0]

For an IPv6 client (e.g. 2001:db8::1), rsplit(".", 1)[0] finds no . and returns the whole address unchanged. Result: IPv6 hosts get per-/128 buckets, not the intended NAT-friendly grouping. Worse for IPv4-mapped IPv6 like ::ffff:192.0.2.1 — you get "::ffff:192.0.2", which is structurally weird and groups a different prefix than the IPv4-path comment implies.

Fix with ipaddress from stdlib:

import ipaddress

if request.client and request.client.host:
    try:
        addr = ipaddress.ip_address(request.client.host)
        if isinstance(addr, ipaddress.IPv4Address):
            network = ipaddress.ip_network(f"{addr}/24", strict=False)
        else:
            network = ipaddress.ip_network(f"{addr}/64", strict=False)
        return str(network.network_address)
    except ValueError:
        return request.client.host  # malformed — bucket as-is
return "unknown"

/24 for IPv4 (matches issue spec); /64 for IPv6 is the conventional ISP-allocation boundary and analogous in coarseness. Move the helper to module scope so it isn't reconstructed per request.

Nits

3. PR description overstates what this fixes

The summary says:

"causing all authenticated users to share a single rate-limit bucket when API key auth is enabled" → "This ensures per-user (per-token) rate limiting as intended"

In rapid-mlx's current --api-key X mode, every authenticated client uses the same shared X. Hashing X produces the same digest for everyone, so they all still share one bucket — the "per-user rate limiting as intended" claim doesn't materialize in the typical deployment. The real wins of this PR are:

Token hygiene — raw secret no longer used as in-memory dict key (small but real)
Forward compat — when we add multi-tenant tokens later, the limiter is already shaped right
/24 grouping for unauth traffic — helps NAT'd users (once the IPv6 bug above is fixed)

Could you tone the PR description down to match? "Forward-compat for multi-tenant tokens + raw-secret hygiene + NAT-friendly IP grouping" is more accurate than "ensures per-user rate limiting."

4. Unsalted SHA256 of the token

Not a strict blocker (issue #192's recommended fix is also unsalted), but for defense-in-depth in a future multi-tenant world, an HMAC keyed on a per-process random salt would mean rate-limiter internals leaking the digest doesn't help an attacker confirm token guesses:

_RATE_LIMIT_HMAC_KEY = secrets.token_bytes(32)  # module-level, regenerated per process

def _bucket_for(raw: str) -> str:
    return hmac.new(_RATE_LIMIT_HMAC_KEY, raw.encode(), hashlib.sha256).hexdigest()[:16]

Worth doing but I won't block on it.

Summary

Step 0 (does this solve a real product problem): ⚠ Partial. Token hygiene is universally good and the structure is right for future multi-tenant support; the headline "per-user rate limiting" claim doesn't apply to current single-tenant --api-key deployments. Worth landing once Blockers fixed.
Supply-chain audit: ✅ Clean — single stdlib import (hashlib).
Tests: ❌ None added — Blocker 1 above.
Action: please address Blockers 1 + 2, retitle the PR summary per Nit 3, and I'll re-review.

Thanks again — this is one of the older HIGH-severity issues and great to see it picked up.

- Add 3 tests: client distinction, /24 subnet grouping, Bearer/x-api-key parity - Use ipaddress stdlib for proper IPv4 /24 and IPv6 /64 subnet masking - Switch to HMAC-SHA256 with per-process random key for token hashing - Update PR description to accurately reflect scope Refs raullenchai#192

raullenchai requested changes Jun 12, 2026

View reviewed changes

Oxygen56 mentioned this pull request Jun 12, 2026

docs: add AI client compatibility guide #561

Open

5 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: use hashed token as rate-limit client_id instead of raw header#560

fix: use hashed token as rate-limit client_id instead of raw header#560
Oxygen56 wants to merge 2 commits into
raullenchai:mainfrom
Oxygen56:fix/issue-192-rate-limiter-client-id

Oxygen56 commented Jun 12, 2026 •

edited

Loading

Uh oh!

raullenchai left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

Oxygen56 commented Jun 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Test Plan

Uh oh!

raullenchai left a comment

Choose a reason for hiding this comment

Blockers

1. No tests for the new behavior

2. /24 subnet mask via rsplit(".", 1)[0] silently no-ops on IPv6

Nits

3. PR description overstates what this fixes

4. Unsalted SHA256 of the token

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Oxygen56 commented Jun 12, 2026 •

edited

Loading

2. `/24` subnet mask via `rsplit(".", 1)[0]` silently no-ops on IPv6