fix: use hashed token as rate-limit client_id instead of raw header#560
fix: use hashed token as rate-limit client_id instead of raw header#560Oxygen56 wants to merge 2 commits into
Conversation
Fixes raullenchai#192 Previously all authenticated users shared a single rate-limit bucket because the raw Authorization header was used as client_id. Now a SHA256 hash of the token ensures per-user rate limiting. Also masks unauthenticated client IPs to /24 subnets to avoid grouping all NAT-traversal users into a single bucket.
raullenchai
left a comment
There was a problem hiding this comment.
Thanks for picking up #192. The supply-chain shape is clean (added hashlib only — stdlib), and the existing rate-limit tests at tests/test_anthropic_route_auth.py:286-355 still pass because both paths now hash the same input to the same digest. Two blockers and two nits before this can land.
Blockers
1. No tests for the new behavior
Issue #192 ships with a clean curl reproduction (5 successful requests under user1, 6th request from user2 triggers a 429 even with a distinct identity). The fix should pin that reproduction so the regression can't silently come back. The PR's "Manual verification" checklist isn't enforceable in CI.
Concretely I'd want:
# tests/test_config_and_middleware.py — add to the rate-limit section
def test_rate_limit_distinguishes_clients_by_hashed_bearer():
"""Two distinct bearer tokens get separate buckets.
Pins #192: raw header values used to conflate everyone into one bucket."""
from vllm_mlx.middleware.auth import _rate_limit_client_id
# Build two requests with different bearer tokens, assert
# _rate_limit_client_id() returns different values AND that
# neither value contains the raw token (proves hashing).
...
def test_rate_limit_groups_unauth_clients_by_subnet():
"""IPv4 clients in the same /24 share a bucket."""
# Hit _rate_limit_client_id with two requests from
# 192.0.2.10 and 192.0.2.99 (no Authorization) — assert same client_id.
...
def test_rate_limit_same_token_via_bearer_and_x_api_key_share_bucket():
"""Already pinned for the /v1/messages path at test_anthropic_route_auth.py:312.
Add the same assertion for _rate_limit_client_id() in isolation so the
invariant isn't only enforced via the high-level Anthropic test."""
...Three small tests; should fit in ~40 LOC.
2. /24 subnet mask via rsplit(".", 1)[0] silently no-ops on IPv6
if request.client and request.client.host:
return request.client.host.rsplit(".", 1)[0]For an IPv6 client (e.g. 2001:db8::1), rsplit(".", 1)[0] finds no . and returns the whole address unchanged. Result: IPv6 hosts get per-/128 buckets, not the intended NAT-friendly grouping. Worse for IPv4-mapped IPv6 like ::ffff:192.0.2.1 — you get "::ffff:192.0.2", which is structurally weird and groups a different prefix than the IPv4-path comment implies.
Fix with ipaddress from stdlib:
import ipaddress
if request.client and request.client.host:
try:
addr = ipaddress.ip_address(request.client.host)
if isinstance(addr, ipaddress.IPv4Address):
network = ipaddress.ip_network(f"{addr}/24", strict=False)
else:
network = ipaddress.ip_network(f"{addr}/64", strict=False)
return str(network.network_address)
except ValueError:
return request.client.host # malformed — bucket as-is
return "unknown"/24 for IPv4 (matches issue spec); /64 for IPv6 is the conventional ISP-allocation boundary and analogous in coarseness. Move the helper to module scope so it isn't reconstructed per request.
Nits
3. PR description overstates what this fixes
The summary says:
"causing all authenticated users to share a single rate-limit bucket when API key auth is enabled" → "This ensures per-user (per-token) rate limiting as intended"
In rapid-mlx's current --api-key X mode, every authenticated client uses the same shared X. Hashing X produces the same digest for everyone, so they all still share one bucket — the "per-user rate limiting as intended" claim doesn't materialize in the typical deployment. The real wins of this PR are:
- Token hygiene — raw secret no longer used as in-memory dict key (small but real)
- Forward compat — when we add multi-tenant tokens later, the limiter is already shaped right
- /24 grouping for unauth traffic — helps NAT'd users (once the IPv6 bug above is fixed)
Could you tone the PR description down to match? "Forward-compat for multi-tenant tokens + raw-secret hygiene + NAT-friendly IP grouping" is more accurate than "ensures per-user rate limiting."
4. Unsalted SHA256 of the token
Not a strict blocker (issue #192's recommended fix is also unsalted), but for defense-in-depth in a future multi-tenant world, an HMAC keyed on a per-process random salt would mean rate-limiter internals leaking the digest doesn't help an attacker confirm token guesses:
_RATE_LIMIT_HMAC_KEY = secrets.token_bytes(32) # module-level, regenerated per process
def _bucket_for(raw: str) -> str:
return hmac.new(_RATE_LIMIT_HMAC_KEY, raw.encode(), hashlib.sha256).hexdigest()[:16]Worth doing but I won't block on it.
Summary
- Step 0 (does this solve a real product problem): ⚠ Partial. Token hygiene is universally good and the structure is right for future multi-tenant support; the headline "per-user rate limiting" claim doesn't apply to current single-tenant
--api-keydeployments. Worth landing once Blockers fixed. - Supply-chain audit: ✅ Clean — single stdlib import (
hashlib). - Tests: ❌ None added — Blocker 1 above.
- Action: please address Blockers 1 + 2, retitle the PR summary per Nit 3, and I'll re-review.
Thanks again — this is one of the older HIGH-severity issues and great to see it picked up.
- Add 3 tests: client distinction, /24 subnet grouping, Bearer/x-api-key parity - Use ipaddress stdlib for proper IPv4 /24 and IPv6 /64 subnet masking - Switch to HMAC-SHA256 with per-process random key for token hashing - Update PR description to accurately reflect scope Refs raullenchai#192
Summary
Fixes #192
Changes
_bucket_id(): HMAC-SHA256 with per-process random key for token hashing_subnet_bucket(): ipaddress-based subnet masking (IPv4 /24, IPv6 /64)Test Plan
test_rate_limit_distinguishes_clients_by_hashed_bearer— two tokens → different buckets, raw token not in bucket keytest_rate_limit_groups_unauth_clients_by_subnet— same /24 → same buckettest_rate_limit_same_token_via_bearer_and_x_api_key_share_bucket— Bearer and x-api-key parity