Fix Claude Code connectivity with Rapid-MLX server #557
Address two 404 errors that prevent Claude Code from connecting:
1. HEAD / → 404: Add root path handler to health.py
- Claude Code sends HEAD / as a connectivity probe before API calls
- Root handler on probe_router (no auth) so it works with --api-key
- FastAPI auto-generates HEAD / from GET / registration
2. POST /v1/messages → 404: Remove _validate_model_name from Anthropic route
- Claude Code sends real model names (claude-opus-4-5, etc.)
- _validate_model_name rejected non-MLX model names
- get_engine() already handles model routing with proper fallback
- OpenAI endpoints still validate model names correctly
Changes:
- vllm_mlx/routes/health.py: Add @probe_router.get("/") handler
- vllm_mlx/routes/anthropic.py: Remove _validate_model_name call & import
- tests/test_routes.py: Add 3 root path handler tests
- tests/test_anthropic_route_auth.py: Add 2 claude model name tests
- tests/test_api_validation_bundle.py: Add 1 OpenAI regression test
Fixes connection issues while maintaining strict model validation on
OpenAI-compatible endpoints and test coverage across all code paths.
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Thanks for the patch — both bugs are real, the use case (Claude Code as a local Anthropic SDK client against /v1/messages) is a great fit for what that route was added for, and the supply-chain shape is clean (no new deps, no CI/install changes). Two blockers below before this can land, plus two nits.
Blockers
1. Tab indentation in vllm_mlx/routes/health.py
The new root() function uses tabs for indentation, while the rest of the file (and the codebase) is 4-space. od -c on the raw diff confirms \t in the docstring/body lines.
Our .pre-commit-config.yaml runs ruff-format, and pyproject.toml pins [tool.black] line-length = 88 / [tool.ruff] line-length = 88 — so both the pre-commit hook and CI will reject this file. Could you re-indent with 4 spaces and re-push?
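As a quick local check before re-pushing, tab indentation can be found with a few lines of Python. This is only a sketch: the helper name lines_with_tab_indent is made up for illustration and is not part of the repo or of ruff.

```python
import re

# Matches a tab anywhere inside a line's leading whitespace
# (optionally preceded by spaces, to catch mixed indentation).
_LEADING_TAB = re.compile(r"^[ ]*\t")


def lines_with_tab_indent(source: str) -> list[int]:
    """Return 1-based line numbers whose indentation contains a tab."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if _LEADING_TAB.match(line)
    ]


# Hypothetical sample: the second line is tab-indented, the third uses spaces.
sample = 'def root():\n\t"""Probe."""\n    return {}\n'
print(lines_with_tab_indent(sample))  # [2]
```

Running ruff format on the file remains the authoritative fix; the helper only locates the offending lines.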
2. Removing _validate_model_name regresses multi-model mode
Looking at vllm_mlx/service/helpers.py::get_engine:
def get_engine(model_name: str | None = None) -> BaseEngine:
    cfg = get_config()
    if cfg.model_registry:
        try:
            return cfg.model_registry.get_engine(model_name)
        except KeyError:
            pass  # ← silent fall-through
    if cfg.engine is None:
        raise HTTPException(status_code=503, detail="Model not loaded")
    return cfg.engine
In multi-model mode (cfg.model_registry populated, cfg.engine = None), a request like "claude-opus-4-5":
- Before this PR: _validate_model_name raises 404 "The model claude-opus-4-5 does not exist. Available: qwen3.6-27b, deepseek-v4-flash, …", a clear diagnostic that lists what's loaded.
- After this PR: registry lookup fails with KeyError → falls through → cfg.engine is None → 503 "Model not loaded", which is misleading (the model registry is loaded; only the requested name isn't in it).
So the single-model case improves (your stated goal), but the multi-model case regresses from a clear 404 to a misleading 503. Single-model mode is the more common deployment for now, but multi-model is the direction we're heading and we don't want to ship a known regression.
Suggested fix (either is fine, your call):
- (A) Keep _validate_model_name, but bypass it for Anthropic-shaped names, e.g. allow claude-* / gpt-* prefixes through to get_engine while keeping strict validation for everything else. Zero new schema, least invasive.
- (B) Add a model_aliases config (e.g. --anthropic-model-alias claude-opus-4-5=qwen3.6-27b) so the passthrough is explicit and the server log shows the mapping.
I'd lean (A) for this PR.
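A minimal sketch of what (A) could look like, assuming a prefix tuple and a LookupError in place of the real 404. PASSTHROUGH_PREFIXES, is_passthrough_name and check_model are illustrative names, not existing helpers in the codebase.

```python
# Assumed prefix list, taken from the claude-* / gpt-* example above.
PASSTHROUGH_PREFIXES = ("claude-", "gpt-")


def is_passthrough_name(model: str) -> bool:
    """True for names that should skip strict validation."""
    return model.startswith(PASSTHROUGH_PREFIXES)


def check_model(model: str, available: set[str]) -> None:
    """Reject unknown names unless they match a passthrough prefix.

    LookupError stands in for the 404 the real route would raise.
    """
    if is_passthrough_name(model) or model in available:
        return
    listed = ", ".join(sorted(available))
    raise LookupError(f"The model {model} does not exist. Available: {listed}")


loaded = {"qwen3.6-27b"}
check_model("claude-opus-4-5", loaded)  # passes: matches a passthrough prefix
check_model("qwen3.6-27b", loaded)      # passes: actually loaded
```

Names that pass this check are still handed to get_engine, so what they resolve to depends on its fallback.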
Nits
3. Misleading docstring on test_head_root_returns_200
The test docstring says:
"HEAD / is the Claude Code connectivity probe. FastAPI auto-generates it from GET / — this test pins the contract …"
But the implementation is @probe_router.api_route("/", methods=["GET", "HEAD"]) — HEAD is registered explicitly, not auto-generated. The comment contradicts the line right above it; please rewrite to describe what's actually pinned (HEAD stays on probe_router, bypasses verify_api_key).
4. Silent name swap is a footgun — at least log it
After this PR, a client typo like "claude-opus-99-5" returns 200 OK against whatever's loaded, with response.model = "qwen3.5-4b-8bit" (the loaded name). The client has no signal that the requested name was different from what served.
At minimum, please add an INFO log in create_anthropic_message when the request model differs from the loaded model:
if anthropic_request.model != cfg.model_name:
    logger.info(
        "Anthropic /v1/messages: request model=%r served by loaded engine=%r",
        anthropic_request.model, cfg.model_name,
    )
That makes the substitution visible in /logs for debugging without changing the response shape.
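If it helps to pin the log line in a test, one option is a small capturing handler. This is a stdlib-only sketch; the logger name, ListHandler and log_if_substituted are hypothetical and only mirror the snippet above.

```python
import logging

# Hypothetical logger name; the real route module would use its own logger.
logger = logging.getLogger("anthropic_route_sketch")
logger.setLevel(logging.INFO)


class ListHandler(logging.Handler):
    """Collects formatted log messages in memory."""

    def __init__(self) -> None:
        super().__init__(level=logging.INFO)
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def log_if_substituted(requested: str, loaded: str) -> bool:
    """Log at INFO when the requested model differs from the loaded one."""
    if requested == loaded:
        return False
    logger.info(
        "Anthropic /v1/messages: request model=%r served by loaded engine=%r",
        requested, loaded,
    )
    return True


def capture_substitution_logs(pairs: list[tuple[str, str]]) -> list[str]:
    """Run log_if_substituted over (requested, loaded) pairs and return the messages."""
    handler = ListHandler()
    logger.addHandler(handler)
    try:
        for requested, loaded in pairs:
            log_if_substituted(requested, loaded)
    finally:
        logger.removeHandler(handler)
    return handler.messages
```

A test would then assert that a mismatched pair produces exactly one message and a matching pair produces none.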
Summary
- Step 0 (does this solve a real product problem): ✅ Yes. /v1/messages exists for Anthropic-SDK compat and Claude Code is the canonical client.
- Supply-chain audit: ✅ Clean. No deps, no CI, no install hooks, no network calls.
- Tests: the 6 new tests are well-scoped; the OpenAI-route-still-404s regression test in particular is exactly the right shape.
- Action: please address Blockers 1 + 2, and ideally Nits 3 + 4, then I'll re-review.
Thanks again for digging into this!
- Fix tab indentation in health.py root() to use 4 spaces (blocker)
- Restore _validate_model_name in anthropic route with bypass for Anthropic model names (claude-*, gpt-*) to preserve single-model passthrough while fixing multi-model 404 diagnostic regression (blocker)
- Clarify test_head_root_returns_200 docstring: HEAD is explicitly registered on probe_router, not auto-generated (nit)
- Add INFO log when Anthropic request model differs from loaded engine, making silent model substitution visible in logs for debugging (nit)
Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Addressed all review feedback in commit 4220bdf, covering both blockers and both nits. Ready for re-review.
Summary
Fixes two 404 errors that prevent Claude Code from connecting to the local Rapid-MLX inference server: HEAD / → 404 and POST /v1/messages → 404.
Changes
vllm_mlx/routes/health.py
- Add @probe_router.api_route("/", methods=["GET", "HEAD"]) handler
- Returns {"status": "ok"}
- On probe_router (no-auth) so probe works with --api-key
vllm_mlx/routes/anthropic.py
- Remove _validate_model_name(anthropic_request.model) call
- Remove _validate_model_name from import list
- get_engine(anthropic_request.model) now handles model routing with proper fallback
Tests (6 new tests)
- tests/test_routes.py: 3 root path tests (GET /, HEAD /, HEAD / + api-key)
- tests/test_anthropic_route_auth.py: 2 Claude model name tests
- tests/test_api_validation_bundle.py: 1 regression test for OpenAI validation
Test Results
✅ All 6 new tests passing
How to Test Locally
Context
When Claude Code is configured with ANTHROPIC_BASE_URL=http://localhost:8000, it:
- Sends HEAD / as a connectivity probe → was getting 404
- Sends POST /v1/messages with model "claude-opus-4-5" → was getting 404 from strict validation
These fixes allow Claude Code to use Rapid-MLX as a local backend while maintaining strict model validation on OpenAI-compatible endpoints.
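The HEAD / probe can be imitated with the standard library alone. The sketch below runs against a throwaway stub server rather than Rapid-MLX itself; StubHandler, probe_root and probe_stub_server are illustrative names, and how Claude Code actually issues its probe is not shown in this PR.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Hypothetical stub that answers GET / and HEAD / with 200."""

    def _send_ok(self, include_body: bool) -> None:
        if self.path != "/":
            self.send_error(404)
            return
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:  # HEAD responses carry headers only
            self.wfile.write(body)

    def do_GET(self) -> None:
        self._send_ok(include_body=True)

    def do_HEAD(self) -> None:
        self._send_ok(include_body=False)

    def log_message(self, format, *args) -> None:
        pass  # keep the example quiet


def probe_root(host: str, port: int) -> int:
    """Send HEAD / and return the HTTP status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().status
    finally:
        conn.close()


def probe_stub_server() -> int:
    """Start the stub on an ephemeral port, probe it once, then shut it down."""
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return probe_root("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
```

Pointing probe_root at the host and port of a running server would show whether HEAD / returns 200 there.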