Fix Claude Code connectivity with Rapid-MLX server by angusgastle · Pull Request #557 · raullenchai/Rapid-MLX

angusgastle · 2026-06-12T02:14:57Z

Summary

Fixes two 404 errors that prevent Claude Code from connecting to the local Rapid-MLX inference server:

HEAD / → 404 — Claude Code's connectivity probe fails
POST /v1/messages → 404 — Model name mismatch on Anthropic API endpoint

Changes

vllm_mlx/routes/health.py

Added @probe_router.api_route("/", methods=["GET", "HEAD"]) handler
Returns {"status": "ok"}
Lives on probe_router (no-auth) so probe works with --api-key

vllm_mlx/routes/anthropic.py

Removed _validate_model_name(anthropic_request.model) call
Removed _validate_model_name from import list
get_engine(anthropic_request.model) now handles model routing with proper fallback

Tests (6 new tests)

tests/test_routes.py: 3 root path tests (GET /, HEAD /, HEAD / + api-key)
tests/test_anthropic_route_auth.py: 2 Claude model name tests
tests/test_api_validation_bundle.py: 1 regression test for OpenAI validation

Test Results

✅ All 6 new tests passing

Root path returns 200 for both GET and HEAD
Claude model names (claude-opus-4-5, etc.) accepted on /v1/messages
OpenAI endpoints still strictly validate model names (unchanged behavior)

How to Test Locally

# Unit tests
pytest tests/test_routes.py::TestHealthRoutes -k root -v
pytest tests/test_anthropic_route_auth.py -k claude -v

# Manual integration (with server running on http://localhost:8000)
curl -I http://localhost:8000/
# Expected: HTTP/1.1 200 OK

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-opus-4-5","max_tokens":10,"messages":[{"role":"user","content":"hi"}]}'
# Expected: 200 with message response (not 404)

# Verify OpenAI route unchanged
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"wrong-model","messages":[{"role":"user","content":"hi"}]}'
# Expected: 404 (unchanged)
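
For anyone who prefers to script the `curl -I` check, here is a minimal stdlib-only sketch of the same HEAD probe. `StubRootHandler` and `start_stub_server` are hypothetical stand-ins that exist only so the example runs without Rapid-MLX; they are not code from this PR. Against a real server you would call `probe_root("localhost", 8000)` instead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubRootHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the server's root route (not Rapid-MLX code)."""

    def _send_ok(self, include_body: bool) -> None:
        body = b'{"status": "ok"}'
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        if include_body:
            self.wfile.write(body)

    def do_GET(self) -> None:
        self._send_ok(include_body=True)

    def do_HEAD(self) -> None:
        # HEAD returns the same status and headers as GET, but no body.
        self._send_ok(include_body=False)

    def log_message(self, format, *args) -> None:
        pass  # keep the demo output quiet


def probe_root(host: str, port: int, timeout: float = 5.0) -> int:
    """Send HEAD / and return the HTTP status code, like `curl -I`."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        return conn.getresponse().status
    finally:
        conn.close()


def start_stub_server() -> HTTPServer:
    """Start the stub on an ephemeral localhost port in a daemon thread."""
    server = HTTPServer(("127.0.0.1", 0), StubRootHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_stub_server()
    print(probe_root("127.0.0.1", server.server_port))
    server.shutdown()
    server.server_close()
```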

Context

When Claude Code is configured with ANTHROPIC_BASE_URL=http://localhost:8000, it:

Sends HEAD / as a connectivity probe → was getting 404
Sends requests with model names like "claude-opus-4-5" → was getting 404 from strict validation

These fixes allow Claude Code to use Rapid-MLX as a local backend while maintaining strict model validation on OpenAI-compatible endpoints.

Address two 404 errors that prevent Claude Code from connecting:

1. HEAD / → 404: Add root path handler to health.py
   - Claude Code sends HEAD / as a connectivity probe before API calls
   - Root handler on probe_router (no auth) so it works with --api-key
   - FastAPI auto-generates HEAD / from GET / registration
2. POST /v1/messages → 404: Remove _validate_model_name from Anthropic route
   - Claude Code sends real model names (claude-opus-4-5, etc.)
   - _validate_model_name rejected non-MLX model names
   - get_engine() already handles model routing with proper fallback
   - OpenAI endpoints still validate model names correctly

Changes:
- vllm_mlx/routes/health.py: Add @probe_router.get("/") handler
- vllm_mlx/routes/anthropic.py: Remove _validate_model_name call & import
- tests/test_routes.py: Add 3 root path handler tests
- tests/test_anthropic_route_auth.py: Add 2 claude model name tests
- tests/test_api_validation_bundle.py: Add 1 OpenAI regression test

Fixes connection issues while maintaining strict model validation on OpenAI-compatible endpoints and test coverage across all code paths.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

raullenchai

Thanks for the patch — both bugs are real, the use case (Claude Code as a local Anthropic SDK client against /v1/messages) is a great fit for what that route was added for, and the supply-chain shape is clean (no new deps, no CI/install changes). Two blockers below before this can land, plus two nits.

Blockers

1. Tab indentation in `vllm_mlx/routes/health.py`

The new root() function uses tabs for indentation, while the rest of the file (and the codebase) is 4-space. od -c on the raw diff confirms \t in the docstring/body lines.

Our .pre-commit-config.yaml runs ruff-format, and pyproject.toml pins [tool.black] line-length = 88 / [tool.ruff] line-length = 88 — so both the pre-commit hook and CI will reject this file. Could you re-indent with 4 spaces and re-push?

2. Removing `_validate_model_name` regresses multi-model mode

Looking at vllm_mlx/service/helpers.py::get_engine:

def get_engine(model_name: str | None = None) -> BaseEngine:
    cfg = get_config()
    if cfg.model_registry:
        try:
            return cfg.model_registry.get_engine(model_name)
        except KeyError:
            pass                          # ← silent fall-through
    if cfg.engine is None:
        raise HTTPException(status_code=503, detail="Model not loaded")
    return cfg.engine

In multi-model mode (cfg.model_registry populated, cfg.engine = None), a request for a model name like "claude-opus-4-5" behaves as follows:

Before this PR: _validate_model_name raises 404 "The model claude-opus-4-5 does not exist. Available: qwen3.6-27b, deepseek-v4-flash, …" — clear diagnostic, lists what's loaded.
After this PR: registry lookup fails with KeyError → falls through → cfg.engine is None → 503 "Model not loaded" — misleading (the model registry is loaded; only the requested name isn't in it).

So the single-model case improves (your stated goal), but the multi-model case regresses from a clear 404 to a misleading 503. Single-model mode is the more common deployment for now, but multi-model is the direction we're heading and we don't want to ship a known regression.
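
To make the two outcomes concrete, here is a small dependency-free simulation of the control flow quoted above. It is a sketch under assumptions, not the project's code: `HTTPError` stands in for `HTTPException`, `Config` is a simplified hypothetical config whose registry is a plain dict, and `validate_model_name` only approximates what `_validate_model_name` is described as doing in multi-model mode.

```python
from dataclasses import dataclass, field


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException so the sketch has no dependencies."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass
class Config:
    """Hypothetical simplified config: the registry maps model name -> engine."""

    model_registry: dict = field(default_factory=dict)
    engine: object = None


def get_engine(cfg: Config, model_name: str):
    """Mirrors the quoted helper's control flow, with cfg passed explicitly."""
    if cfg.model_registry:
        try:
            return cfg.model_registry[model_name]
        except KeyError:
            pass  # silent fall-through, as in the quoted helper
    if cfg.engine is None:
        raise HTTPError(503, "Model not loaded")
    return cfg.engine


def validate_model_name(cfg: Config, model_name: str) -> None:
    """Assumed multi-model behaviour of _validate_model_name: 404 plus a list."""
    if cfg.model_registry and model_name not in cfg.model_registry:
        available = ", ".join(sorted(cfg.model_registry))
        raise HTTPError(
            404, f"The model {model_name} does not exist. Available: {available}"
        )


def status_for(cfg: Config, model_name: str, validate: bool) -> int:
    """Return the HTTP status a request would end with in this simulation."""
    try:
        if validate:
            validate_model_name(cfg, model_name)
        get_engine(cfg, model_name)
        return 200
    except HTTPError as exc:
        return exc.status_code


multi = Config(model_registry={"qwen3.6-27b": object(), "deepseek-v4-flash": object()})
single = Config(engine=object())

if __name__ == "__main__":
    print(status_for(multi, "claude-opus-4-5", validate=True))    # before the PR
    print(status_for(multi, "claude-opus-4-5", validate=False))   # after the PR
    print(status_for(single, "claude-opus-4-5", validate=False))  # single-model
```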

Suggested fix (either is fine, your call):

(A) Keep _validate_model_name, but bypass it for Anthropic-shaped names — e.g. allow claude-* / gpt-* prefixes through to get_engine while keeping strict validation for everything else. Zero new schema, least invasive.
(B) Add a model_aliases config (e.g. --anthropic-model-alias claude-opus-4-5=qwen3.6-27b) so the passthrough is explicit and the server log shows the mapping.

I'd lean (A) for this PR.
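
A rough sketch of what option (A) could look like, assuming a simple prefix check placed in front of the strict validation. The names `PASSTHROUGH_PREFIXES`, `is_passthrough_name`, `check_model`, and `ModelNotFound` are hypothetical and only illustrate the shape; the real route would keep calling `_validate_model_name` for everything that is not bypassed.

```python
from __future__ import annotations

# Prefixes mentioned in the review; treat the exact list as an assumption.
PASSTHROUGH_PREFIXES = ("claude-", "gpt-")


class ModelNotFound(Exception):
    """Hypothetical stand-in for the 404 raised by strict validation."""


def is_passthrough_name(model: str) -> bool:
    """True for Anthropic-shaped names that should skip strict validation."""
    return model.startswith(PASSTHROUGH_PREFIXES)


def check_model(model: str, loaded: set[str]) -> None:
    """Accept loaded names and passthrough names; reject everything else."""
    if model in loaded or is_passthrough_name(model):
        return
    available = ", ".join(sorted(loaded))
    raise ModelNotFound(
        f"The model {model} does not exist. Available: {available}"
    )


if __name__ == "__main__":
    loaded = {"qwen3.6-27b"}
    check_model("claude-opus-4-5", loaded)  # bypassed, no error
    check_model("qwen3.6-27b", loaded)      # exact match, no error
    try:
        check_model("wrong-model", loaded)
    except ModelNotFound as exc:
        print(exc)
```

In this sketch a bypassed name still goes on to get_engine, so what it resolves to depends on that helper.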

Nits

3. Misleading docstring on `test_head_root_returns_200`

The test docstring says:

"HEAD / is the Claude Code connectivity probe. FastAPI auto-generates it from GET / — this test pins the contract …"

But the implementation is @probe_router.api_route("/", methods=["GET", "HEAD"]) — HEAD is registered explicitly, not auto-generated. The comment contradicts the line right above it; please rewrite to describe what's actually pinned (HEAD stays on probe_router, bypasses verify_api_key).

4. Silent name swap is a footgun — at least log it

After this PR, a client typo like "claude-opus-99-5" returns 200 OK against whatever's loaded, with response.model = "qwen3.5-4b-8bit" (the loaded name). The client has no signal that the requested name was different from what served.

At minimum, please add an INFO log in create_anthropic_message when the request model differs from the loaded model:

if anthropic_request.model != cfg.model_name:
    logger.info(
        "Anthropic /v1/messages: request model=%r served by loaded engine=%r",
        anthropic_request.model, cfg.model_name,
    )

That makes the substitution visible in /logs for debugging without changing the response shape.
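
For reference, a self-contained version of that log call using only the standard library. The wrapper `log_model_substitution` and the logger name are hypothetical; in the route itself the two values would come from `anthropic_request.model` and `cfg.model_name` as in the snippet above.

```python
import logging

logger = logging.getLogger("anthropic_route_sketch")  # hypothetical logger name


def log_model_substitution(requested: str, loaded: str) -> bool:
    """Log at INFO when the requested model differs from the loaded one.

    Returns True if a substitution was logged, False otherwise.
    """
    if requested != loaded:
        logger.info(
            "Anthropic /v1/messages: request model=%r served by loaded engine=%r",
            requested,
            loaded,
        )
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
    log_model_substitution("claude-opus-99-5", "qwen3.5-4b-8bit")  # logs one line
    log_model_substitution("qwen3.5-4b-8bit", "qwen3.5-4b-8bit")   # logs nothing
```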

Summary

Step 0 (does this solve a real product problem): ✅ Yes — /v1/messages exists for Anthropic-SDK compat and Claude Code is the canonical client.
Supply-chain audit: ✅ Clean — no deps, no CI, no install hooks, no network calls.
Tests: the 6 new tests are well-scoped; the OpenAI-route-still-404s regression test in particular is exactly the right shape.
Action: please address Blockers 1 + 2, and ideally Nits 3 + 4, then I'll re-review.

Thanks again for digging into this!

- Fix tab indentation in health.py root() to use 4 spaces (blocker)
- Restore _validate_model_name in anthropic route with bypass for Anthropic model names (claude-*, gpt-*) to preserve single-model passthrough while fixing multi-model 404 diagnostic regression (blocker)
- Clarify test_head_root_returns_200 docstring: HEAD is explicitly registered on probe_router, not auto-generated (nit)
- Add INFO log when Anthropic request model differs from loaded engine, making silent model substitution visible in logs for debugging (nit)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

angusgastle · 2026-06-12T16:12:06Z

Addressed all review feedback in commit 4220bdf:

Blockers:

1. ✅ Fixed tab indentation in health.py::root(); it now uses 4-space indent consistent with the rest of the codebase
2. ✅ Restored _validate_model_name validation in anthropic route with bypass for Anthropic-shaped names (claude-*, gpt-*). This preserves the single-model passthrough for Claude Code while fixing the multi-model 404 diagnostic regression: non-Anthropic names are still validated and get clear error messages listing available models.

Nits:

3. ✅ Clarified test_head_root_returns_200 docstring to accurately reflect that HEAD is explicitly registered on probe_router, not auto-generated by FastAPI
4. ✅ Added INFO log when Anthropic request model differs from loaded engine model, making silent substitution visible in logs for debugging

Ready for re-review.

raullenchai requested changes Jun 12, 2026

angusgastle wants to merge 2 commits into raullenchai:main from angusgastle:refactor/claude-code
