Fix Claude Code connectivity with Rapid-MLX server #557

Open. angusgastle wants to merge 2 commits into raullenchai:main from angusgastle:refactor/claude-code

@angusgastle

Summary

Fixes two 404 errors that prevent Claude Code from connecting to the local Rapid-MLX inference server:

  1. HEAD / → 404 — Claude Code's connectivity probe fails
  2. POST /v1/messages → 404 — Model name mismatch on Anthropic API endpoint

Changes

vllm_mlx/routes/health.py

  • Added @probe_router.api_route("/", methods=["GET", "HEAD"]) handler
  • Returns {"status": "ok"}
  • Lives on probe_router (no-auth) so probe works with --api-key

vllm_mlx/routes/anthropic.py

  • Removed _validate_model_name(anthropic_request.model) call
  • Removed _validate_model_name from import list
  • get_engine(anthropic_request.model) now handles model routing with proper fallback

Tests (6 new tests)

  • tests/test_routes.py: 3 root path tests (GET /, HEAD /, HEAD / + api-key)
  • tests/test_anthropic_route_auth.py: 2 Claude model name tests
  • tests/test_api_validation_bundle.py: 1 regression test for OpenAI validation

Test Results

All 6 new tests passing

  • Root path returns 200 for both GET and HEAD
  • Claude model names (claude-opus-4-5, etc.) accepted on /v1/messages
  • OpenAI endpoints still strictly validate model names (unchanged behavior)

How to Test Locally

# Unit tests
pytest tests/test_routes.py::TestHealthRoutes -k root -v
pytest tests/test_anthropic_route_auth.py -k claude -v

# Manual integration (with server running on http://localhost:8000)
curl -I http://localhost:8000/
# Expected: HTTP/1.1 200 OK

curl -X POST http://localhost:8000/v1/messages \
  -H "Content-Type: application/json" \
  -d '{"model":"claude-opus-4-5","max_tokens":10,"messages":[{"role":"user","content":"hi"}]}'
# Expected: 200 with message response (not 404)

# Verify OpenAI route unchanged
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"wrong-model","messages":[{"role":"user","content":"hi"}]}'
# Expected: 404 (unchanged)

Context

When Claude Code is configured with ANTHROPIC_BASE_URL=http://localhost:8000, it:

  1. Sends HEAD / as a connectivity probe → was getting 404
  2. Sends requests with model names like "claude-opus-4-5" → was getting 404 from strict validation

These fixes allow Claude Code to use Rapid-MLX as a local backend while maintaining strict model validation on OpenAI-compatible endpoints.

Address two 404 errors that prevent Claude Code from connecting:

1. HEAD / → 404: Add root path handler to health.py
   - Claude Code sends HEAD / as a connectivity probe before API calls
   - Root handler on probe_router (no auth) so it works with --api-key
   - HEAD is registered explicitly alongside GET (FastAPI does not add HEAD to GET routes on its own)

2. POST /v1/messages → 404: Remove _validate_model_name from Anthropic route
   - Claude Code sends real model names (claude-opus-4-5, etc.)
   - _validate_model_name rejected non-MLX model names
   - get_engine() already handles model routing with proper fallback
   - OpenAI endpoints still validate model names correctly

Changes:
- vllm_mlx/routes/health.py: Add @probe_router.api_route("/", methods=["GET", "HEAD"]) handler
- vllm_mlx/routes/anthropic.py: Remove _validate_model_name call & import
- tests/test_routes.py: Add 3 root path handler tests
- tests/test_anthropic_route_auth.py: Add 2 claude model name tests
- tests/test_api_validation_bundle.py: Add 1 OpenAI regression test

Fixes connection issues while maintaining strict model validation on
OpenAI-compatible endpoints and test coverage across all code paths.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

@raullenchai raullenchai left a comment

Owner

Thanks for the patch — both bugs are real, the use case (Claude Code as a local Anthropic SDK client against /v1/messages) is a great fit for what that route was added for, and the supply-chain shape is clean (no new deps, no CI/install changes). Two blockers below before this can land, plus two nits.

Blockers

1. Tab indentation in vllm_mlx/routes/health.py

The new root() function uses tabs for indentation, while the rest of the file (and the codebase) is 4-space. od -c on the raw diff confirms \t in the docstring/body lines.

Our .pre-commit-config.yaml runs ruff-format, and pyproject.toml pins [tool.black] line-length = 88 / [tool.ruff] line-length = 88 — so both the pre-commit hook and CI will reject this file. Could you re-indent with 4 spaces and re-push?
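
For a quick local check before re-pushing, a few lines of Python can flag tab-indented lines. This is a stand-alone sketch and not part of the repository; `find_tab_indented_lines` is a hypothetical helper name.

```python
from __future__ import annotations

import re

# Matches leading whitespace that contains at least one tab character.
_TAB_INDENT = re.compile(r"^[ ]*\t")


def find_tab_indented_lines(source: str) -> list[int]:
    """Return the 1-based numbers of lines whose indentation contains a tab."""
    return [
        lineno
        for lineno, line in enumerate(source.splitlines(), start=1)
        if _TAB_INDENT.match(line)
    ]
```

Calling it on the text of `vllm_mlx/routes/health.py` (for example via `pathlib.Path(...).read_text()`) should return an empty list once the file is re-indented with spaces.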

2. Removing _validate_model_name regresses multi-model mode

Looking at vllm_mlx/service/helpers.py::get_engine:

def get_engine(model_name: str | None = None) -> BaseEngine:
    cfg = get_config()
    if cfg.model_registry:
        try:
            return cfg.model_registry.get_engine(model_name)
        except KeyError:
            pass                          # ← silent fall-through
    if cfg.engine is None:
        raise HTTPException(status_code=503, detail="Model not loaded")
    return cfg.engine

In multi-model mode (cfg.model_registry populated, cfg.engine = None), a request like "claude-opus-4-5":

  • Before this PR: _validate_model_name raises 404 "The model claude-opus-4-5 does not exist. Available: qwen3.6-27b, deepseek-v4-flash, …" — clear diagnostic, lists what's loaded.
  • After this PR: registry lookup fails with KeyError → falls through → cfg.engine is None → 503 "Model not loaded". This is misleading (the model registry is loaded; only the requested name isn't in it).

So the single-model case improves (your stated goal), but the multi-model case regresses from a clear 404 to a misleading 503. Single-model mode is the more common deployment for now, but multi-model is the direction we're heading and we don't want to ship a known regression.
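
The fall-through can be reproduced without the server. The sketch below copies the control flow of the quoted `get_engine`, but everything around it is a stand-in: `StubRegistry`, `StubConfig`, and `HTTPError` are hypothetical classes invented for the example, and the config is passed in explicitly instead of coming from `get_config()`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class HTTPError(Exception):
    """Stand-in for fastapi.HTTPException so the sketch has no dependencies."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass
class StubRegistry:
    """Hypothetical registry mapping loaded model names to engine objects."""

    engines: dict[str, object] = field(default_factory=dict)

    def get_engine(self, model_name: str | None) -> object:
        return self.engines[model_name]  # raises KeyError for unknown names


@dataclass
class StubConfig:
    model_registry: StubRegistry | None = None
    engine: object | None = None


def get_engine(cfg: StubConfig, model_name: str | None = None) -> object:
    """Same control flow as the quoted helper, with cfg passed in explicitly."""
    if cfg.model_registry:
        try:
            return cfg.model_registry.get_engine(model_name)
        except KeyError:
            pass  # unknown name falls through to the single-engine path
    if cfg.engine is None:
        raise HTTPError(503, "Model not loaded")
    return cfg.engine


# Multi-model mode: registry populated, no default engine.
multi = StubConfig(model_registry=StubRegistry({"qwen3.6-27b": "engine-a"}))
try:
    get_engine(multi, "claude-opus-4-5")
except HTTPError as exc:
    print(exc.status_code, exc.detail)  # prints: 503 Model not loaded
```

In single-model mode (`StubConfig(engine=...)`), the same call returns the loaded engine for any name, which is the passthrough Claude Code relies on.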

Suggested fix (either is fine, your call):

  • (A) Keep _validate_model_name, but bypass it for Anthropic-shaped names — e.g. allow claude-* / gpt-* prefixes through to get_engine while keeping strict validation for everything else. Zero new schema, least invasive.
  • (B) Add a model_aliases config (e.g. --anthropic-model-alias claude-opus-4-5=qwen3.6-27b) so the passthrough is explicit and the server log shows the mapping.

I'd lean (A) for this PR.
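
One possible shape for (A), as a rough sketch only: the function name `validate_anthropic_model`, the `ModelNotFound` exception (standing in for a 404 `HTTPException`), and the explicit `available` argument are all assumptions; the real `_validate_model_name` signature is not shown in this thread.

```python
from __future__ import annotations

# Prefixes allowed through without validation, taken from the example above.
PASSTHROUGH_PREFIXES = ("claude-", "gpt-")


class ModelNotFound(Exception):
    """Stand-in for a 404 HTTPException."""


def validate_anthropic_model(requested: str, available: list[str]) -> None:
    """Raise ModelNotFound unless the name is loaded or has a passthrough prefix."""
    if requested in available:
        return
    if requested.startswith(PASSTHROUGH_PREFIXES):
        return  # leave engine selection to get_engine()
    raise ModelNotFound(
        f"The model {requested} does not exist. Available: {', '.join(available)}"
    )
```

Names that pass the prefix check still go through `get_engine`, so its fall-through behavior applies to them unchanged; every other unknown name keeps the 404 that lists the loaded models.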

Nits

3. Misleading docstring on test_head_root_returns_200

The test docstring says:

"HEAD / is the Claude Code connectivity probe. FastAPI auto-generates it from GET / — this test pins the contract …"

But the implementation is @probe_router.api_route("/", methods=["GET", "HEAD"]) — HEAD is registered explicitly, not auto-generated. The comment contradicts the line right above it; please rewrite to describe what's actually pinned (HEAD stays on probe_router, bypasses verify_api_key).

4. Silent name swap is a footgun — at least log it

After this PR, a client typo like "claude-opus-99-5" returns 200 OK against whatever's loaded, with response.model = "qwen3.5-4b-8bit" (the loaded name). The client has no signal that the requested name was different from what served.

At minimum, please add an INFO log in create_anthropic_message when the request model differs from the loaded model:

if anthropic_request.model != cfg.model_name:
    logger.info(
        "Anthropic /v1/messages: request model=%r served by loaded engine=%r",
        anthropic_request.model, cfg.model_name,
    )

That makes the substitution visible in /logs for debugging without changing the response shape.
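
To try the log line in isolation, the check can be pulled into a small function. This is only a sketch: `log_model_substitution` and the logger name are hypothetical, and the loaded model name is passed as an argument rather than read from `cfg`.

```python
import logging

logger = logging.getLogger("anthropic_route_sketch")  # hypothetical logger name


def log_model_substitution(requested: str, loaded: str) -> bool:
    """Log at INFO when the requested name differs from the loaded model.

    Returns True when a substitution was logged, False otherwise.
    """
    if requested == loaded:
        return False
    # Lazy %-style arguments: the message is only formatted if INFO is enabled.
    logger.info(
        "Anthropic /v1/messages: request model=%r served by loaded engine=%r",
        requested,
        loaded,
    )
    return True
```

After `logging.basicConfig(level=logging.INFO)`, calling `log_model_substitution("claude-opus-99-5", "qwen3.5-4b-8bit")` emits one record naming both models, while a matching pair stays silent.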

Summary

  • Step 0 (does this solve a real product problem): ✅ Yes — /v1/messages exists for Anthropic-SDK compat and Claude Code is the canonical client.
  • Supply-chain audit: ✅ Clean — no deps, no CI, no install hooks, no network calls.
  • Tests: the 6 new tests are well-scoped; the OpenAI-route-still-404s regression test in particular is exactly the right shape.
  • Action: please address Blockers 1 + 2, and ideally Nits 3 + 4, then I'll re-review.

Thanks again for digging into this!

- Fix tab indentation in health.py root() to use 4 spaces (blocker)
- Restore _validate_model_name in anthropic route with bypass for Anthropic
  model names (claude-*, gpt-*) to preserve single-model passthrough while
  fixing multi-model 404 diagnostic regression (blocker)
- Clarify test_head_root_returns_200 docstring: HEAD is explicitly
  registered on probe_router, not auto-generated (nit)
- Add INFO log when Anthropic request model differs from loaded engine,
  making silent model substitution visible in logs for debugging (nit)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
@angusgastle

Author

Addressed all review feedback in commit 4220bdf:

Blockers:

  1. ✅ Fixed tab indentation in health.py::root() — now uses 4-space indent consistent with the rest of the codebase
  2. ✅ Restored _validate_model_name validation in anthropic route with bypass for Anthropic-shaped names (claude-*, gpt-*). This preserves the single-model passthrough for Claude Code while fixing the multi-model 404 diagnostic regression — non-Anthropic names are still validated and get clear error messages listing available models.

Nits:

  3. ✅ Clarified test_head_root_returns_200 docstring to accurately reflect that HEAD is explicitly registered on probe_router, not auto-generated by FastAPI
  4. ✅ Added INFO log when Anthropic request model differs from loaded engine model, making silent substitution visible in logs for debugging

Ready for re-review.
