fix(server): clamp CoreRequest.MaxTokens to per-route max_output_tokens by thezzisu · Pull Request #57 · ZhiYi-R/moon-bridge

thezzisu · 2026-05-24T14:59:55Z

Problem

When an inbound OpenAI Responses request omits max_output_tokens, moonbridge falls back to defaults.max_tokens from the global config and forwards that value verbatim to the upstream protocol adapters; a client-supplied value above the upstream model's actual ceiling is forwarded unclamped as well. Each protocol adapter (anthropic, chat, openai, google) currently has its own defaultMaxTokens helper that only consults the inbound request and the global default. None of them clamps to the per-route models.<slug>.max_output_tokens declared in config.

In a deployment that routes Claude (max 64K out), Qwen DashScope (32K / 65K), Gemini (65K) and DeepSeek V4 Pro (320K) through a single moonbridge instance, the global default has to be sized for the largest cap (DeepSeek), which then makes every smaller-cap upstream return 400 Invalid request / 400 Range of max_tokens should be [1, 65536] whenever the client doesn't supply an explicit cap.

Real production trace excerpt (anonymised; the msg value 提供商错误 means "provider error"):

level=ERROR msg=提供商错误 model=claude-sonnet-4-6 status=400 ... req_max_tokens=320000
level=ERROR msg=提供商错误 model=qwen3.6-plus    status=400 ... req_max_tokens=320000
    error="<400> Range of max_tokens should be [1, 65536]"

Fix

Single-point clamp in internal/service/server/adapter_dispatch.go::handleWithAdapters, applied right after client.ToCoreRequest and the upstream model alias resolution, and before providerAdapter.FromCoreRequest. The clamp uses a new helper (*Server).routeMaxOutputTokens(modelAlias, preferred) that prefers config.Routes[alias].MaxOutputTokens and falls back to config.ProviderDefs[providerKey].Models[upstreamModel].MaxOutputTokens. The helper returns 0 (no clamp) when neither is set, preserving prior behavior for configs that don't declare per-model caps.
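The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the Config, RouteEntry, ProviderDef and ModelMeta types are simplified stand-ins whose field sets are assumptions, the config is passed explicitly instead of through a *Server receiver, and the helper's second parameter (preferred) is omitted because its meaning is not spelled out in this description.

```go
package main

import "fmt"

// Simplified stand-ins for moonbridge's config types (assumed shapes).
type ModelMeta struct{ MaxOutputTokens int }

type RouteEntry struct {
	Provider        string // provider key the alias routes to (assumed field)
	UpstreamModel   string // model name sent upstream (assumed field)
	MaxOutputTokens int    // 0 means "not declared"
}

type ProviderDef struct{ Models map[string]ModelMeta }

type Config struct {
	Routes       map[string]RouteEntry
	ProviderDefs map[string]ProviderDef
}

// routeMaxOutputTokens returns the cap for alias: the route-level value if
// declared, otherwise the provider catalog value, otherwise 0 (no clamp).
func routeMaxOutputTokens(cfg Config, alias string) int {
	route, ok := cfg.Routes[alias]
	if !ok {
		return 0
	}
	if route.MaxOutputTokens > 0 {
		return route.MaxOutputTokens
	}
	// Reads from missing keys or nil maps yield zero values, so this
	// chain evaluates to 0 when the catalog has no entry.
	return cfg.ProviderDefs[route.Provider].Models[route.UpstreamModel].MaxOutputTokens
}

func main() {
	// Alias names and cap values below are illustrative only.
	cfg := Config{
		Routes: map[string]RouteEntry{
			"route-cap":   {Provider: "p1", UpstreamModel: "m1", MaxOutputTokens: 64000},
			"catalog-cap": {Provider: "p1", UpstreamModel: "m2"},
			"no-cap":      {Provider: "p2", UpstreamModel: "m3"},
		},
		ProviderDefs: map[string]ProviderDef{
			"p1": {Models: map[string]ModelMeta{
				"m1": {MaxOutputTokens: 32000},
				"m2": {MaxOutputTokens: 65536},
			}},
		},
	}
	fmt.Println(routeMaxOutputTokens(cfg, "route-cap"))   // 64000: route entry wins
	fmt.Println(routeMaxOutputTokens(cfg, "catalog-cap")) // 65536: provider catalog fallback
	fmt.Println(routeMaxOutputTokens(cfg, "no-cap"))      // 0: nothing declared
}
```

Returning 0 rather than an error keeps the "no cap declared" case a no-op for callers, which matches the backwards-compatible behavior described in this PR.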

Because this lives at the protocol-agnostic format.CoreRequest boundary, all four protocol adapters (anthropic, chat, openai, google) inherit the clamp without changes.
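To illustrate why a clamp at this boundary covers every adapter, here is a minimal sketch of the clamping step on a reduced request type. CoreRequest below is a stand-in for format.CoreRequest, and clampMaxTokens is a hypothetical name. The sketch assumes that a MaxTokens of 0 means the client omitted the field and that the global default is resolved before clamping; the actual patch may order these steps differently.

```go
package main

import "fmt"

// CoreRequest is a reduced stand-in for format.CoreRequest (assumed shape).
type CoreRequest struct {
	Model     string
	MaxTokens int // 0 is treated here as "client did not set a value"
}

// clampMaxTokens (hypothetical helper) fills in the global default when the
// client omitted a value, then limits the result to routeCap.
// A routeCap of 0 means no cap is declared and nothing is clamped.
func clampMaxTokens(req *CoreRequest, globalDefault, routeCap int) {
	if req.MaxTokens <= 0 {
		req.MaxTokens = globalDefault
	}
	if routeCap > 0 && req.MaxTokens > routeCap {
		req.MaxTokens = routeCap
	}
}

func main() {
	const globalDefault = 320000 // sized for the largest upstream

	omitted := &CoreRequest{Model: "qwen3.6-plus"}
	clampMaxTokens(omitted, globalDefault, 65536)
	fmt.Println(omitted.MaxTokens) // 65536: default clamped to the route cap

	small := &CoreRequest{Model: "qwen3.6-plus", MaxTokens: 1024}
	clampMaxTokens(small, globalDefault, 65536)
	fmt.Println(small.MaxTokens) // 1024: already within the cap, untouched

	uncapped := &CoreRequest{Model: "some-model"}
	clampMaxTokens(uncapped, globalDefault, 0)
	fmt.Println(uncapped.MaxTokens) // 320000: no cap declared, prior behavior
}
```

Because the adapters only ever see the already-clamped request, none of their serialization code needs to know about per-route caps.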

Tests

Adds three tests in internal/service/server/adapter_dispatch_test.go:

TestRouteMaxOutputTokensPrefersRouteEntry — route-level cap wins over provider-meta.
TestRouteMaxOutputTokensFallsBackToProviderModelMeta — provider catalog metadata is consulted when the route doesn't declare its own.
TestRouteMaxOutputTokensReturnsZeroWhenUnset — both unset → 0 (no clamp).

Full go test ./... passes locally on golang:1.26-bookworm.

Backwards compatibility

Pure additive behavior — installations that don't declare max_output_tokens per model see no change. Installations that do declare it now get an automatic clamp instead of relying on the upstream HTTP response.

Operational note

This patch was deployed in production at the PKU CCLab moonbridge instance for ~10 minutes before this PR was opened. During that window it eliminated the 400 cycle, and normal operation was verified across Claude / Qwen / DeepSeek / Gemini routes.

When the inbound OpenAI Responses request omits max_output_tokens (or sets it above the upstream limit), moonbridge previously injected the global defaults.max_tokens unchanged. Anthropic / Qwen / Gemini upstreams reject oversized values with 400. Resolve a per-alias cap from config.Routes[<alias>].MaxOutputTokens, with fallback to provider catalog ModelMeta.MaxOutputTokens, and clamp coreReq.MaxTokens before the protocol adapter serializes the upstream request. Adds three unit tests covering route, fallback, and unset paths.

thezzisu wants to merge 1 commit into ZhiYi-R:main from thezzisu:fix/clamp-coreReq-max-tokens-to-route
