fix: make max_tokens configurable (closes #29) #78
lefarcen wants to merge 5 commits into nexu-io:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50a9d145e5
lefarcen left a comment
Hi @lefarcen! 👋 Nice fix for issue #29 — making max_tokens configurable is the right approach for users with Anthropic-compatible APIs that support higher limits.
Overall
✅ Correctness: Logic is sound. Fallback to 8192 preserves existing behavior.
✅ Security: No new attack surface (input validation covers the range).
✅ UI/UX: Clear hint text, sensible min/max/step.
✅ i18n: Both EN and ZH-CN translations present.
Lens A findings (code correctness)
Two P3 suggestions inline — both non-blocking, but would improve robustness.
Merge gate review summary:
Status: GO
Why:
Required checks before merge:
Risk:
Rollback:
Thanks for the structured review summary, @roberthgnz! 👍 Agreed on GO status — the two P3 suggestions I flagged earlier are nice-to-haves, not blockers. The core fix is sound and addresses the real truncation issue. @lefarcen This is ready to merge from my side. If you want to address the P3 input validation suggestions (clamping in JS + guarding against negative values), feel free, but not required.
BYOK users on custom Anthropic-compatible providers (e.g. Xiaomi MiMo) hit the hardcoded 8192 cap and saw artifacts truncated mid-stream.

- AppConfig.maxTokens with Settings input (EN/CN + 8 other locales)
- ProxyStreamRequest.maxTokens contract field
- anthropic, anthropic-compatible, and openai-compatible providers all forward cfg.maxTokens
- /api/proxy/anthropic/stream and /api/proxy/stream payloads honor it, defaulting to 8192 when unset so prior clients are unaffected

Original sketch by @mashu in nexu-io#78 (50a9d14); rebased to the apps/web layout and extended to the proxy paths actually used when baseUrl is set, which is the path nexu-io#29's user actually hits.
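The forwarding described above can be sketched roughly as follows. This is an illustrative sketch, not the exact nexu-io code: the interface shape and `buildAnthropicPayload` name are assumptions, only the `maxTokens` field and the 8192 fallback come from the commit description.

```typescript
// Hypothetical shapes mirroring the commit description; real nexu-io
// symbols may differ.
interface AppConfig {
  apiKey?: string;
  model?: string;
  baseUrl?: string;
  maxTokens?: number; // new optional field; undefined preserves legacy behavior
}

const DEFAULT_MAX_TOKENS = 8192;

// Each provider builds its request body the same way: honor the
// configured value when present, otherwise fall back to 8192 so
// clients that never set the field see no behavior change.
function buildAnthropicPayload(cfg: AppConfig, messages: unknown[]) {
  return {
    model: cfg.model,
    max_tokens: cfg.maxTokens ?? DEFAULT_MAX_TOKENS,
    messages,
    stream: true,
  };
}
```

The `??` (rather than `||`) matters only if 0 were ever a valid limit; the P3 review suggestions about clamping and negative values would slot in just before this lookup.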
Force-pushed 50a9d14 to 6a3ae5f.
Adds a hand-maintained MODEL_MAX_TOKENS table (Claude 4.5 line → 64k, mimo-v2.5-pro → 32k) and an effectiveMaxTokens helper layered over the override field added in 6a3ae5f, so nexu-io#29's user (and others on supported models) don't have to discover Settings to avoid mid-stream truncation.

- apps/web/src/state/maxTokens.ts: lookup + helpers
- providers/{anthropic,anthropic-compatible,openai-compatible}.ts: forward effectiveMaxTokens(cfg) instead of cfg.maxTokens ?? 8192
- SettingsDialog: input becomes an optional override (blank = default, shown as placeholder)
- 10 locale hint strings updated to the new semantics
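The layering described in this commit can be sketched as below. Treat it as a sketch under stated assumptions: the helper name and field names follow the commit message, but the exact table entries and values in the repo may differ.

```typescript
// Hand-maintained per-model output limits; illustrative entries only,
// the real table may use different ids and values.
const MODEL_MAX_TOKENS: Record<string, number> = {
  "claude-sonnet-4-5": 64000,
  "mimo-v2.5-pro": 32000,
};

const FALLBACK_MAX_TOKENS = 8192;

interface AppConfig {
  model?: string;
  maxTokens?: number; // explicit user override from Settings (blank = unset)
}

// Resolution order: explicit user override, then the per-model table,
// then the legacy 8192 fallback.
function effectiveMaxTokens(cfg: AppConfig): number {
  if (cfg.maxTokens !== undefined) return cfg.maxTokens;
  if (cfg.model !== undefined && MODEL_MAX_TOKENS[cfg.model] !== undefined) {
    return MODEL_MAX_TOKENS[cfg.model];
  }
  return FALLBACK_MAX_TOKENS;
}
```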
Replaces the 4-entry hand-rolled MODEL_MAX_TOKENS map from 544e67e with a vendored slice of BerriAI/litellm's model_prices_and_context_window JSON (1970 chat models, ~97KB raw / ~25KB gzip). Future model launches land in maxTokens.ts via `pnpm sync-litellm-models` instead of manual edits.

- scripts/sync-litellm-models.ts: fetches the upstream JSON, filters to chat-mode entries, projects each entry to its max_output_tokens (or max_tokens fallback), and writes a sorted, license-attributed JSON
- apps/web/src/state/litellm-models.json: generated artifact, committed
- apps/web/src/state/maxTokens.ts: lookup is now OVERRIDES → LITELLM_MODELS → FALLBACK_MAX_TOKENS. The OVERRIDES table shrinks to just `mimo-v2.5-pro` (LiteLLM only ships MiMo via OpenRouter/Novita aliases, not the canonical id Xiaomi's API uses).

LiteLLM is MIT-licensed (BerriAI/litellm/blob/main/LICENSE); attribution is preserved in both the script header and the generated JSON's _license field.
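The sync script's projection step could look roughly like this. Only the pure transform is sketched (the real script also fetches the upstream JSON and writes the file); the function name is hypothetical, and the entry shape assumes LiteLLM's format of model ids keyed to objects with `mode`, `max_output_tokens`, and `max_tokens` fields.

```typescript
// Assumed shape of one entry in LiteLLM's
// model_prices_and_context_window JSON.
type LiteLLMEntry = {
  mode?: string;
  max_output_tokens?: number;
  max_tokens?: number;
};

// Filter to chat-mode entries and project each to a single output
// limit: max_output_tokens when present, else max_tokens.
function projectChatModels(
  raw: Record<string, LiteLLMEntry>
): Record<string, number> {
  const out: Record<string, number> = {};
  // Sort ids so the committed JSON is stable across syncs.
  for (const id of Object.keys(raw).sort()) {
    const entry = raw[id];
    if (entry.mode !== "chat") continue;
    const limit = entry.max_output_tokens ?? entry.max_tokens;
    if (typeof limit === "number") out[id] = limit;
  }
  return out;
}
```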
- apps/web/src/state/maxTokens.test.ts: six vitest cases pinning the three-tier lookup (override → LiteLLM → fallback) and the effectiveMaxTokens user-override path. Guards against a future sync silently dropping the Anthropic 4.5 entries we rely on.
- CONTRIBUTING.md / CONTRIBUTING.zh-CN.md: new "Updating model max_tokens metadata" section pointing future maintainers at scripts/sync-litellm-models.ts and explaining when OVERRIDES is appropriate (it's the rare exception, not the default).
The Settings field is optional (blank means "use the per-model default") but the label gave no visual cue, breaking the implicit pattern that every other API-mode field (key/model/baseUrl) is required. Append "(optional)" — using the locale's natural parenthetical convention (Chinese full-width brackets, Japanese 任意, Russian опционально, etc.) — so the field reads as discretionary at a glance.
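The per-locale suffix convention described above can be sketched as a small helper. This is illustrative only: real locale files would store the full translated label strings, and the helper name and suffix table here are assumptions built from the examples in the text (full-width brackets for Chinese, 任意 for Japanese, опционально for Russian).

```typescript
// Hypothetical per-locale "(optional)" suffixes following each
// locale's natural parenthetical convention.
const OPTIONAL_SUFFIX: Record<string, string> = {
  en: " (optional)",
  "zh-CN": "（可选）", // full-width brackets, no leading space
  ja: "（任意）",
  ru: " (опционально)",
};

// Append the locale-appropriate suffix, falling back to English.
function optionalLabel(label: string, locale: string): string {
  return label + (OPTIONAL_SUFFIX[locale] ?? OPTIONAL_SUFFIX.en);
}
```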
Summary
Fixes issue where users with Anthropic-compatible APIs (e.g., Xiaomi MiMo) hit the hardcoded 8192 token limit, causing design tasks requiring large artifacts (15-30k tokens) to be truncated mid-generation.
Problem
User @majiabin2020 reported (#29) that when using Anthropic API + Xiaomi MiMo v2.5:
- artifacts were truncated mid-generation (stop_reason: max_tokens)

Root cause: src/providers/anthropic.ts:44 hardcoded max_tokens: 8192, but design tasks often require 15-30k tokens for complete artifacts.
Solution
Make `max_tokens` configurable via Settings:

- New `AppConfig.maxTokens?: number` field (defaults to 8192)
- `anthropic.ts` updated to use `cfg.maxTokens || 8192`

Users can now configure `maxTokens` to match their provider's limits (e.g., 32768+ for large design tasks).
Testing

- `maxTokens` still defaults to 8192 when unset
- `maxTokens: 32768` in Settings allows larger artifacts

Closes
Closes #29
cc @PerishCode @majiabin2020