Custom AI SDK provider for using nexos.ai models with opencode.
Fixes compatibility issues when using Gemini, Claude, ChatGPT, Codex, Mistral, and Codestral models through the nexos.ai API in opencode.
Architecture: Hybrid provider — Claude models use native Anthropic /v1/messages via @ai-sdk/anthropic; all other models use OpenAI-compatible /v1/chat/completions via @ai-sdk/openai-compatible with per-model fixes.
Claude models:
- Prompt caching: automatic caching with top-level cache_control; caches the entire conversation history automatically as it grows
- System prompt normalization: converts AI SDK content-part arrays to plain strings (required by vertex-ai)
- No OpenAI-compat fixes needed: end_turn→stop, budgetTokens→budget_tokens, and temperature handling all work natively

Other models:
- Gemini: appends missing data: [DONE] SSE signal (prevents hanging), inlines $ref in tool schemas (rejected by Vertex AI), fixes finish_reason for tool calls (stop→tool_calls)
- ChatGPT/GPT: strips reasoning_effort: "none" only for legacy / non-reasoning models (GPT 4.x, Chat, Instant, oss; modern GPT 5.x accept "none" natively), strips temperature: false (invalid value), strips temperature for non-Codex models (nexos.ai chat completions only supports the default temperature; Codex models via the Responses API support custom temperature)
- Codex: transparently redirects requests to /v1/responses (Responses API), because Codex models don't support /v1/chat/completions. Handles streaming, tool calls, reasoning effort, and cache token reporting.
- Mistral / Codestral: sets strict: false in tool definitions when strict is null (Mistral API rejects null for this field). Applies to all models whose name contains mistral or codestral.
- Kimi / GLM: synthesizes missing data: [DONE] and usage chunks in streaming responses via TransformStream.
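
The ChatGPT/GPT rules above can be illustrated with a small request-body rewrite. This is only a sketch: the function name, the LEGACY_GPT pattern, and matching on the display name are assumptions made for illustration, not the provider's actual code.

```javascript
// Hypothetical pattern for "legacy / non-reasoning" GPT models (assumption).
const LEGACY_GPT = /gpt[ -]?4|chat|instant|oss/i;

// Returns a copy of a chat-completions request body with the parameters
// described above removed. The input object is not mutated.
function fixChatGptBody(modelId, body) {
  const fixed = { ...body };
  const isCodex = /codex/i.test(modelId);

  // Legacy models reject reasoning_effort: "none"; modern GPT 5.x accept it.
  if (fixed.reasoning_effort === "none" && LEGACY_GPT.test(modelId)) {
    delete fixed.reasoning_effort;
  }

  // temperature: false is never valid, and non-Codex models only
  // support the default temperature, so the field is dropped for them.
  if (fixed.temperature === false || !isCodex) {
    delete fixed.temperature;
  }
  return fixed;
}

console.log(fixChatGptBody("GPT 4.1", { reasoning_effort: "none", temperature: 0.2 }));
```

For a legacy model both fields are removed, while a Codex model keeps a numeric temperature.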
export NEXOS_API_KEY="your-nexos-api-key"

Add the provider to your ~/.config/opencode/opencode.json:
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "nexos-ai": {
      "npm": "@crazy-goat/nexos-provider",
      "name": "Nexos AI",
      "env": ["NEXOS_API_KEY"],
      "options": {
        "baseURL": "https://api.nexos.ai/v1/",
        "timeout": 300000
      },
      "models": {
        "Gemini 2.5 Pro": {
          "name": "Gemini 2.5 Pro",
          "limit": { "context": 128000, "output": 64000 }
        },
        "Claude Sonnet 4.5": {
          "name": "Claude Sonnet 4.5",
          "limit": { "context": 200000, "output": 16000 },
          "options": {
            "thinking": { "type": "enabled", "budgetTokens": 1024 }
          },
          "variants": {
            "thinking-high": { "thinking": { "type": "enabled", "budgetTokens": 10000 } },
            "no-thinking": { "thinking": { "type": "disabled" } }
          }
        },
        "GPT 5": {
          "name": "GPT 5",
          "limit": { "context": 400000, "output": 128000 },
          "options": { "reasoningEffort": "medium" },
          "variants": {
            "high": { "reasoningEffort": "high" },
            "no-reasoning": { "reasoningEffort": "none" }
          }
        }
      }
    }
  }
}

Tip: You can automatically generate the config with all available nexos.ai models using opencode-nexos-models-config.
Warning: Gemini 3 models (Flash Preview, Pro Preview) do not work with tool calling through nexos.ai — see known-bugs/gemini3-tools for details.
Simple prompt:
opencode run "hello" -m "nexos-ai/Gemini 2.5 Pro"

With tool calling:
opencode run "list files in current directory" -m "nexos-ai/Gemini 2.5 Pro"

Claude with thinking:
opencode run "what is 2+2?" -m "nexos-ai/Claude Sonnet 4.5" --variant thinking-high

GPT with reasoning effort:
opencode run "what is 2+2?" -m "nexos-ai/GPT 5" --variant high

Or select the model interactively in opencode with Ctrl+X M.
opencode caches the provider in ~/.cache/opencode/. To force an update to the latest version:
rm -rf ~/.cache/opencode/node_modules/@crazy-goat

The next time you run opencode, it will download the latest version from npm.
The provider exports createNexosAI which routes Claude models to the native Anthropic SDK and all other models to the OpenAI-compatible SDK with custom fetch wrappers:
opencode → createNexosAI → router
                           │
                           ├─ Claude models → @ai-sdk/anthropic → /v1/messages
                           │    └─ createAnthropicFetch(): system array→string,
                           │       auto cache_control for prompts >3000 chars
                           │
                           └─ Other models → @ai-sdk/openai-compatible → /v1/chat/completions
                                │
                                ├─ fix-gemini.mjs: $ref inlining, finish_reason fix
                                ├─ fix-chatgpt.mjs: strips reasoning_effort:"none" for legacy models
                                ├─ fix-codex.mjs: chat completions → Responses API
                                ├─ fix-mistral.mjs: strict:null→false in tools (Mistral + Codestral)
                                └─ fix-kimi.mjs: synthesizes [DONE] + usage for fireworks-ai stream
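
The routing in the diagram can be sketched as a lookup on the model name. This is a simplified illustration rather than the provider's source: routeModel, its return shape, and the substring checks are all assumptions; only the SDK names, endpoints, and fix file names come from the diagram.

```javascript
// Hypothetical router: picks the SDK, endpoint, and fix modules for a model.
function routeModel(modelId) {
  const id = modelId.toLowerCase();

  // Claude models go to the native Anthropic endpoint.
  if (id.includes("claude")) {
    return { sdk: "@ai-sdk/anthropic", endpoint: "/v1/messages", fixes: [] };
  }

  // Everything else uses the OpenAI-compatible SDK plus per-model fixes.
  const isCodex = id.includes("codex");
  const fixes = [];
  if (id.includes("gemini")) fixes.push("fix-gemini.mjs");
  if (isCodex) fixes.push("fix-codex.mjs");
  else if (id.includes("gpt")) fixes.push("fix-chatgpt.mjs");
  if (id.includes("mistral") || id.includes("codestral")) fixes.push("fix-mistral.mjs");
  if (id.includes("kimi") || id.includes("glm")) fixes.push("fix-kimi.mjs");

  return {
    sdk: "@ai-sdk/openai-compatible",
    // Codex requests are redirected to the Responses API.
    endpoint: isCodex ? "/v1/responses" : "/v1/chat/completions",
    fixes,
  };
}

console.log(routeModel("Claude Sonnet 4.5"));
console.log(routeModel("GPT 5.3 Codex"));
```

Keeping the decision in one pure function makes it easy to check which fixes a given model name would receive before any request is sent.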
Test with a simple prompt:
opencode run "what is 2+2?" -m "nexos-ai/Gemini 2.5 Pro"
opencode run "what is 2+2?" -m "nexos-ai/Gemini 2.5 Flash"
opencode run "what is 2+2?" -m "nexos-ai/Claude Sonnet 4.5"
opencode run "what is 2+2?" -m "nexos-ai/GPT 5"

Test tool calling:
opencode run "list files in current directory" -m "nexos-ai/Gemini 2.5 Pro"
opencode run "list files in current directory" -m "nexos-ai/Claude Sonnet 4.5"
opencode run "list files in current directory" -m "nexos-ai/GPT 5"
opencode run "list files in current directory" -m "nexos-ai/GPT 5.3 Codex"

Test thinking/reasoning variants:
opencode run "what is 2+2?" -m "nexos-ai/Claude Sonnet 4.5" --variant thinking-high
opencode run "what is 2+2?" -m "nexos-ai/Gemini 2.5 Pro" --variant thinking-high
opencode run "what is 2+2?" -m "nexos-ai/GPT 5" --variant high
opencode run "what is 2+2?" -m "nexos-ai/GPT 5.3 Codex" --variant high

Run check-models/check-all.mjs to test all available models for simple prompts and tool calling:
node check-models/check-all.mjs

Test a single model:
node check-models/check-all.mjs "GPT 4.1"

Results are saved to check-models/checks.md; see the current compatibility status there.
The known-bugs/ directory documents every API quirk the provider works around, one folder per issue. Each folder has a README and, where empirical reproduction adds value, a test script.
- claude-prompt-caching: cache_control marker strategy (4 breakpoints: system, tools, latest user, previous user) plus break-even math and real-session savings.
- claude-finish-reason-end-turn: In thinking mode, Claude leaks end_turn (natural end) and tool_use (tool call end) where opencode expects stop / tool_calls. Without the rewrites, opencode retries indefinitely on every thinking-mode turn.
- claude-thinking-params: budgetTokens→budget_tokens (snake_case), bump max_tokens when budget exceeds it, strip temperature while thinking is enabled. (Historical: thinking: {type: "disabled"} stripping; upstream now accepts it, so the fix is a pass-through.)
- claude-opus-47-temperature: Opus 4.7 with any temperature routes to a guardrails backend where streaming tool calls are broken. Provider strips temperature for Opus 4.7.
- claude-sonnet-46-cache: Sonnet 4.6 on vertex-ai invalidates cache when cache_control is on user messages; it also has a higher minimum token threshold than documented.
- claude-cached-tokens-reporting: Opus models only report cache via prompt_tokens_details.cached_tokens; provider sums it into prompt_tokens for opencode's usage display.
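
As an illustration of the claude-thinking-params entry, the three rewrites could look roughly like the sketch below. The function name and the HEADROOM value are assumptions for this example; the known-bugs folder is the place to look for the actual behavior.

```javascript
// Assumed number of tokens left for the visible answer after the bump.
const HEADROOM = 1024;

// Returns a copy of the request body with thinking parameters normalized.
function fixThinkingParams(body) {
  const thinking = body.thinking;
  if (!thinking || thinking.type !== "enabled") return { ...body };

  // 1. Rename camelCase budgetTokens to snake_case budget_tokens.
  const { budgetTokens, ...rest } = thinking;
  const budget = rest.budget_tokens ?? budgetTokens;
  const fixed = { ...body, thinking: { ...rest } };
  if (budget !== undefined) fixed.thinking.budget_tokens = budget;

  // 2. Bump max_tokens when the thinking budget does not fit below it.
  if (typeof budget === "number" &&
      typeof fixed.max_tokens === "number" &&
      fixed.max_tokens <= budget) {
    fixed.max_tokens = budget + HEADROOM;
  }

  // 3. Strip temperature while thinking is enabled.
  delete fixed.temperature;
  return fixed;
}

console.log(fixThinkingParams({
  max_tokens: 4096,
  temperature: 0.5,
  thinking: { type: "enabled", budgetTokens: 10000 },
}));
```
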
- gemini-schema-restrictions: Vertex AI rejects many JSON Schema keywords ($ref, exclusiveMinimum, patternProperties, if/then/else, not, $schema, etc.). Provider inlines refs and strips the rest.
- gemini-stream-format: Four stream-format issues bundled: missing [DONE] sentinel, uppercase STOP, stop instead of tool_calls for tool use, content_blocks[].delta.thinking instead of reasoning_content.
- gemini3-tools: Gemini 3 / 3.1 reject multi-turn tool-use replays because nexos.ai does not propagate thought_signature. Provider rewrites history into plain alternating turns.
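
The $ref inlining mentioned under gemini-schema-restrictions can be sketched as a recursive walk over the tool schema. This is a minimal version under stated assumptions: inlineRefs is a hypothetical name, only local references (starting with #/) are resolved, JSON Pointer escapes are ignored, keys sitting next to a $ref are dropped, and recursive references collapse to an empty schema.

```javascript
// Replaces local {"$ref": "#/..."} nodes with a copy of their target and
// removes the now-unused definition containers from the root schema.
function inlineRefs(schema, root = schema, seen = new Set()) {
  if (Array.isArray(schema)) return schema.map((s) => inlineRefs(s, root, seen));
  if (schema === null || typeof schema !== "object") return schema;

  if (typeof schema.$ref === "string" && schema.$ref.startsWith("#/")) {
    if (seen.has(schema.$ref)) return {}; // break reference cycles
    const target = schema.$ref
      .slice(2)
      .split("/")
      .reduce((node, key) => node?.[key], root);
    return inlineRefs(target ?? {}, root, new Set(seen).add(schema.$ref));
  }

  const out = {};
  for (const [key, value] of Object.entries(schema)) {
    // Only strip these keywords at the top level, so that a property that
    // happens to be named "definitions" is left alone.
    const isRootOnlyKeyword =
      schema === root && (key === "$defs" || key === "definitions" || key === "$schema");
    if (isRootOnlyKeyword) continue;
    out[key] = inlineRefs(value, root, seen);
  }
  return out;
}

const toolSchema = {
  type: "object",
  properties: { path: { $ref: "#/$defs/Path" } },
  $defs: { Path: { type: "string" } },
};
console.log(JSON.stringify(inlineRefs(toolSchema)));
// {"type":"object","properties":{"path":{"type":"string"}}}
```
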
- gpt-chat-completions-limits: Legacy / non-reasoning GPT models (GPT 4.x, Chat, Instant, oss) reject reasoning_effort: "none"; modern GPT 5.x accept it. In addition, temperature: false and custom temperature are rejected for all non-Codex GPT models.
- codex-responses-api: Codex models require /v1/responses, not /v1/chat/completions. Provider redirects the URL and converts both directions (request schema + SSE stream + usage).
- codestral-strict-null: Mistral API rejects strict: null in tool function definitions. Provider coerces null→false. Applies to both codestral-* and Mistral * models.
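
The null→false coercion is small enough to show in full. The sketch below assumes the OpenAI-style tool shape ({ type: "function", function: { ... } }) and uses a hypothetical function name.

```javascript
// Returns a copy of the request body where strict: null becomes strict: false.
// Tools with any other strict value (or none at all) are left untouched.
function fixMistralTools(body) {
  if (!Array.isArray(body.tools)) return body;
  return {
    ...body,
    tools: body.tools.map((tool) =>
      tool.function?.strict === null
        ? { ...tool, function: { ...tool.function, strict: false } }
        : tool
    ),
  };
}

const request = {
  model: "codestral-latest", // placeholder model id
  tools: [{ type: "function", function: { name: "ls", parameters: {}, strict: null } }],
};
console.log(fixMistralTools(request).tools[0].function.strict); // false
```
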
- kimi-fireworks-stream: Kimi and GLM on fireworks-ai stream without data: [DONE] or usage chunk. Provider's TransformStream synthesizes both on flush while preserving progressive streaming.
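
A rough sketch of the flush-time approach, limited to the [DONE] sentinel: chunks are forwarded as they arrive, and the sentinel is appended only if the upstream never sent it. The function name is hypothetical, and usage-chunk synthesis is left out to keep the example short.

```javascript
const DONE = "data: [DONE]";

// Pass-through TransformStream over Uint8Array chunks that appends the SSE
// [DONE] sentinel at the end of the stream when it was missing.
function ensureDoneSentinel() {
  const decoder = new TextDecoder();
  const encoder = new TextEncoder();
  let sawDone = false;
  let tail = ""; // last few characters, to catch a sentinel split across chunks

  return new TransformStream({
    transform(chunk, controller) {
      const text = tail + decoder.decode(chunk, { stream: true });
      if (text.includes(DONE)) sawDone = true;
      tail = text.slice(-DONE.length);
      controller.enqueue(chunk); // forward immediately: streaming stays progressive
    },
    flush(controller) {
      if (!sawDone) controller.enqueue(encoder.encode(DONE + "\n\n"));
    },
  });
}
```

A response body could then be wrapped with response.body.pipeThrough(ensureDoneSentinel()) inside a custom fetch wrapper.
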
- token-caching: Prefix caching matrix across Gemini / Claude / GPT. Gemini implicit caching only matches identical requests (no prefix match); explicit cachedContents API is not exposed by nexos.ai.
- thinking: Test harness for thinking / reasoning token reporting across models.
MIT