A DeepSeek API proxy server that automatically injects thinking mode (`thinking`) and reasoning effort (`reasoning_effort`) parameters into requests.
Simply point your requests to http://localhost:8000; the proxy handles thinking parameter injection. `api_key` and `max_tokens` are provided by the user in the request.
Some apps or SDKs may not have caught up with new upstream API parameters (for example, enabling thinking mode may require injecting thinking / reasoning_effort). This project acts as a transparent proxy so those apps (e.g. TRAE) can enable thinking mode without changing their request logic.
- Transparently proxies all requests to `https://api.deepseek.com`
- Auto-injects `thinking` and `reasoning_effort` (supports OpenAI / Anthropic formats)
- Short model aliases (`pro` → `deepseek-v4-pro`, `flash` → `deepseek-v4-flash`)
- Automatic `reasoning_content` caching and restoration for multi-turn conversations
- Supports both streaming and non-streaming responses
- Configuration via `config.yaml`, no code changes needed
```bash
cd API_Router
python3 -m venv venv
source venv/bin/activate     # Linux / macOS
.\venv\Scripts\Activate      # Windows
deactivate                   # Exit venv
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Edit `config.yaml` to adjust settings:

```yaml
server:
  host: "0.0.0.0"
  port: 8000

target:
  base_url: "https://api.deepseek.com"

thinking:
  enabled: true       # Enable thinking mode
  format: "openai"    # Parameter format: "openai" or "anthropic"
  effort: "max"       # Reasoning effort: "low" / "medium" / "high" / "max"

model_mapping:
  flash: "deepseek-v4-flash"
  pro: "deepseek-v4-pro"

log:
  level: "INFO"       # Log level: DEBUG / INFO / WARNING / ERROR
  log_input: false    # Pretty-print request body as JSON
  log_output: false   # Pretty-print response body as JSON
```

Start the server:

```bash
python proxy_server.py
```

Startup output:

```text
==================================================
DeepSeek API Proxy starting on 0.0.0.0:8000
Upstream: https://api.deepseek.com
Thinking mode: ON
Format: openai
Effort: max
Log input: OFF
Log output: OFF
==================================================
```
Point your requests to http://localhost:8000. Always include your own `Authorization: Bearer sk-xxx` header.

```bash
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer sk-your-key" \
-d '{
"model": "pro",
"messages": [{"role": "user", "content": "Hello"}],
"stream": false,
"max_tokens": 1000
  }'
```

Non-streaming request with the OpenAI Python SDK:

```python
from openai import OpenAI
client = OpenAI(
api_key="sk-your-key",
base_url="http://localhost:8000/v1"
)
response = client.chat.completions.create(
model="pro",
messages=[{"role": "user", "content": "Write a poem"}],
max_tokens=1000
)
print(response.choices[0].message.content)
```

Streaming request:

```python
from openai import OpenAI
client = OpenAI(
api_key="sk-your-key",
base_url="http://localhost:8000/v1"
)
stream = client.chat.completions.create(
model="pro",
messages=[{"role": "user", "content": "Tell me a joke"}],
stream=True
)
for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

The proxy automatically injects the following parameters before forwarding requests (if not already present):
OpenAI format (`thinking.format: "openai"`):

```json
{
"thinking": {"type": "enabled"},
"reasoning_effort": "max"
}
```

Anthropic format (`thinking.format: "anthropic"`):

```json
{
"thinking": {"type": "enabled"},
"output_config": {"effort": "max"}
}
```

If these parameters already exist in the request, the proxy will not overwrite them.
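The check is per parameter. Below is a minimal sketch of that injection step, assuming the request body is already parsed into a dict and `cfg` mirrors the `thinking` section of `config.yaml` (the function and variable names are illustrative, not the project's actual code):

```python
# Illustrative sketch of the injection rules above; proxy_server.py may differ.
def inject_thinking(body: dict, cfg: dict) -> dict:
    if not cfg.get("enabled", False):
        return body

    # setdefault never overwrites values the client already sent.
    body.setdefault("thinking", {"type": "enabled"})

    effort = cfg.get("effort", "max")
    if cfg.get("format") == "anthropic":
        body.setdefault("output_config", {"effort": effort})
    else:  # "openai"
        body.setdefault("reasoning_effort", effort)
    return body
```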
For thinking mode, the DeepSeek API requires every `role: "assistant"` message in the conversation history to carry its `reasoning_content`. The proxy automatically caches the upstream reasoning content and restores it in subsequent requests.
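One way to picture that caching step is the simplified sketch below, which keys the cache on the assistant message text (the key choice and storage are assumptions for illustration, not necessarily how proxy_server.py does it):

```python
# Simplified sketch: remember reasoning_content from upstream responses and
# restore it onto assistant messages in follow-up requests.
_reasoning_cache: dict[str, str] = {}

def remember_reasoning(assistant_text: str, reasoning_content: str) -> None:
    _reasoning_cache[assistant_text] = reasoning_content

def restore_reasoning(messages: list[dict]) -> list[dict]:
    for msg in messages:
        if msg.get("role") == "assistant" and "reasoning_content" not in msg:
            cached = _reasoning_cache.get(msg.get("content", ""))
            if cached is not None:
                msg["reasoning_content"] = cached
    return messages
```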
```text
API_Router/
├── config.yaml
├── proxy_server.py
├── requirements.txt
├── README.md
└── README_EN.md
```
No configuration needed. Simply include the `Authorization: Bearer sk-xxx` header in your request as you normally would; the proxy passes it through.
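In other words, the proxy copies the caller's `Authorization` header onto the upstream request instead of holding an API key of its own. A rough sketch (names are illustrative):

```python
# Illustrative: build upstream headers from the incoming request's headers.
def upstream_headers(incoming: dict) -> dict:
    headers = {"Content-Type": "application/json"}
    auth = incoming.get("Authorization") or incoming.get("authorization")
    if auth:  # e.g. "Bearer sk-xxx", passed through unchanged
        headers["Authorization"] = auth
    return headers
```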
By default, `max_tokens` is specified by the user in the request and passed through. If you set `output_length.max_tokens` in `config.yaml`, the proxy will override `max_tokens` before forwarding (and also overwrite any existing `max_completion_tokens` / `max_output_tokens`).
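A sketch of that override, assuming `cfg` is the parsed `config.yaml` (illustrative only; it just follows the description above, and the real code may differ):

```python
# Illustrative: apply the optional output_length.max_tokens override.
def apply_output_length(body: dict, cfg: dict) -> dict:
    limit = cfg.get("output_length", {}).get("max_tokens")
    if limit is None:
        return body  # default: the caller's max_tokens passes through untouched
    body["max_tokens"] = limit
    # Also overwrite the alternative token-limit fields if the client sent them.
    for key in ("max_completion_tokens", "max_output_tokens"):
        if key in body:
            body[key] = limit
    return body
```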
Set `thinking.enabled` to `false` in `config.yaml`.
Supported `reasoning_effort` values: `low` / `medium` / `high` / `max`.
Set `log.log_input` and `log.log_output` to `true` in `config.yaml`. Logs will be pretty-printed as formatted JSON.
`pro` → `deepseek-v4-pro`, `flash` → `deepseek-v4-flash`. You can use `pro` or `flash` directly as the model name.
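The alias lookup amounts to a dictionary check against `model_mapping`; model names without an alias are presumably forwarded unchanged. A minimal sketch:

```python
# Illustrative: resolve short model aliases from the model_mapping config section.
MODEL_MAPPING = {
    "flash": "deepseek-v4-flash",
    "pro": "deepseek-v4-pro",
}

def resolve_model(body: dict) -> dict:
    alias = body.get("model")
    if alias in MODEL_MAPPING:
        body["model"] = MODEL_MAPPING[alias]
    return body
```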