API_Router

A DeepSeek API proxy server that automatically injects thinking mode (thinking) and reasoning effort (reasoning_effort) parameters into requests.

Simply point your requests to http://localhost:8000 — the proxy handles thinking parameter injection. api_key and max_tokens are provided by the user in the request.

Why This Exists

Some apps or SDKs may not have caught up with new upstream API parameters (for example, enabling thinking mode may require injecting thinking / reasoning_effort). This project acts as a transparent proxy so those apps (e.g. TRAE) can enable thinking mode without changing their request logic.

Features

  • Transparently proxies all requests to https://api.deepseek.com
  • Auto-injects thinking and reasoning_effort (supports OpenAI / Anthropic formats)
  • Short model aliases (pro → deepseek-v4-pro, flash → deepseek-v4-flash)
  • Automatic reasoning_content caching and restoration for multi-turn conversations
  • Supports both streaming and non-streaming responses
  • Configuration via config.yaml, no code changes needed

Quick Start

1. Create a Virtual Environment

cd API_Router
python3 -m venv venv
source venv/bin/activate    # Linux / macOS
.\venv\Scripts\Activate     # Windows (PowerShell)

deactivate                  # exit the venv when done

2. Install Dependencies

pip install -r requirements.txt

3. Configuration (Optional)

Edit config.yaml to adjust settings:

server:
  host: "0.0.0.0"
  port: 8000

target:
  base_url: "https://api.deepseek.com"

thinking:
  enabled: true          # Enable thinking mode
  format: "openai"       # Parameter format: "openai" or "anthropic"
  effort: "max"          # Reasoning effort: "low" / "medium" / "high" / "max"

model_mapping:
  flash: "deepseek-v4-flash"
  pro: "deepseek-v4-pro"

log:
  level: "INFO"          # Log level: DEBUG / INFO / WARNING / ERROR
  log_input: false       # Pretty-print request body as JSON
  log_output: false      # Pretty-print response body as JSON

4. Start the Server

python proxy_server.py

Startup output:

==================================================
DeepSeek API Proxy starting on 0.0.0.0:8000
Upstream: https://api.deepseek.com
Thinking mode: ON
  Format: openai
  Effort: max
Log input: OFF
Log output: OFF
==================================================

5. Usage

Point your requests to http://localhost:8000. Always include your own Authorization: Bearer sk-xxx header.

curl Example

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-key" \
  -d '{
    "model": "pro",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false,
    "max_tokens": 1000
  }'

OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key",
    base_url="http://localhost:8000/v1"
)

response = client.chat.completions.create(
    model="pro",
    messages=[{"role": "user", "content": "Write a poem"}],
    max_tokens=1000
)

print(response.choices[0].message.content)

Streaming

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key",
    base_url="http://localhost:8000/v1"
)

stream = client.chat.completions.create(
    model="pro",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
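
With thinking mode enabled, the upstream stream may also carry the model's reasoning on each delta as a reasoning_content field (the same field the proxy caches for multi-turn conversations, described below). Here is a variant of the loop above that prints the two streams separately; it is a minimal sketch assuming DeepSeek's reasoning_content convention, and the getattr guard is there because the OpenAI SDK's delta type does not declare that field:

# Sketch only: split thinking tokens from answer tokens.
for chunk in stream:
    delta = chunk.choices[0].delta
    reasoning = getattr(delta, "reasoning_content", None)  # assumed field name
    if reasoning:
        print(reasoning, end="", flush=True)      # thinking stream
    elif delta.content:
        print(delta.content, end="", flush=True)  # answer text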

What the Proxy Does

The proxy automatically injects the following parameters before forwarding requests (if not already present):

OpenAI format (thinking.format: "openai"):

{
  "thinking": {"type": "enabled"},
  "reasoning_effort": "max"
}

Anthropic format (thinking.format: "anthropic"):

{
  "thinking": {"type": "enabled"},
  "output_config": {"effort": "max"}
}

If these parameters already exist in the request, the proxy will not overwrite them.
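
Conceptually, the injection is a non-destructive merge into the JSON request body. A minimal sketch of the idea in Python (a hypothetical helper for illustration, not the actual proxy_server.py code):

# Hypothetical sketch of the injection step, not the actual
# proxy_server.py implementation. setdefault() leaves values
# already set by the client untouched.
def inject_thinking(body: dict, fmt: str = "openai", effort: str = "max") -> dict:
    body.setdefault("thinking", {"type": "enabled"})
    if fmt == "openai":
        body.setdefault("reasoning_effort", effort)
    else:  # "anthropic"
        body.setdefault("output_config", {"effort": effort})
    return body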

Multi-turn Conversations

When thinking mode is enabled, the DeepSeek API requires every role: assistant message in the conversation history to carry its reasoning_content. The proxy caches the reasoning content returned by the upstream and automatically restores it into subsequent requests.
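
One way to picture the mechanism: map each assistant reply to the reasoning it was produced with, then patch the history before forwarding. A minimal sketch with hypothetical names (the proxy's actual data structures may differ):

# Hypothetical sketch of reasoning_content caching; names are
# invented for illustration, not taken from proxy_server.py.
reasoning_cache: dict[str, str] = {}

def remember(assistant_message: dict) -> None:
    # Called when an upstream response arrives.
    if assistant_message.get("reasoning_content"):
        reasoning_cache[assistant_message["content"]] = assistant_message["reasoning_content"]

def restore(messages: list[dict]) -> None:
    # Called before forwarding a request upstream.
    for msg in messages:
        if msg.get("role") == "assistant" and not msg.get("reasoning_content"):
            cached = reasoning_cache.get(msg.get("content", ""))
            if cached:
                msg["reasoning_content"] = cached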

Project Structure

API_Router/
├── config.yaml
├── proxy_server.py
├── requirements.txt
├── README.md
└── README_EN.md

FAQ

Q: Where do I configure my api_key?

No configuration needed. Simply include the Authorization: Bearer sk-xxx header in your request as you normally would — the proxy passes it through.

Q: Where do I set max_tokens?

By default, max_tokens comes from the user's request and is passed through unchanged. If you set output_length.max_tokens in config.yaml, the proxy overrides max_tokens before forwarding (and also overwrites any existing max_completion_tokens / max_output_tokens).
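
For example, a hypothetical override in config.yaml (omit the section to keep the default pass-through behavior):

output_length:
  max_tokens: 4096   # cap applied to every forwarded request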

Q: How do I disable thinking mode?

Set thinking.enabled to false in config.yaml.

Q: What reasoning effort values are available?

low / medium / high / max.

Q: How do I view full request/response content?

Set log.log_input and log.log_output to true in config.yaml. Logs will be pretty-printed as formatted JSON.

Q: What model aliases are supported?

pro → deepseek-v4-pro and flash → deepseek-v4-flash. You can use pro or flash directly as the model name.