API_Router

A DeepSeek API proxy server that automatically injects thinking mode (thinking) and reasoning effort (reasoning_effort) parameters into requests.

Simply point your requests to http://localhost:8000 — the proxy handles thinking parameter injection. api_key and max_tokens are provided by the user in the request.

Why This Exists

Some apps or SDKs may not have caught up with new upstream API parameters (for example, enabling thinking mode may require injecting thinking / reasoning_effort). This project acts as a transparent proxy so those apps (e.g. TRAE) can enable thinking mode without changing their request logic.

Features

  • Transparently proxies all requests to https://api.deepseek.com
  • Auto-injects thinking and reasoning_effort (supports OpenAI / Anthropic formats)
  • Short model aliases (pro → deepseek-v4-pro, flash → deepseek-v4-flash)
  • Automatic reasoning_content caching and restoration for multi-turn conversations
  • Supports both streaming and non-streaming responses
  • Configuration via config.yaml, no code changes needed

Quick Start

1. Create a Virtual Environment

cd API_Router
python3 -m venv venv
source venv/bin/activate    # Linux / macOS
.\venv\Scripts\Activate     # Windows (PowerShell)

deactivate                  # exit the venv when done

2. Install Dependencies

pip install -r requirements.txt

3. Configuration (Optional)

Edit config.yaml to adjust settings:

server:
  host: "0.0.0.0"
  port: 8000

target:
  base_url: "https://api.deepseek.com"

thinking:
  enabled: true          # Enable thinking mode
  format: "openai"       # Parameter format: "openai" or "anthropic"
  effort: "max"          # Reasoning effort: "low" / "medium" / "high" / "max"

model_mapping:
  flash: "deepseek-v4-flash"
  pro: "deepseek-v4-pro"

log:
  level: "INFO"          # Log level: DEBUG / INFO / WARNING / ERROR
  log_input: false       # Pretty-print request body as JSON
  log_output: false      # Pretty-print response body as JSON

4. Start the Server

python proxy_server.py

Startup output:

==================================================
DeepSeek API Proxy starting on 0.0.0.0:8000
Upstream: https://api.deepseek.com
Thinking mode: ON
  Format: openai
  Effort: max
Log input: OFF
Log output: OFF
==================================================

5. Usage

Point your requests to http://localhost:8000. Always include your own Authorization: Bearer sk-xxx header.

curl Example

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-key" \
  -d '{
    "model": "pro",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": false,
    "max_tokens": 1000
  }'

OpenAI Python SDK

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key",
    base_url="http://localhost:8000/v1"
)

response = client.chat.completions.create(
    model="pro",
    messages=[{"role": "user", "content": "Write a poem"}],
    max_tokens=1000
)

print(response.choices[0].message.content)

Streaming

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-key",
    base_url="http://localhost:8000/v1"
)

stream = client.chat.completions.create(
    model="pro",
    messages=[{"role": "user", "content": "Tell me a joke"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
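
With thinking mode enabled, the upstream stream may also carry the model's reasoning on each delta as a reasoning_content field (the same field the proxy caches for multi-turn conversations, described below). Here is a variant of the loop above that prints the two streams separately; it is a minimal sketch assuming DeepSeek's reasoning_content convention, and the getattr guard is there because the OpenAI SDK's delta type does not declare that field:

# Sketch only: split thinking tokens from answer tokens.
for chunk in stream:
    delta = chunk.choices[0].delta
    reasoning = getattr(delta, "reasoning_content", None)  # assumed field name
    if reasoning:
        print(reasoning, end="", flush=True)      # thinking stream
    elif delta.content:
        print(delta.content, end="", flush=True)  # answer text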

What the Proxy Does

The proxy automatically injects the following parameters before forwarding requests (if not already present):

OpenAI format (thinking.format: "openai"):

{
  "thinking": {"type": "enabled"},
  "reasoning_effort": "max"
}

Anthropic format (thinking.format: "anthropic"):

{
  "thinking": {"type": "enabled"},
  "output_config": {"effort": "max"}
}

If these parameters already exist in the request, the proxy will not overwrite them.
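
Conceptually, the injection is a non-destructive merge into the JSON request body. A minimal sketch of the idea in Python (a hypothetical helper for illustration, not the actual proxy_server.py code):

# Hypothetical sketch of the injection step, not the actual
# proxy_server.py implementation. setdefault() leaves values
# already set by the client untouched.
def inject_thinking(body: dict, fmt: str = "openai", effort: str = "max") -> dict:
    body.setdefault("thinking", {"type": "enabled"})
    if fmt == "openai":
        body.setdefault("reasoning_effort", effort)
    else:  # "anthropic"
        body.setdefault("output_config", {"effort": effort})
    return body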

Multi-turn Conversations

When thinking mode is enabled, the DeepSeek API requires every role: assistant message in the conversation history to carry its reasoning_content. The proxy caches the reasoning content returned by the upstream and automatically restores it into subsequent requests.
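
One way to picture the mechanism: map each assistant reply to the reasoning it was produced with, then patch the history before forwarding. A minimal sketch with hypothetical names (the proxy's actual data structures may differ):

# Hypothetical sketch of reasoning_content caching; names are
# invented for illustration, not taken from proxy_server.py.
reasoning_cache: dict[str, str] = {}

def remember(assistant_message: dict) -> None:
    # Called when an upstream response arrives.
    if assistant_message.get("reasoning_content"):
        reasoning_cache[assistant_message["content"]] = assistant_message["reasoning_content"]

def restore(messages: list[dict]) -> None:
    # Called before forwarding a request upstream.
    for msg in messages:
        if msg.get("role") == "assistant" and not msg.get("reasoning_content"):
            cached = reasoning_cache.get(msg.get("content", ""))
            if cached:
                msg["reasoning_content"] = cached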

Project Structure

API_Router/
├── config.yaml
├── proxy_server.py
├── requirements.txt
├── README.md
└── README_EN.md

FAQ

Q: Where do I configure my api_key?

No configuration needed. Simply include the Authorization: Bearer sk-xxx header in your request as you normally would — the proxy passes it through.

Q: Where do I set max_tokens?

By default, max_tokens comes from the user's request and is passed through unchanged. If you set output_length.max_tokens in config.yaml, the proxy overrides max_tokens before forwarding (and also overwrites any existing max_completion_tokens / max_output_tokens).
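
For example, a hypothetical override in config.yaml (omit the section to keep the default pass-through behavior):

output_length:
  max_tokens: 4096   # cap applied to every forwarded request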

Q: How do I disable thinking mode?

Set thinking.enabled to false in config.yaml.

Q: What reasoning effort values are available?

low / medium / high / max.

Q: How do I view full request/response content?

Set log.log_input and log.log_output to true in config.yaml. Logs will be pretty-printed as formatted JSON.

Q: What model aliases are supported?

pro → deepseek-v4-pro and flash → deepseek-v4-flash. You can use pro or flash directly as the model name.