A local HTTP proxy that lets the Factory Droid CLI's BYOK (Bring Your Own Key) feature work with any OpenAI-compatible chat-completions endpoint — Zen Go, OpenRouter, Together, Groq, Fireworks, DeepSeek direct, etc.
Droid's `provider: "openai"` BYOK path uses the OpenAI Responses API (`POST /responses`) — the newer, agentic protocol with named output items, custom grammar tools, and reasoning items. Most third-party providers only implement the older Chat Completions API (`POST /v1/chat/completions`).
The proxy translates Responses ↔ Chat Completions in both directions, including:
- Streaming SSE event sequences (`response.output_item.added`, `response.output_text.delta`, `response.function_call_arguments.delta`, `response.custom_tool_call_input.delta`, etc.)
- Tool calling (both standard `function` and OpenAI's newer `custom` grammar-constrained tools, like Droid's `ApplyPatch`)
- Reasoning content roundtrip — critical for thinking-mode models (DeepSeek, GLM, Kimi, MiMo) that demand the prior turn's `reasoning_content` be echoed back
- Tool result inputs (`function_call_output`, `custom_tool_call_output`)
- Structured outputs (`text.format` ↔ `response_format`)
- Image inputs, system instructions, sampling params
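To see why a translation layer is needed, here is a deliberately minimal illustration of the same single-turn request in both protocols: a hand-trimmed sketch following the public API references, with an invented `run_shell` tool. Real Droid requests carry many more fields.

```python
# OpenAI Responses API — what Droid sends to the proxy (POST /v1/responses)
responses_request = {
    "model": "kimi-k2.6",
    "stream": True,
    "instructions": "You are a coding agent.",
    "input": [
        {"type": "message", "role": "user",
         "content": [{"type": "input_text", "text": "List the repo files."}]},
    ],
    # Internally tagged tool: name/parameters live at the top level
    "tools": [{"type": "function", "name": "run_shell",
               "parameters": {"type": "object",
                              "properties": {"cmd": {"type": "string"}}}}],
}

# Chat Completions — what the proxy forwards upstream (POST /v1/chat/completions)
chat_request = {
    "model": "kimi-k2.6",
    "stream": True,
    "messages": [
        {"role": "system", "content": "You are a coding agent."},  # from instructions
        {"role": "user", "content": "List the repo files."},
    ],
    # Externally tagged tool: everything nested under "function"
    "tools": [{"type": "function",
               "function": {"name": "run_shell",
                            "parameters": {"type": "object",
                                           "properties": {"cmd": {"type": "string"}}}}}],
}
```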
It also includes:
- Per-model upstream routing via a config file — different model names can hit different providers
- Retry with backoff on transient upstream failures (408/425/429/5xx + network errors)
- 2-hour idle self-exit — no orphaned processes
- Auto-start via Droid's `SessionStart` hook
Requirements:

- macOS (Linux probably works, untested)
- Python 3.10 or newer (uses `match`/`case`, the walrus operator, modern type hints)
- Factory Droid CLI installed (`droid` on `PATH`)
- An API key for at least one OpenAI-compatible upstream
```bash
git clone https://github.com/thelostorbital/factory-droid-bridge.git
cd factory-droid-bridge
./install.sh
```

`install.sh` copies `droid_responses_proxy.py` + `start_droid_proxy.sh` to `~/.factory/bin/`, makes the launcher executable, and creates `~/.factory/logs/`. It verifies Python 3.10+ is present.

After that, follow Configure below to register the `SessionStart` hook + your custom models in `~/.factory/settings.json`.
If you'd rather not run a script:

```bash
mkdir -p ~/.factory/bin ~/.factory/logs
cp droid_responses_proxy.py start_droid_proxy.sh ~/.factory/bin/
chmod +x ~/.factory/bin/start_droid_proxy.sh
```

No `pip install`, no daemon registration, no compiled deps. Stdlib Python only.
In `~/.factory/settings.json`, add this top-level `hooks` block (or merge with whatever you have):
```json
{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "/Users/YOU/.factory/bin/start_droid_proxy.sh"
          }
        ]
      }
    ]
  }
}
```

Replace `/Users/YOU/` with your actual home path (the hook `command` field doesn't expand `~` or `$HOME`).
In the same `settings.json`, add `customModels` entries that point at the proxy (`http://127.0.0.1:18080`). The `model` field is what Droid will send on the wire — it's also what the proxy uses to look up the route.
```json
{
  "customModels": [
    {
      "model": "kimi-k2.6",
      "displayName": "Kimi K2.6",
      "baseUrl": "http://127.0.0.1:18080",
      "apiKey": "<your upstream api key>",
      "provider": "openai"
    },
    {
      "model": "deepseek-v4-flash",
      "displayName": "DeepSeek V4 Flash",
      "baseUrl": "http://127.0.0.1:18080",
      "apiKey": "<your upstream api key>",
      "provider": "openai"
    }
  ]
}
```

`baseUrl` must be exactly `http://127.0.0.1:18080` — no trailing slash, no path. Droid appends `/v1/responses` itself.

`provider` must be `"openai"` — this proxy translates the OpenAI Responses API specifically. (If you need Anthropic-native models, set `provider: "anthropic"` and a real Anthropic-compatible `baseUrl`; that bypasses the proxy.)
The first time the proxy starts, it writes `~/.factory/bin/proxy_routes.json` with a default config. Edit this file to add new upstreams.
```json
{
  "default_upstream": "https://opencode.ai/zen/go/v1/chat/completions",
  "routes": [
    {
      "models": ["kimi-k2.6", "kimi-k2.5", "deepseek-v4-pro", "deepseek-v4-flash"],
      "upstream": "https://opencode.ai/zen/go/v1/chat/completions"
    },
    {
      "prefix": "openrouter/",
      "strip_prefix": true,
      "upstream": "https://openrouter.ai/api/v1/chat/completions",
      "headers": {
        "HTTP-Referer": "https://factory.ai",
        "X-Title": "Droid CLI"
      }
    },
    {
      "prefix": "together/",
      "strip_prefix": true,
      "upstream": "https://api.together.xyz/v1/chat/completions"
    },
    {
      "regex": "^groq-",
      "upstream": "https://api.groq.com/openai/v1/chat/completions"
    }
  ]
}
```

| Matcher | Field | Behavior |
|---|---|---|
| Exact list | `"models": ["m1", "m2"]` | Match if the request model is exactly one of these |
| Prefix | `"prefix": "openrouter/"` | Match if the request model starts with this |
| Regex | `"regex": "^claude-"` | `re.search` on the model name |
| Field | Purpose |
|---|---|
| `upstream` (required) | Full Chat Completions URL to POST to |
| `strip_prefix` (bool, prefix-only) | Drop the prefix from the model name before forwarding |
| `model_rewrite` (string) | Replace the model name entirely |
| `headers` (dict) | Extra headers merged into the upstream request (overrides defaults — useful for `HTTP-Referer`, `X-Title`, etc.) |
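Putting the matchers and fields together, route resolution amounts to something like the following sketch (illustrative only, not the proxy's actual code; it just mirrors the semantics in the two tables above):

```python
import re

def resolve_route(model: str, config: dict) -> tuple[str, str, dict]:
    """Return (upstream_url, forwarded_model, extra_headers) for a model name."""
    for route in config.get("routes", []):
        matched = False
        forwarded = model
        if "models" in route:                  # exact-list matcher
            matched = model in route["models"]
        elif "prefix" in route:                # prefix matcher
            matched = model.startswith(route["prefix"])
            if matched and route.get("strip_prefix"):
                forwarded = model[len(route["prefix"]):]
        elif "regex" in route:                 # regex matcher (re.search)
            matched = re.search(route["regex"], model) is not None
        if matched:
            # model_rewrite, when present, wins over the (possibly stripped) name
            forwarded = route.get("model_rewrite", forwarded)
            return route["upstream"], forwarded, route.get("headers", {})
    # No route matched: fall back to the default upstream, name untouched
    return config["default_upstream"], model, {}
```

With the example config above, `resolve_route("openrouter/google/gemini-2.0-flash-001", config)` yields OpenRouter's Chat Completions URL, the stripped name `google/gemini-2.0-flash-001`, and the two extra headers.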
Restart the proxy after editing this file: `kill $(lsof -ti:18080); ~/.factory/bin/start_droid_proxy.sh`
The proxy forwards whatever `Authorization: Bearer <key>` Droid sends. Droid populates that header from the `apiKey` field of the matching `customModels` entry. So:
- One API key per model entry in settings.json.
- Different models can use different keys — perfect when you've routed them to different upstreams.
The `SessionStart` hook starts the proxy automatically when Droid launches. Manual control:

```bash
# Start (idempotent — no-op if already running)
~/.factory/bin/start_droid_proxy.sh

# Health check
curl -s http://127.0.0.1:18080/health

# Inspect current route config (also dumps default_upstream + retry settings)
curl -s http://127.0.0.1:18080/routes | python3 -m json.tool

# Stop
kill $(lsof -ti:18080)

# Restart (e.g., after editing proxy_routes.json)
kill $(lsof -ti:18080); ~/.factory/bin/start_droid_proxy.sh
```

To verify end to end:

- Start Droid, pick one of your custom models.
- Ask any question.
- You should see a streaming response (not a 400 or 404).

`tail -f ~/.factory/logs/droid_responses_proxy.log` should show a `POST /v1/responses` line per request.
If the response is empty or fails, check the most recent body dump:

```bash
ls -t ~/.factory/logs/proxy-bodies/ | head -1 | xargs -I{} python3 -m json.tool ~/.factory/logs/proxy-bodies/{}
```

This shows exactly what Droid sent and exactly what the proxy forwarded — the single best debugging tool.
Supported:

- `POST /v1/responses` (or any path ending in `/responses`)
- Streaming (SSE) and non-streaming
- Function tools (Chat Completions standard `{type: function}`)
- Custom tools (OpenAI's `{type: custom}` with Lark or regex grammar — e.g. Droid's ApplyPatch; see the sketch after this list)
- Function and custom tool result inputs (`function_call_output`, `custom_tool_call_output`)
- System instructions
- Image inputs (`input_image` → `image_url`)
- Structured outputs (`text.format` ↔ `response_format`)
- Reasoning content roundtrip across turns (cached by `call_id`)
- Per-model upstream routing
- Retry on 408/425/429/500/502/503/504 + network exceptions
- Sampling params: `temperature`, `top_p`, `max_output_tokens` → `max_tokens`, `parallel_tool_calls`, `seed`, `user`
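To make the custom-tool lowering concrete, here is a hand-written sketch of the tool-definition translation (not the proxy's actual code; field names follow the public OpenAI specs as described above):

```python
def translate_tool(tool: dict) -> dict:
    """Responses tools (internally tagged) -> Chat tools (externally tagged)."""
    if tool["type"] == "function":
        return {"type": "function",
                "function": {"name": tool["name"],
                             "description": tool.get("description", ""),
                             "parameters": tool.get("parameters", {})}}
    if tool["type"] == "custom":
        # Custom (grammar-constrained) tools become a plain function with a
        # single `input` string; the grammar rides along in the description.
        desc = tool.get("description", "")
        fmt = tool.get("format", {})
        if fmt.get("type") == "grammar":
            desc += (f"\nInput must conform to this {fmt.get('syntax', '')} "
                     f"grammar:\n{fmt.get('definition', '')}")
        return {"type": "function",
                "function": {"name": tool["name"],
                             "description": desc,
                             "parameters": {"type": "object",
                                            "properties": {"input": {"type": "string"}},
                                            "required": ["input"]}}}
    raise ValueError(f"unsupported tool type: {tool['type']}")
```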
Not supported:

- OpenAI built-in tools — `web_search`, `file_search`, `computer_use`, `code_interpreter`, `mcp`. These require server-side tooling; they're silently dropped from the request.
- ZDR-mode encrypted reasoning — `encrypted_content` blobs are opaque to the proxy. Plaintext `summary` is fine.
- Anthropic-native upstreams — this proxy translates only to Chat Completions. For Claude, configure Droid with `provider: "anthropic"` and a real Anthropic-compatible base URL; that path doesn't go through this proxy at all.
- Per-token streaming of custom-tool input — emitted as a single delta at end of stream. The rest streams per-token.
These upstream models have been verified end-to-end through the proxy with Droid — full streaming, tool calling, custom-tool (ApplyPatch) round-trips, and multi-turn agentic loops with reasoning preserved across turns:
| Model | Provider | Notes |
|---|---|---|
| `glm-5.1`, `glm-5` | OpenCode Zen Go | Reasoning + ApplyPatch + multi-turn confirmed |
| `kimi-k2.5`, `kimi-k2.6` | OpenCode Zen Go | `reasoning_content` cache hits across turns |
| `deepseek-v4-pro`, `deepseek-v4-flash` | OpenCode Zen Go | Thinking-mode round-trip verified; KV cache reuse working |
| `mimo-v2.5`, `mimo-v2.5-pro` | OpenCode Zen Go | Same Chat Completions translation path; reasoning preserved |
Other OpenAI-compatible providers (OpenRouter, Together, Groq, Fireworks, Anyscale, DeepSeek direct, etc.) should work via the per-model routing config but have not been personally smoke-tested. Open an issue with your config + the relevant body dump from `~/.factory/logs/proxy-bodies/` if a specific provider misbehaves.
Models you access via Droid's native Anthropic adapter (e.g., `minimax-m2.7`, `minimax-m2.5` against Zen Go's `/v1/messages` endpoint) bypass this proxy entirely — those use Droid's `provider: "anthropic"` code path directly and don't need a translation layer.
| Path | What's in it |
|---|---|
| `~/.factory/logs/droid_responses_proxy.log` | Per-request summary lines (model, upstream, stream flag, msg count, tool count) |
| `~/.factory/logs/droid_responses_proxy.stdout.log` | Process stdout/stderr — startup messages, watchdog, fatal errors |
| `~/.factory/logs/proxy-bodies/*.json` | Last 40 requests, full incoming Responses body + outgoing Chat body side by side |
The body dumps are the most useful debugging artifact when something behaves unexpectedly — they show you ground truth for both directions of the translation.
Set in `start_droid_proxy.sh` before launching, or as env vars when running manually.

| Variable | Default | Purpose |
|---|---|---|
| `PORT` | `18080` | Local listen port |
| `UPSTREAM` | `https://opencode.ai/zen/go/v1/chat/completions` | Fallback when no route matches |
| `PROXY_ROUTES_FILE` | `~/.factory/bin/proxy_routes.json` | Route config path |
| `PROXY_IDLE_TIMEOUT_SECONDS` | `7200` (2h) | Self-exit after this many seconds of inactivity. `0` disables. |
| `PROXY_RETRY_MAX_ATTEMPTS` | `3` | Max attempts on transient failures (total, not retries) |
| `PROXY_RETRY_BASE_BACKOFF` | `0.5` | Base seconds for exponential backoff |
| `PROXY_RETRY_TIMEOUT` | `600` | Per-attempt upstream timeout (seconds) |
| `PROXY_LOG` | `~/.factory/logs/droid_responses_proxy.log` | Per-request log file |
| `PROXY_DUMP_DIR` | `~/.factory/logs/proxy-bodies` | Body capture dir |
| `PROXY_DUMP_KEEP` | `40` | How many body dumps to retain |
| `PROXY_REASONING_CACHE_MAX` | `4096` | Max entries in the `call_id` → reasoning-text LRU cache |
| `PROXY_DEBUG` | `0` | Set `1` to log full outgoing Chat request bodies (truncated to 600 chars) |
| Event | Behavior |
|---|---|
| Launch Droid | `SessionStart` hook runs `start_droid_proxy.sh`; cold-starts proxy in ~600ms, or no-ops in ~50ms if already running |
| Make a request | Increments in-flight counter; refreshes idle timer; both reset on response complete |
| Long stream (model thinking 10 min) | In-flight stays >0; watchdog can't reap |
| Two parallel Droid sessions | Share the same proxy; either's traffic keeps it alive; closing one doesn't affect the other |
| Close all Droid sessions | Proxy keeps running |
| 2 hours with no traffic | Watchdog `os._exit(0)`; next session cold-starts a fresh one |
| `kill -9` droid mid-request | Proxy unaffected; recovers normally |
| Reboot | Proxy gone; next Droid launch cold-starts it |
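The watchdog behavior in the table above has roughly this shape (an illustrative sketch, not the proxy's actual code; the `in_flight` and `last_activity` names are invented here):

```python
import os
import threading
import time

in_flight = 0                     # requests currently being served
last_activity = time.monotonic()  # refreshed on every request and response
state_lock = threading.Lock()

def watchdog(idle_timeout: float = 7200.0) -> None:
    """Exit the process after idle_timeout seconds with no traffic.

    time.monotonic() never jumps, so a laptop sleep/wake cannot make the
    proxy believe it has been idle for days and reap itself spuriously.
    """
    while True:
        time.sleep(60)
        with state_lock:
            idle_for = time.monotonic() - last_activity
            busy = in_flight > 0   # a long-running stream blocks reaping
        if idle_timeout > 0 and not busy and idle_for >= idle_timeout:
            os._exit(0)            # hard exit: no orphaned threads or atexit hooks

threading.Thread(target=watchdog, daemon=True).start()
```

The `idle_timeout > 0` guard reflects the documented behavior that `PROXY_IDLE_TIMEOUT_SECONDS=0` disables the self-exit.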
Either the proxy isn't running, or your `baseUrl` is wrong:

- `curl http://127.0.0.1:18080/health` should return JSON with `"ok": true`.
- `baseUrl` in `settings.json` must be exactly `http://127.0.0.1:18080` — no trailing slash, no `/v1`.
```
Error from provider (DeepSeek): The reasoning_content in the thinking mode must be passed back to the API
```
Right after a proxy restart, the reasoning cache is empty — DeepSeek wants the prior turn's reasoning echoed back, and the proxy has nothing to attach. One turn is degraded (the proxy fills in a `(prior reasoning not retained)` placeholder so DeepSeek accepts the request). From the next turn onward, the cache is populated and reasoning is reattached automatically.
If this keeps happening every turn, something is restarting the proxy:
```bash
tail -f ~/.factory/logs/droid_responses_proxy.stdout.log
```

A `type: custom` tool (typically ApplyPatch) was dropped or mistranslated. Check the latest dump:
```bash
ls -t ~/.factory/logs/proxy-bodies/ | head -1 | xargs -I{} python3 -m json.tool ~/.factory/logs/proxy-bodies/{} | head -100
```

The `outgoing_chat.tools` list should include an ApplyPatch entry with a single `input` string parameter. If it's missing, the proxy build is older than the custom-tool support — replace `droid_responses_proxy.py` with the current version.
Your `apiKey` in `settings.json` is invalid or doesn't have access to the model. Verify directly:

```bash
curl -s -H "Authorization: Bearer <key>" https://YOUR-UPSTREAM/v1/models | python3 -m json.tool
```

You're being rate-limited. The proxy retries automatically (3 attempts, exponential backoff), but if all attempts fail, the 429 propagates. Options:
- Slow down agentic loops (reduce `parallel_tool_calls`).
- Increase `PROXY_RETRY_BASE_BACKOFF` and `PROXY_RETRY_MAX_ATTEMPTS`.
- Upgrade your tier with the provider.
The model probably hit `max_tokens` while still in reasoning. Increase `max_output_tokens` in Droid's request settings, or pick a model that uses fewer reasoning tokens for short answers.
Check `tail -100 ~/.factory/logs/droid_responses_proxy.stdout.log` for tracebacks or `idle ...s — exiting` lines. The 2h idle timeout is intentional; if you want it to live longer, set `PROXY_IDLE_TIMEOUT_SECONDS=0` or a larger value in `start_droid_proxy.sh`.
```
~/.factory/settings.json
  │  customModels[].baseUrl = http://127.0.0.1:18080
  ▼
Droid CLI
  │
  │  POST /v1/responses  (OpenAI Responses API)
  │  stream: true
  │  tools: [{type:function, ...}, {type:custom, format:{grammar...}}]
  ▼
┌─────────────────────────────────┐
│ droid_responses_proxy.py        │
│                                 │
│ 1. Parse Responses body         │
│ 2. Resolve route by model name  │ ◄── proxy_routes.json
│ 3. Translate items → messages   │
│ 4. Translate tools internally   │
│    → externally tagged          │
│ 5. POST upstream (with retry)   │
│ 6. Translate response stream    │
│    back to Responses SSE        │
└─────────────────────────────────┘
  │
  │  POST /v1/chat/completions
  │  stream: true
  │  tools: [{type:function, function:{name, parameters}}]
  ▼
Upstream provider
(Zen Go / OpenRouter / Together / Groq / etc.)
```
- Items → Messages — Responses input is an array of typed items (`message`, `function_call`, `function_call_output`, `custom_tool_call`, `custom_tool_call_output`, `reasoning`). Chat Completions wants `messages` with `tool_calls` glued onto assistant messages and `role: tool` messages for results. The proxy coalesces accordingly.
- Tool definitions — Responses tools are internally tagged (`{type:function, name, parameters}` or `{type:custom, name, format}`). Chat tools are externally tagged (`{type:function, function:{name, parameters}}`). Custom tools become a function with a single `input: string` parameter and the grammar embedded in the description.
- Reasoning preservation — Thinking-mode models (DeepSeek, GLM, Kimi, MiMo) require the prior turn's `reasoning_content` to be passed back on assistant messages. Droid doesn't roundtrip Responses-format `reasoning` items in subsequent turns. The proxy caches `reasoning_text` keyed by `call_id` whenever it emits a reasoning + tool-call response, and reattaches it as `reasoning_content` when those call_ids reappear in input (see the cache sketch after this list).
- Streaming events — Chat streams `delta.content` and `delta.tool_calls[i].function.arguments` chunks. Responses streams a named event sequence (`response.created`, `response.in_progress`, `response.output_item.added`, `response.output_text.delta`, `response.function_call_arguments.delta`, `response.custom_tool_call_input.delta`, `response.output_item.done`, `response.completed`, etc.). A per-request `StreamBridge` state machine handles the conversion.
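The reasoning cache can be tiny. A minimal sketch of the idea (illustrative only; the class and method names are invented here, and the real proxy caps the size via `PROXY_REASONING_CACHE_MAX`):

```python
from collections import OrderedDict

class ReasoningCache:
    """LRU map: tool-call id -> reasoning text from the turn that produced it."""

    def __init__(self, max_entries: int = 4096) -> None:
        self.max_entries = max_entries
        self._data: OrderedDict[str, str] = OrderedDict()

    def remember(self, call_id: str, reasoning_text: str) -> None:
        # Called when the proxy emits a reasoning + tool-call response.
        self._data[call_id] = reasoning_text
        self._data.move_to_end(call_id)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)   # evict least recently used

    def reattach(self, call_id: str) -> str:
        # Called when Droid echoes the function_call item back next turn.
        if call_id in self._data:
            self._data.move_to_end(call_id)
            return self._data[call_id]
        # Cache miss (e.g., right after a restart): degrade gracefully so
        # strict upstreams like DeepSeek still accept the request.
        return "(prior reasoning not retained)"
```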
Worked example — adding OpenRouter:

1. Get an OpenRouter API key from https://openrouter.ai.
2. Add a route in `~/.factory/bin/proxy_routes.json`:

   ```json
   {
     "prefix": "openrouter/",
     "strip_prefix": true,
     "upstream": "https://openrouter.ai/api/v1/chat/completions",
     "headers": {
       "HTTP-Referer": "https://factory.ai",
       "X-Title": "Droid CLI"
     }
   }
   ```

3. Add a model entry in `~/.factory/settings.json`:

   ```json
   {
     "model": "openrouter/google/gemini-2.0-flash-001",
     "displayName": "Gemini 2.0 Flash (OpenRouter)",
     "baseUrl": "http://127.0.0.1:18080",
     "apiKey": "<openrouter key>",
     "provider": "openai"
   }
   ```

4. Restart the proxy: `kill $(lsof -ti:18080); ~/.factory/bin/start_droid_proxy.sh`
5. Restart Droid, pick "Gemini 2.0 Flash (OpenRouter)" from the model picker, chat.

The proxy strips `openrouter/` from the model name, so OpenRouter sees `google/gemini-2.0-flash-001` (its actual catalog id) with the OpenRouter-required headers attached.
Repo (what you `git clone`):

```
factory-droid-bridge/
├── README.md                    # this file
├── LICENSE                      # MIT
├── droid_responses_proxy.py     # the proxy (stdlib Python)
├── start_droid_proxy.sh         # idempotent launcher (bash)
├── install.sh                   # one-shot installer
├── proxy_routes.example.json    # commented multi-provider example
└── .gitignore
```

After installing, on your machine:

```
~/.factory/
├── settings.json                        # hooks.SessionStart + customModels[]
├── bin/
│   ├── droid_responses_proxy.py         # copied by install.sh
│   ├── start_droid_proxy.sh             # copied by install.sh
│   ├── proxy_routes.example.json        # copied by install.sh
│   └── proxy_routes.json                # auto-created on first proxy start
└── logs/
    ├── droid_responses_proxy.log        # per-request summary
    ├── droid_responses_proxy.stdout.log # process stdout/stderr
    └── proxy-bodies/                    # last 40 request/response dumps
        └── 20XXXX-XXXXXX-XXXX.json
```
- Single-file Python, stdlib only. No `pip install`, no virtualenv, no Node, no compiled dependencies. If you have a working Python 3.10+, the proxy runs.
- Threading server — one OS thread per request. Adequate for one person's CLI usage; not a production gateway.
- Retry only covers the initial `urlopen` call (see the sketch after this list). Once the proxy starts writing SSE bytes back to Droid, it's committed — retrying would duplicate events and corrupt the Stainless SDK's state machine on the client.
- The reasoning cache is in-memory (per process). On restart it's empty; the next multi-turn message in a thinking-mode model conversation fires the `(prior reasoning not retained)` placeholder once, then steady state resumes.
- The idle watchdog uses `time.monotonic()` and checks every 60–120s, so it survives system sleep/wake without spurious reaps.
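The retry policy described above would look roughly like this: a simplified sketch using stdlib `urllib`, consistent with the documented status codes and env-var defaults, but not the actual proxy code:

```python
import time
import urllib.error
import urllib.request

RETRYABLE = {408, 425, 429, 500, 502, 503, 504}

def post_with_retry(req: urllib.request.Request,
                    max_attempts: int = 3,        # PROXY_RETRY_MAX_ATTEMPTS
                    base_backoff: float = 0.5,    # PROXY_RETRY_BASE_BACKOFF
                    timeout: float = 600.0):      # PROXY_RETRY_TIMEOUT
    """Retry only the initial upstream connection.

    Once a response object is handed back and SSE bytes start flowing to
    the client, there is no retry: replaying events would corrupt state.
    """
    for attempt in range(max_attempts):
        try:
            return urllib.request.urlopen(req, timeout=timeout)
        except urllib.error.HTTPError as e:        # must precede URLError
            if e.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
        except (urllib.error.URLError, TimeoutError):
            if attempt == max_attempts - 1:
                raise
        time.sleep(base_backoff * (2 ** attempt))   # 0.5s, 1s, 2s, ...
```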
Community-built. Not an official Factory product, not endorsed by Factory. It uses Factory's documented BYOK and hook configuration surface; the translation between OpenAI Responses API and Chat Completions API is implemented against the public specs of both. Use at your own risk; review the code before running it.
MIT — see LICENSE.