
factory-droid-bridge

A local HTTP proxy that lets the Factory Droid CLI's BYOK (Bring Your Own Key) feature work with any OpenAI-compatible chat-completions endpoint — Zen Go, OpenRouter, Together, Groq, Fireworks, DeepSeek direct, etc.


Why this exists

Droid's provider: "openai" BYOK path uses the OpenAI Responses API (POST /responses) — the newer, agentic protocol with named output items, custom grammar tools, and reasoning items. Most third-party providers only implement the older Chat Completions API (POST /v1/chat/completions).

The proxy translates Responses ↔ Chat Completions in both directions, including:

  • Streaming SSE event sequences (response.output_item.added, response.output_text.delta, response.function_call_arguments.delta, response.custom_tool_call_input.delta, etc.)
  • Tool calling (both standard function and OpenAI's newer custom grammar-constrained tools, like Droid's ApplyPatch)
  • Reasoning content roundtrip — critical for thinking-mode models (DeepSeek, GLM, Kimi, MiMo) that demand the prior turn's reasoning_content be echoed back
  • Tool result inputs (function_call_output, custom_tool_call_output)
  • Structured outputs (text.format → response_format)
  • Image inputs, system instructions, sampling params

It also includes:

  • Per-model upstream routing via a config file — different model names can hit different providers
  • Retry with backoff on transient upstream failures (408/425/429/5xx + network errors)
  • 2-hour idle self-exit — no orphaned processes
  • Auto-start via Droid's SessionStart hook

Requirements

  • macOS (Linux probably works, untested)
  • Python 3.10 or newer (uses match/case, walrus, modern type hints)
  • Factory Droid CLI installed (droid on PATH)
  • An API key for at least one OpenAI-compatible upstream

Quick install

git clone https://github.com/thelostorbital/factory-droid-bridge.git
cd factory-droid-bridge
./install.sh

install.sh copies droid_responses_proxy.py, start_droid_proxy.sh, and proxy_routes.example.json to ~/.factory/bin/, makes the launcher executable, and creates ~/.factory/logs/. It also verifies Python 3.10+ is present.

After that, follow Configure below to register the SessionStart hook + your custom models in ~/.factory/settings.json.

Or install by hand

If you'd rather not run a script:

mkdir -p ~/.factory/bin ~/.factory/logs
cp droid_responses_proxy.py start_droid_proxy.sh ~/.factory/bin/
chmod +x ~/.factory/bin/start_droid_proxy.sh

No pip install, no daemon registration, no compiled deps. Stdlib Python only.


Configure

Step 1 — Register the SessionStart hook

In ~/.factory/settings.json, add this top-level hooks block (or merge with whatever you have):

{
  "hooks": {
    "SessionStart": [
      {
        "matcher": "",
        "hooks": [
          {
            "type": "command",
            "command": "/Users/YOU/.factory/bin/start_droid_proxy.sh"
          }
        ]
      }
    ]
  }
}

Replace /Users/YOU/ with your actual home path (the hook command field doesn't expand ~ or $HOME).
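If you script your setup, the merge can be automated. A minimal sketch, assuming nothing beyond the JSON structure shown above (the function name is mine, not a helper shipped with this repo):

```python
import json
import os

# Sketch: merge the SessionStart hook into a settings dict without
# clobbering existing hooks. add_session_start_hook is illustrative,
# not part of this repo.
def add_session_start_hook(settings: dict, command: str) -> dict:
    hooks = settings.setdefault("hooks", {})
    entries = hooks.setdefault("SessionStart", [])
    entry = {"matcher": "", "hooks": [{"type": "command", "command": command}]}
    if entry not in entries:  # idempotent: safe to run twice
        entries.append(entry)
    return settings

# expanduser resolves the absolute home path the hook command field requires
launcher = os.path.expanduser("~/.factory/bin/start_droid_proxy.sh")
print(json.dumps(add_session_start_hook({}, launcher), indent=2))
```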

Step 2 — Declare your custom models

In the same settings.json, add customModels entries that point at the proxy (http://127.0.0.1:18080). The model field is what Droid will send on the wire — it's also what the proxy uses to look up the route.

{
  "customModels": [
    {
      "model": "kimi-k2.6",
      "displayName": "Kimi K2.6",
      "baseUrl": "http://127.0.0.1:18080",
      "apiKey": "<your upstream api key>",
      "provider": "openai"
    },
    {
      "model": "deepseek-v4-flash",
      "displayName": "DeepSeek V4 Flash",
      "baseUrl": "http://127.0.0.1:18080",
      "apiKey": "<your upstream api key>",
      "provider": "openai"
    }
  ]
}

baseUrl must be exactly http://127.0.0.1:18080 — no trailing slash, no path. Droid appends /v1/responses itself.

provider must be "openai" — this proxy translates the OpenAI Responses API specifically. (If you need Anthropic-native models, set provider: "anthropic" and a real Anthropic-compatible baseUrl; that bypasses the proxy.)

Step 3 — (Optional) Edit the route table

The first time the proxy starts, it writes ~/.factory/bin/proxy_routes.json with a default config. Edit this file to add new upstreams.

{
  "default_upstream": "https://opencode.ai/zen/go/v1/chat/completions",
  "routes": [
    {
      "models": ["kimi-k2.6", "kimi-k2.5", "deepseek-v4-pro", "deepseek-v4-flash"],
      "upstream": "https://opencode.ai/zen/go/v1/chat/completions"
    },
    {
      "prefix": "openrouter/",
      "strip_prefix": true,
      "upstream": "https://openrouter.ai/api/v1/chat/completions",
      "headers": {
        "HTTP-Referer": "https://factory.ai",
        "X-Title": "Droid CLI"
      }
    },
    {
      "prefix": "together/",
      "strip_prefix": true,
      "upstream": "https://api.together.xyz/v1/chat/completions"
    },
    {
      "regex": "^groq-",
      "upstream": "https://api.groq.com/openai/v1/chat/completions"
    }
  ]
}

Route matchers (first match wins)

| Matcher | Field | Behavior |
| --- | --- | --- |
| Exact list | "models": ["m1", "m2"] | Match if request model is exactly one of these |
| Prefix | "prefix": "openrouter/" | Match if request model starts with this |
| Regex | "regex": "^claude-" | re.search on the model name |

Per-route options

| Field | Purpose |
| --- | --- |
| upstream (required) | Full Chat Completions URL to POST to |
| strip_prefix (bool, prefix-only) | Drop the prefix from the model name before forwarding |
| model_rewrite (string) | Replace the model name entirely |
| headers (dict) | Extra headers merged into the upstream request (overrides defaults — useful for HTTP-Referer, X-Title, etc.) |
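Putting the matchers and options together, route resolution can be sketched as a first-match-wins scan (the function name and demo upstream URLs are illustrative, not the proxy's actual code):

```python
import re

# Sketch of first-match-wins route resolution: exact model list, then
# prefix (with optional strip), then regex; fall through to the default.
def resolve_route(model: str, routes: list, default_upstream: str):
    for route in routes:
        matched = False
        if model in route.get("models", []):
            matched = True
        elif "prefix" in route and model.startswith(route["prefix"]):
            matched = True
            if route.get("strip_prefix"):
                model = model[len(route["prefix"]):]
        elif "regex" in route and re.search(route["regex"], model):
            matched = True
        if matched:
            model = route.get("model_rewrite", model)
            return route["upstream"], model, route.get("headers", {})
    return default_upstream, model, {}

routes = [
    {"models": ["kimi-k2.6"],
     "upstream": "https://zen.example/v1/chat/completions"},
    {"prefix": "openrouter/", "strip_prefix": True,
     "upstream": "https://openrouter.ai/api/v1/chat/completions"},
]
print(resolve_route("openrouter/google/gemini-2.0-flash-001", routes, "https://fallback"))
```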

Restart the proxy after editing this file: kill $(lsof -ti:18080); ~/.factory/bin/start_droid_proxy.sh

Step 4 — Get an API key onto the wire

The proxy forwards whatever Authorization: Bearer <key> Droid sends. Droid populates that header from the apiKey field of the matching customModels entry. So:

  • One API key per model entry in settings.json.
  • Different models can use different keys — perfect when you've routed them to different upstreams.

Run

The SessionStart hook starts the proxy automatically when Droid launches. Manual control:

# Start (idempotent — no-op if already running)
~/.factory/bin/start_droid_proxy.sh

# Health check
curl -s http://127.0.0.1:18080/health

# Inspect current route config (also dumps default_upstream + retry settings)
curl -s http://127.0.0.1:18080/routes | python3 -m json.tool

# Stop
kill $(lsof -ti:18080)

# Restart (e.g., after editing proxy_routes.json)
kill $(lsof -ti:18080); ~/.factory/bin/start_droid_proxy.sh
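The same health check can be done from Python inside your own scripts. A minimal sketch that treats any connection error as "not running" (the {"ok": true} shape is the one the Troubleshooting section describes):

```python
import json
import urllib.request

# Sketch: True only if the proxy answers /health with {"ok": true}.
def proxy_healthy(port: int = 18080) -> bool:
    url = f"http://127.0.0.1:{port}/health"
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return json.load(resp).get("ok") is True
    except (OSError, ValueError):  # connection refused, timeout, bad JSON
        return False
```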

Verify it works

  1. Start Droid, pick one of your custom models.
  2. Ask any question.
  3. You should see a streaming response (not a 400 or 404).
  4. tail -f ~/.factory/logs/droid_responses_proxy.log should show a POST /v1/responses line per request.

If the response is empty or fails, check the most recent body dump:

ls -t ~/.factory/logs/proxy-bodies/ | head -1 | xargs -I{} python3 -m json.tool ~/.factory/logs/proxy-bodies/{}

This shows exactly what Droid sent and exactly what the proxy forwarded — the single best debugging tool.


What's supported

  • POST /v1/responses (or any path ending in /responses)
  • Streaming (SSE) and non-streaming
  • Function tools (Chat Completions standard {type: function})
  • Custom tools (OpenAI's {type: custom} with Lark or regex grammar — e.g. Droid's ApplyPatch)
  • Function and custom tool result inputs (function_call_output, custom_tool_call_output)
  • System instructions
  • Image inputs (input_image → image_url)
  • Structured outputs (text.format → response_format)
  • Reasoning content roundtrip across turns (cached by call_id)
  • Per-model upstream routing
  • Retry on 408/425/429/500/502/503/504 + network exceptions
  • Sampling params: temperature, top_p, max_output_tokens → max_tokens, parallel_tool_calls, seed, user

What's NOT supported

  • OpenAI built-in tools: web_search, file_search, computer_use, code_interpreter, mcp. These require server-side tooling; they're silently dropped from the request.
  • ZDR-mode encrypted reasoning: encrypted_content blobs are opaque to the proxy. Plaintext summaries are fine.
  • Anthropic-native upstreams — this proxy translates only to Chat Completions. For Claude, configure Droid with provider: "anthropic" and a real Anthropic-compatible base URL — that path doesn't go through this proxy at all.
  • Per-token streaming of custom-tool input — emitted as a single delta at end of stream. The rest streams per-token.

Tested with

These upstream models have been verified end-to-end through the proxy with Droid — full streaming, tool calling, custom-tool (ApplyPatch) round-trips, and multi-turn agentic loops with reasoning preserved across turns:

| Model | Provider | Notes |
| --- | --- | --- |
| glm-5.1, glm-5 | OpenCode Zen Go | Reasoning + ApplyPatch + multi-turn confirmed |
| kimi-k2.5, kimi-k2.6 | OpenCode Zen Go | reasoning_content cache hits across turns |
| deepseek-v4-pro, deepseek-v4-flash | OpenCode Zen Go | Thinking-mode round-trip verified; KV cache reuse working |
| mimo-v2.5, mimo-v2.5-pro | OpenCode Zen Go | Same Chat Completions translation path; reasoning preserved |

Other OpenAI-compatible providers (OpenRouter, Together, Groq, Fireworks, Anyscale, DeepSeek direct, etc.) should work via the per-model routing config but have not been personally smoke-tested. Open an issue with your config + the relevant body dump from ~/.factory/logs/proxy-bodies/ if a specific provider misbehaves.

Models you access via Droid's native Anthropic adapter (e.g., minimax-m2.7, minimax-m2.5 against Zen Go's /v1/messages endpoint) bypass this proxy entirely — those use Droid's provider: "anthropic" code path directly and don't need a translation layer.


Logs and dumps

| Path | What's in it |
| --- | --- |
| ~/.factory/logs/droid_responses_proxy.log | Per-request summary lines (model, upstream, stream flag, msg count, tool count) |
| ~/.factory/logs/droid_responses_proxy.stdout.log | Process stdout/stderr — startup messages, watchdog, fatal errors |
| ~/.factory/logs/proxy-bodies/*.json | Last 40 requests, full incoming Responses body + outgoing Chat body side by side |

The body dumps are the most useful debugging artifact when something behaves unexpectedly — they show you ground truth for both directions of the translation.


Tunables (environment variables)

Set in start_droid_proxy.sh before launching, or as env vars when running manually.

| Variable | Default | Purpose |
| --- | --- | --- |
| PORT | 18080 | Local listen port |
| UPSTREAM | https://opencode.ai/zen/go/v1/chat/completions | Fallback when no route matches |
| PROXY_ROUTES_FILE | ~/.factory/bin/proxy_routes.json | Route config path |
| PROXY_IDLE_TIMEOUT_SECONDS | 7200 (2h) | Self-exit after this many seconds of inactivity; 0 disables |
| PROXY_RETRY_MAX_ATTEMPTS | 3 | Max attempts on transient failures (total, not retries) |
| PROXY_RETRY_BASE_BACKOFF | 0.5 | Base seconds for exponential backoff |
| PROXY_RETRY_TIMEOUT | 600 | Per-attempt upstream timeout (seconds) |
| PROXY_LOG | ~/.factory/logs/droid_responses_proxy.log | Per-request log file |
| PROXY_DUMP_DIR | ~/.factory/logs/proxy-bodies | Body capture dir |
| PROXY_DUMP_KEEP | 40 | How many body dumps to retain |
| PROXY_REASONING_CACHE_MAX | 4096 | Max entries in the call_id → reasoning_text LRU cache |
| PROXY_DEBUG | 0 | Set to 1 to log full outgoing Chat request bodies (truncated to 600 chars) |
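How the retry knobs interact can be sketched as follows (the real proxy's internals may differ; the retryable status set mirrors the list in "What's supported", and the function names are mine):

```python
import random
import time
import urllib.error

# Transient statuses worth retrying, per the feature list above
RETRYABLE = {408, 425, 429, 500, 502, 503, 504}

def backoff_delay(attempt: int, base: float = 0.5) -> float:
    """Exponential backoff: base * 2^attempt, plus a little jitter."""
    return base * (2 ** attempt) + random.uniform(0, 0.1)

def post_with_retry(do_post, max_attempts: int = 3, base: float = 0.5):
    """Call do_post up to max_attempts times total (not max_attempts retries)."""
    for attempt in range(max_attempts):
        try:
            return do_post()
        except urllib.error.HTTPError as e:
            # Non-retryable status, or out of attempts: propagate
            if e.code not in RETRYABLE or attempt == max_attempts - 1:
                raise
        time.sleep(backoff_delay(attempt, base))
```

With the defaults (3 attempts, base 0.5), a request that fails twice waits roughly 0.5s and then 1s before the final try.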

Lifecycle

| Event | Behavior |
| --- | --- |
| Launch Droid | SessionStart hook runs start_droid_proxy.sh; cold-starts the proxy in ~600ms, or no-ops in ~50ms if already running |
| Make a request | Increments in-flight counter; refreshes idle timer; both reset on response completion |
| Long stream (model thinking 10 min) | In-flight count stays >0; watchdog can't reap |
| Two parallel Droid sessions | Share the same proxy; either's traffic keeps it alive; closing one doesn't affect the other |
| Close all Droid sessions | Proxy keeps running |
| 2 hours with no traffic | Watchdog calls os._exit(0); next session cold-starts a fresh one |
| kill -9 droid mid-request | Proxy unaffected; recovers normally |
| Reboot | Proxy gone; next Droid launch cold-starts it |

Troubleshooting

BYOK Error: 404 <!DOCTYPE html>...

Either the proxy isn't running, or your baseUrl is wrong:

  • curl http://127.0.0.1:18080/health should return JSON with "ok": true.
  • baseUrl in settings.json must be exactly http://127.0.0.1:18080 — no trailing slash, no /v1.

Error from provider (DeepSeek): The reasoning_content in the thinking mode must be passed back to the API

Right after a proxy restart, the reasoning cache is empty — DeepSeek wants the prior turn's reasoning echoed back, and the proxy has nothing to attach. One turn is degraded (the proxy fills in a (prior reasoning not retained) placeholder so DeepSeek accepts the request). From the next turn onward, the cache will be populated and reasoning gets reattached automatically.

If this keeps happening every turn, something is restarting the proxy:

tail -f ~/.factory/logs/droid_responses_proxy.stdout.log

Model says "I don't have an Edit/Write tool available"

A type: custom tool (typically ApplyPatch) was dropped or mistranslated. Check the latest dump:

ls -t ~/.factory/logs/proxy-bodies/ | head -1 | xargs -I{} python3 -m json.tool ~/.factory/logs/proxy-bodies/{} | head -100

The outgoing_chat.tools list should include an ApplyPatch entry with a single input string parameter. If it's missing, the proxy build is older than the custom-tool support — replace droid_responses_proxy.py with the current version.

Upstream returns 401/403

Your apiKey in settings.json is invalid or doesn't have access to the model. Verify directly:

curl -s -H "Authorization: Bearer <key>" https://YOUR-UPSTREAM/v1/models | python3 -m json.tool

Upstream returns 429

You're being rate-limited. The proxy retries automatically (3 attempts, exponential backoff), but if all attempts fail, the 429 propagates. Options:

  • Slow down agentic loops (reduce parallel_tool_calls).
  • Increase PROXY_RETRY_BASE_BACKOFF and PROXY_RETRY_MAX_ATTEMPTS.
  • Upgrade your tier with the provider.

Empty / cut-off responses

The model probably hit max_tokens while still in reasoning. Increase max_output_tokens in Droid's request settings, or pick a model that uses fewer reasoning tokens for short answers.

The proxy keeps dying

Check tail -100 ~/.factory/logs/droid_responses_proxy.stdout.log for tracebacks or idle ...s — exiting lines. The 2h idle timeout is intentional; if you want it to live longer, set PROXY_IDLE_TIMEOUT_SECONDS=0 or a larger value in start_droid_proxy.sh.


Architecture

                  ~/.factory/settings.json
                          │  customModels[].baseUrl = http://127.0.0.1:18080
                          ▼
                       Droid CLI
                          │
                          │  POST /v1/responses   (OpenAI Responses API)
                          │  stream: true
                          │  tools: [{type:function, ...}, {type:custom, format:{grammar...}}]
                          ▼
            ┌─────────────────────────────────┐
            │   droid_responses_proxy.py      │
            │                                 │
            │  1. Parse Responses body        │
            │  2. Resolve route by model name │ ◄── proxy_routes.json
            │  3. Translate items → messages  │
            │  4. Translate tools internally  │
            │     → externally tagged         │
            │  5. POST upstream (with retry)  │
            │  6. Translate response stream   │
            │     back to Responses SSE       │
            └─────────────────────────────────┘
                          │
                          │  POST /v1/chat/completions
                          │  stream: true
                          │  tools: [{type:function, function:{name, parameters}}]
                          ▼
              Upstream provider
              (Zen Go / OpenRouter / Together / Groq / etc.)

Translation layers

  1. Items → Messages — Responses input is an array of typed items (message, function_call, function_call_output, custom_tool_call, custom_tool_call_output, reasoning). Chat Completions wants messages with tool_calls glued onto assistant messages and role: tool messages for results. The proxy coalesces accordingly.

  2. Tool definitions — Responses tools are internally tagged ({type:function, name, parameters} or {type:custom, name, format}). Chat tools are externally tagged ({type:function, function:{name, parameters}}). Custom tools become a function with a single input: string parameter and the grammar embedded in the description.

  3. Reasoning preservation — Thinking-mode models (DeepSeek, GLM, Kimi, MiMo) require the prior turn's reasoning_content to be passed back on assistant messages. Droid doesn't roundtrip Responses-format reasoning items in subsequent turns. The proxy caches reasoning_text keyed by call_id whenever it emits a reasoning + tool-call response, and reattaches it as reasoning_content when those call_ids reappear in input.

  4. Streaming events — Chat streams delta.content and delta.tool_calls[i].function.arguments chunks. Responses streams a named event sequence (response.created, response.in_progress, response.output_item.added, response.output_text.delta, response.function_call_arguments.delta, response.custom_tool_call_input.delta, response.output_item.done, response.completed, etc.). A per-request StreamBridge state machine handles the conversion.
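Layer 2 is the most mechanical of the four. A minimal sketch of that tool-definition translation (the grammar-embedding approach follows the description above; the exact wording of the generated description is my own, not the proxy's):

```python
# Sketch: translate a Responses tool definition (internally tagged) into a
# Chat Completions tool definition (externally tagged). Custom tools become
# a function with a single "input" string parameter, with the grammar
# folded into the description.
def responses_tool_to_chat(tool: dict) -> dict:
    if tool["type"] == "function":
        return {"type": "function",
                "function": {"name": tool["name"],
                             "description": tool.get("description", ""),
                             "parameters": tool.get("parameters", {})}}
    if tool["type"] == "custom":
        desc = tool.get("description", "")
        fmt = tool.get("format", {})
        if fmt:
            desc += f"\nInput grammar ({fmt.get('type', 'text')}): {fmt.get('definition', '')}"
        return {"type": "function",
                "function": {"name": tool["name"],
                             "description": desc,
                             "parameters": {
                                 "type": "object",
                                 "properties": {"input": {"type": "string"}},
                                 "required": ["input"]}}}
    raise ValueError(f"unsupported tool type: {tool['type']}")
```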


Adding a new provider

Worked example — adding OpenRouter:

  1. Get an OpenRouter API key from https://openrouter.ai.
  2. Add a route in ~/.factory/bin/proxy_routes.json:
    {
      "prefix": "openrouter/",
      "strip_prefix": true,
      "upstream": "https://openrouter.ai/api/v1/chat/completions",
      "headers": {
        "HTTP-Referer": "https://factory.ai",
        "X-Title": "Droid CLI"
      }
    }
  3. Add a model entry in ~/.factory/settings.json:
    {
      "model": "openrouter/google/gemini-2.0-flash-001",
      "displayName": "Gemini 2.0 Flash (OpenRouter)",
      "baseUrl": "http://127.0.0.1:18080",
      "apiKey": "<openrouter key>",
      "provider": "openai"
    }
  4. Restart the proxy:
    kill $(lsof -ti:18080); ~/.factory/bin/start_droid_proxy.sh
  5. Restart Droid, pick "Gemini 2.0 Flash (OpenRouter)" from the model picker, chat.

The proxy strips openrouter/ from the model name, so OpenRouter sees google/gemini-2.0-flash-001 (its actual catalog id) with the OpenRouter-required headers attached.


File layout

Repo (what you git clone):

factory-droid-bridge/
├── README.md                                   # this file
├── LICENSE                                     # MIT
├── droid_responses_proxy.py                    # the proxy (stdlib Python)
├── start_droid_proxy.sh                        # idempotent launcher (bash)
├── install.sh                                  # one-shot installer
├── proxy_routes.example.json                   # commented multi-provider example
└── .gitignore

After installing, on your machine:

~/.factory/
├── settings.json                               # hooks.SessionStart + customModels[]
├── bin/
│   ├── droid_responses_proxy.py                # copied by install.sh
│   ├── start_droid_proxy.sh                    # copied by install.sh
│   ├── proxy_routes.example.json               # copied by install.sh
│   └── proxy_routes.json                       # auto-created on first proxy start
└── logs/
    ├── droid_responses_proxy.log               # per-request summary
    ├── droid_responses_proxy.stdout.log        # process stdout/stderr
    └── proxy-bodies/                           # last 40 request/response dumps
        └── 20XXXX-XXXXXX-XXXX.json

Implementation notes

  • Single-file Python, stdlib only. No pip install, no virtualenv, no Node, no compiled dependencies. If you have a working Python 3.10+, the proxy runs.
  • Threading server — one OS thread per request. Adequate for one person's CLI usage; not a production gateway.
  • Retry only covers the initial urlopen call. Once the proxy starts writing SSE bytes back to Droid, it's committed — retrying would duplicate events and corrupt the Stainless SDK's state machine on the client.
  • The reasoning cache is in-memory (per process). On restart it's empty; the next multi-turn message in a thinking-mode model conversation fires the (prior reasoning not retained) placeholder once, then steady state resumes.
  • The idle watchdog uses time.monotonic() and checks every 60–120s, so it survives system sleep/wake without spurious reaps.
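The in-memory reasoning cache can be sketched as a small LRU (the class name is illustrative; the placeholder string matches the one described in Troubleshooting):

```python
from collections import OrderedDict

# Sketch: call_id -> reasoning_text LRU cache. On a miss (e.g. right after
# a proxy restart), a placeholder is returned so thinking-mode upstreams
# still accept the request.
class ReasoningCache:
    PLACEHOLDER = "(prior reasoning not retained)"

    def __init__(self, max_entries: int = 4096):
        self.max_entries = max_entries
        self._data: OrderedDict[str, str] = OrderedDict()

    def put(self, call_id: str, reasoning_text: str) -> None:
        self._data[call_id] = reasoning_text
        self._data.move_to_end(call_id)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used

    def get(self, call_id: str) -> str:
        if call_id in self._data:
            self._data.move_to_end(call_id)
            return self._data[call_id]
        return self.PLACEHOLDER
```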

Disclaimer

Community-built. Not an official Factory product, not endorsed by Factory. It uses Factory's documented BYOK and hook configuration surface; the translation between OpenAI Responses API and Chat Completions API is implemented against the public specs of both. Use at your own risk; review the code before running it.

License

MIT — see LICENSE.
