Stream upstream responses through the proxy (fixes #9 — long-request idle-timeout aborts) by MattMencel · Pull Request #10 · mr-beaver/tokencost

MattMencel · 2026-06-23T16:46:18Z

Fixes #9. Thanks for confirming the diagnosis, @mr-beaver.

What this does

Makes proxy_anthropic and proxy_openai_compat stream the upstream response to the client incrementally instead of buffering the whole body, so long requests no longer trip Claude Code's idle/per-read timeout. The full body is still captured for usage accounting, optimizer savings, cache-state tracking, and dedup.

Approach

A shared, provider-agnostic helper stream_upstream(...) opens the upstream with client.send(stream=True) and returns a StreamingResponse that tees each chunk to the client while accumulating the full body. After the stream ends — cleanly, on client disconnect, on upstream mid-stream error, or on connect-time failure — it calls a handler-specific finalize(...) closure exactly once for accounting + dedup.
Request side is untouched (docs/adr/0001 boundary preserved); parsers _parse_anthropic/_parse_openai already work on the full buffer, so accounting is identical to the old buffered parse.
The per-read timeout=120 is kept and there is no total-duration cap; a code comment warns against later "tidying" the timeout into a total cap.
Dedup cache now round-trips the response content-type (cached SSE replays as text/event-stream, not hardcoded JSON).
Partial/aborted streams record stop_reason="incomplete" (no schema change) instead of masquerading as clean 200s; connect failures record 502 + a [stream] log line with the cause.
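
The tee-and-finalize idea above can be sketched roughly as follows. This is a minimal illustration, not the PR's stream_upstream: the name tee_stream, the finalize(body, completed) signature, and the fake upstream are assumptions, and connect-time failure handling (which happens before any chunk is read) is left out. It only shows how a try/finally around an async generator gives one finalize call on both clean completion and early close.

```python
import asyncio
from typing import AsyncIterator, Callable


async def tee_stream(
    upstream: AsyncIterator[bytes],
    finalize: Callable[[bytes, bool], None],
) -> AsyncIterator[bytes]:
    """Yield each upstream chunk to the caller while keeping a full copy.

    Hypothetical helper: finalize(body, completed) runs once when the
    generator finishes, is closed early, or the upstream raises.
    """
    buf = bytearray()
    completed = False
    try:
        async for chunk in upstream:
            buf.extend(chunk)   # accumulate for accounting
            yield chunk         # forward to the client immediately
        completed = True
    finally:
        # Runs on clean end, on aclose() (client went away), and on errors.
        finalize(bytes(buf), completed)


if __name__ == "__main__":
    async def fake_upstream() -> AsyncIterator[bytes]:
        yield b"data: a\n\n"
        yield b"data: b\n\n"

    def finalize(body: bytes, completed: bool) -> None:
        # Mirrors the PR's convention: partial streams are marked "incomplete".
        stop_reason = None if completed else "incomplete"
        print(len(body), stop_reason)

    async def main() -> None:
        async for _ in tee_stream(fake_upstream(), finalize):
            pass

    asyncio.run(main())
```

One caveat of this sketch: if the generator is closed before it is ever iterated, the finally block never runs, so a real helper would need a separate path for that case.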

Verification

345 tests pass (13 new: incremental delivery via body_iterator, SSE usage accounting, dedup content-type round-trip, client-disconnect partials, connect failure, JSON-through-stream, OpenAI-compat streaming).
Live for ~24h on a fork deployment: the 300–600s request class that previously aborted now completes cleanly — e.g. a 607s request finishing 200 / tool_use / ~37k output tokens. Zero streaming-path failures in the logs over ~3,000 requests/day.

One open question for you

The implementation uses uniform streaming — one code path for SSE and JSON, no content-type branching. Consequence: non-streaming responses (count_tokens, model lists, stream:false, errors) also move from a buffered Content-Length response to chunked. The alternative is to branch on the response head and keep non-streaming responses buffered (two code paths). I went uniform for simplicity and to avoid dual-path drift; both are reversible. Happy to switch to the branch variant if you'd prefer — your call.
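
For comparison, the branch variant would need some predicate on the response head. A minimal sketch, assuming the decision is made purely on the Content-Type header (the function name and that criterion are assumptions, not something the PR specifies):

```python
from typing import Optional


def is_event_stream(content_type: Optional[str]) -> bool:
    """Return True when the response head announces an SSE body.

    Hypothetical predicate for the two-path variant: SSE responses would be
    streamed, everything else would stay buffered with a Content-Length.
    """
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"
```

The uniform approach avoids needing this check at all, which is the "dual-path drift" trade-off described above.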

🤖 Generated with Claude Code

Splits streaming mechanics (shared stream_upstream helper) from handler-specific accounting. Covers proxy_anthropic + proxy_openai_compat; parse-after-buffer, best-effort partial record on disconnect, dedup cache gated on clean completion. Pre-existing bug in mr-beaver/main. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Extend dedup_check to return a 3-tuple (cached_response, content_type, req_hash) and dedup_cache_response to accept an optional content_type parameter (defaulting to "application/json"). This allows streaming responses stored as text/event-stream to be replayed with the correct content-type header, preventing malformed responses to streaming clients. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
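
A rough in-memory sketch of that round trip, for illustration only. The 3-tuple shape and the "application/json" default come from the commit message; the dict-backed store, the SHA-256 request hash, and the argument lists are assumptions.

```python
import hashlib
from typing import Dict, Optional, Tuple

# Hypothetical in-memory store: req_hash -> (body, content_type).
_cache: Dict[str, Tuple[bytes, str]] = {}


def dedup_check(request_body: bytes) -> Tuple[Optional[bytes], Optional[str], str]:
    """Return (cached_response, content_type, req_hash); the first two are None on a miss."""
    req_hash = hashlib.sha256(request_body).hexdigest()
    hit = _cache.get(req_hash)
    if hit is None:
        return None, None, req_hash
    body, content_type = hit
    return body, content_type, req_hash


def dedup_cache_response(
    req_hash: str,
    body: bytes,
    content_type: str = "application/json",
) -> None:
    """Store the response together with the content-type it was served with."""
    _cache[req_hash] = (body, content_type)
```

With this shape, a cached SSE body is replayed as text/event-stream, while callers that never pass content_type keep the old JSON behaviour.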

…alize Replace the buffered httpx.AsyncClient block in proxy_openai_compat with a finalize closure + stream_upstream call, matching the Anthropic handler pattern. Disconnect mid-stream now records stop_reason="incomplete" instead of losing data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Connect-time failure handling, true incremental-delivery test, dedup content-type round-trip, uniform-vs-branch deferred to maintainer, partial-stream sentinel, and a falsifiable empirical-verification step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Connect failures, client disconnects, upstream mid-stream errors, and finalize/accounting errors now emit a [stream] line to proxy.log with the cause — previously connect causes were returned only to the client and finalize exceptions were swallowed silently. Turns 'something is wrong' into 'here is what and why'. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Resolves merge conflict between PR #10 (streaming) and PR #11 (async writes). Both changes included; VERSION bumped to 1.1.7. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

PR #10 streaming tests read tracker.db immediately after the request. PR #11 made writes async (queue + background thread), so rows aren't visible until _process_pending_writes() is called. Add flush before each sqlite3.connect(tmp_db) in affected tests. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

MattMencel and others added 8 commits June 22, 2026 09:01

feat(proxy): add stream_upstream helper (tee + finalize + connect-fai…

5c630bf

…lure handling) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

feat(proxy): stream Anthropic responses via stream_upstream + finalize

b864f7e

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chore: bump version to 1.1.6 — proxy response streaming

54ecaa4

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

MattMencel mentioned this pull request Jun 23, 2026

fix: stop tracker.db write contention from blocking/dropping usage rows #11

Merged

4 tasks

mr-beaver merged commit 2d08bd8 into mr-beaver:main Jun 24, 2026
1 check passed

MattMencel deleted the fix/proxy-streaming branch June 24, 2026 12:55

mr-beaver merged 8 commits into mr-beaver:main from MattMencel:fix/proxy-streaming
