Stream upstream responses through the proxy (fixes #9 — long-request idle-timeout aborts)#10

Merged
mr-beaver merged 8 commits into
mr-beaver:main from
MattMencel:fix/proxy-streaming
Jun 24, 2026

@MattMencel

Contributor

Fixes #9. Thanks for confirming the diagnosis, @mr-beaver.

What this does

Makes proxy_anthropic and proxy_openai_compat stream the upstream response to the client incrementally instead of buffering the whole body, so long requests no longer trip Claude Code's idle/per-read timeout. The full body is still captured for usage accounting, optimizer savings, cache-state tracking, and dedup.

Approach

  • A shared, provider-agnostic helper stream_upstream(...) opens the upstream with client.send(stream=True) and returns a StreamingResponse that tees each chunk to the client while accumulating the full body. After the stream ends — cleanly, on client disconnect, on upstream mid-stream error, or on connect-time failure — it calls a handler-specific finalize(...) closure exactly once for accounting + dedup.
  • Request side is untouched (docs/adr/0001 boundary preserved); parsers _parse_anthropic/_parse_openai already work on the full buffer, so accounting is identical to the old buffered parse.
  • Per-read timeout=120 is kept, with no total-duration cap (a code comment warns against adding one back during a later cleanup).
  • Dedup cache now round-trips the response content-type (cached SSE replays as text/event-stream, not hardcoded JSON).
  • Partial/aborted streams record stop_reason="incomplete" (no schema change) instead of masquerading as clean 200s; connect failures record 502 + a [stream] log line with the cause.
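
The tee-and-finalize pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: the real helper wraps client.send(stream=True) and returns a StreamingResponse, whereas here `tee_stream`, `fake_upstream`, and the synchronous `finalize(body, outcome)` signature are hypothetical stand-ins. The only thing it aims to show is how a `try/finally` inside an async generator makes the finalize call happen exactly once on clean completion, client disconnect, or upstream error.

```python
import asyncio
from typing import AsyncIterator, Callable, List, Optional, Tuple


async def tee_stream(
    upstream: AsyncIterator[bytes],
    finalize: Callable[[bytes, str], None],
) -> AsyncIterator[bytes]:
    """Forward each chunk as it arrives while accumulating the full body."""
    buf = bytearray()
    outcome = "incomplete"  # pessimistic default, flipped only on a clean end
    try:
        async for chunk in upstream:
            buf += chunk
            yield chunk  # the client sees this chunk before the next one is read
        outcome = "complete"
    finally:
        # Runs once per stream: after a clean end, after aclose() (client
        # disconnect), or while an upstream exception propagates.
        finalize(bytes(buf), outcome)


async def fake_upstream(
    chunks: List[bytes], fail_after: Optional[int] = None
) -> AsyncIterator[bytes]:
    """Hypothetical upstream that can fail partway through."""
    for i, chunk in enumerate(chunks):
        if fail_after is not None and i == fail_after:
            raise ConnectionError("upstream dropped")
        yield chunk


async def demo() -> Tuple[List[bytes], List[Tuple[bytes, str]]]:
    records: List[Tuple[bytes, str]] = []

    def finalize(body: bytes, outcome: str) -> None:
        records.append((body, outcome))

    # 1. Clean completion.
    got = [c async for c in tee_stream(fake_upstream([b"a", b"b"]), finalize)]

    # 2. Client disconnect: read one chunk, then close the stream early.
    stream = tee_stream(fake_upstream([b"a", b"b"]), finalize)
    await stream.__anext__()
    await stream.aclose()

    # 3. Upstream error after the first chunk.
    try:
        async for _ in tee_stream(fake_upstream([b"a", b"b"], fail_after=1), finalize):
            pass
    except ConnectionError:
        pass

    return got, records


got, records = asyncio.run(demo())
```

In this sketch the partial cases carry the "incomplete" outcome alongside whatever bytes were captured, which mirrors the stop_reason="incomplete" behaviour described in the last bullet. Connect-time failures are not modelled here.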

Verification

  • 345 tests pass (13 new: incremental delivery via body_iterator, SSE usage accounting, dedup content-type round-trip, client-disconnect partials, connect failure, JSON-through-stream, OpenAI-compat streaming).
  • Live for ~24h on a fork deployment: the 300–600s request class that previously aborted now completes cleanly — e.g. a 607s request finishing 200 / tool_use / ~37k output tokens. Zero streaming-path failures in the logs over ~3,000 requests/day.

One open question for you

The implementation uses uniform streaming — one code path for SSE and JSON, no content-type branching. Consequence: non-streaming responses (count_tokens, model lists, stream:false, errors) also move from a buffered Content-Length response to chunked. The alternative is to branch on the response head and keep non-streaming responses buffered (two code paths). I went uniform for simplicity and to avoid dual-path drift; both are reversible. Happy to switch to the branch variant if you'd prefer — your call.
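
For comparison, the branch variant would hinge on a small check against the response head. The sketch below is an assumption about how that decision could look, not code from this PR: `should_stream` is a hypothetical name, and treating only text/event-stream as streamable is an illustrative choice.

```python
from typing import Optional


def should_stream(content_type: Optional[str]) -> bool:
    """Return True when the upstream response looks like SSE.

    Hypothetical helper: parameters such as charset are ignored, and a
    missing header falls back to the buffered path.
    """
    if not content_type:
        return False
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "text/event-stream"
```

A handler using this would stream when the check is true and otherwise read the full body and reply with a Content-Length response, which is the second code path the uniform approach avoids.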

🤖 Generated with Claude Code

MattMencel and others added 8 commits June 22, 2026 09:01
Splits streaming mechanics (shared stream_upstream helper) from
handler-specific accounting. Covers proxy_anthropic + proxy_openai_compat;
parse-after-buffer, best-effort partial record on disconnect, dedup cache
gated on clean completion. Pre-existing bug in mr-beaver/main.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extend dedup_check to return a 3-tuple (cached_response, content_type,
req_hash) and dedup_cache_response to accept an optional content_type
parameter (defaulting to "application/json"). This allows streaming
responses stored as text/event-stream to be replayed with the correct
content-type header, preventing malformed responses to streaming clients.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lure handling)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…alize

Replace the buffered httpx.AsyncClient block in proxy_openai_compat with a
finalize closure + stream_upstream call, matching the Anthropic handler pattern.
Disconnect mid-stream now records stop_reason="incomplete" instead of losing data.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Connect-time failure handling, true incremental-delivery test, dedup
content-type round-trip, uniform-vs-branch deferred to maintainer,
partial-stream sentinel, and a falsifiable empirical-verification step.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Connect failures, client disconnects, upstream mid-stream errors, and
finalize/accounting errors now emit a [stream] line to proxy.log with the
cause — previously connect causes were returned only to the client and
finalize exceptions were swallowed silently. Turns 'something is wrong'
into 'here is what and why'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mr-beaver mr-beaver merged commit 2d08bd8 into mr-beaver:main Jun 24, 2026
1 check passed
mr-beaver pushed a commit that referenced this pull request Jun 24, 2026
Resolves merge conflict between PR #10 (streaming) and PR #11 (async writes).
Both changes included; VERSION bumped to 1.1.7.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
mr-beaver pushed a commit that referenced this pull request Jun 24, 2026
PR #10 streaming tests read tracker.db immediately after the request.
PR #11 made writes async (queue + background thread), so rows aren't
visible until _process_pending_writes() is called. Add flush before
each sqlite3.connect(tmp_db) in affected tests.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@MattMencel MattMencel deleted the fix/proxy-streaming branch June 24, 2026 12:55

Development

Successfully merging this pull request may close these issues.

Proxy buffers entire upstream response → streaming clients hit idle timeout and abort on long requests
