Stream upstream responses through the proxy (fixes #9: long-request idle-timeout aborts) #10
Merged
Splits streaming mechanics (shared stream_upstream helper) from handler-specific accounting. Covers proxy_anthropic + proxy_openai_compat; parse-after-buffer, best-effort partial record on disconnect, dedup cache gated on clean completion. Pre-existing bug in mr-beaver/main. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extend dedup_check to return a 3-tuple (cached_response, content_type, req_hash) and dedup_cache_response to accept an optional content_type parameter (defaulting to "application/json"). This allows streaming responses stored as text/event-stream to be replayed with the correct content-type header, preventing malformed responses to streaming clients. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…lure handling) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…alize Replace the buffered httpx.AsyncClient block in proxy_openai_compat with a finalize closure + stream_upstream call, matching the Anthropic handler pattern. Disconnect mid-stream now records stop_reason="incomplete" instead of losing data. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Connect-time failure handling, true incremental-delivery test, dedup content-type round-trip, uniform-vs-branch deferred to maintainer, partial-stream sentinel, and a falsifiable empirical-verification step. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Connect failures, client disconnects, upstream mid-stream errors, and finalize/accounting errors now emit a [stream] line to proxy.log with the cause — previously connect causes were returned only to the client and finalize exceptions were swallowed silently. Turns 'something is wrong' into 'here is what and why'. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
mr-beaver pushed a commit that referenced this pull request Jun 24, 2026
PR #10 streaming tests read tracker.db immediately after the request. PR #11 made writes async (queue + background thread), so rows aren't visible until _process_pending_writes() is called. Add flush before each sqlite3.connect(tmp_db) in affected tests. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Fixes #9. Thanks for confirming the diagnosis, @mr-beaver.
What this does
Makes proxy_anthropic and proxy_openai_compat stream the upstream response to the client incrementally instead of buffering the whole body, so long requests no longer trip Claude Code's idle/per-read timeout. The full body is still captured for usage accounting, optimizer savings, cache-state tracking, and dedup.
Approach
- stream_upstream(...) opens the upstream with client.send(stream=True) and returns a StreamingResponse that tees each chunk to the client while accumulating the full body. After the stream ends (cleanly, on client disconnect, on upstream mid-stream error, or on connect-time failure), it calls a handler-specific finalize(...) closure exactly once for accounting + dedup.
- … docs/adr/0001 boundary preserved); parsers _parse_anthropic / _parse_openai already work on the full buffer, so accounting is identical to the old buffered parse.
- … timeout=120 kept; no total-duration cap (a code comment guards against re-tidying it).
- … text/event-stream, not hardcoded JSON).
- … stop_reason="incomplete" (no schema change) instead of masquerading as clean 200s; connect failures record 502 + a [stream] log line with the cause.
Verification
- … body_iterator, SSE usage accounting, dedup content-type round-trip, client-disconnect partials, connect failure, JSON-through-stream, OpenAI-compat streaming).
- … 200 / tool_use / ~37k output tokens. Zero streaming-path failures in the logs over ~3,000 requests/day.
One open question for you
The implementation uses uniform streaming: one code path for SSE and JSON, no content-type branching. Consequence: non-streaming responses (count_tokens, model lists, stream:false, errors) also move from a buffered Content-Length response to chunked. The alternative is to branch on the response head and keep non-streaming responses buffered (two code paths). I went uniform for simplicity and to avoid dual-path drift; both are reversible. Happy to switch to the branch variant if you'd prefer; your call.

🤖 Generated with Claude Code
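For readers skimming the description, the tee-and-finalize flow under Approach can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: it leaves out httpx and StreamingResponse entirely, and `tee_stream`, the `(body, outcome)` callback signature, and the outcome strings are hypothetical names chosen for the example.

```python
import asyncio
from typing import AsyncIterator, Callable, List


async def tee_stream(
    upstream: AsyncIterator[bytes],
    finalize: Callable[[bytes, str], None],
) -> AsyncIterator[bytes]:
    """Forward each upstream chunk to the consumer while keeping a copy.

    finalize(body, outcome) runs exactly once, whether the stream ends
    cleanly, the consumer disconnects, or the upstream fails mid-stream.
    """
    chunks: List[bytes] = []
    outcome = "complete"
    try:
        async for chunk in upstream:
            chunks.append(chunk)  # accumulate for accounting
            yield chunk           # deliver to the client right away
    except (GeneratorExit, asyncio.CancelledError):
        # Consumer went away (client disconnect): keep the partial body.
        outcome = "incomplete"
        raise
    except Exception:
        # Upstream error mid-stream: end the stream, keep the partial body.
        outcome = "incomplete"
    finally:
        finalize(b"".join(chunks), outcome)
```

A handler-side `finalize` would then parse the full buffer for usage and, following the gating described in the commits, store the body in the dedup cache only when the outcome is a clean completion.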