fix(rtmg/web): bulk sends freeze all reads - chunked sends + write_audio recv hardening #245
Merged
leszko merged 2 commits on Jun 11, 2026
…io recv hardening

Two server-side halves of the rt-input deadlock (the client halves are rtmg-vst#33), both reproduced and validated live against the :dev pod:

1. websockets-sync holds protocol_mutex across socket.sendall, and recv_events, the thread that reads EVERY inbound frame, needs the same mutex. A single 11 MB stem send therefore froze all reads until the peer drained it; against a VST mid write_audio upload (its own reads gated behind its sends, see rtmg-vst#33) the two sides wedged permanently: params dead, splices never received, keepalive killed the session (1011 via the tunnel). Thread dump of the live wedge: conn_handler in sendall holding protocol_mutex; recv_events and keepalive blocked acquiring it. Big payloads (stems, the post-swap source mirror, slice frames) now go out as fragmented messages in ~256 KiB pieces; the mutex releases between fragments, so reads interleave and the cycle cannot form. Fragmentation is invisible at the message layer; payload bytes are identical.

2. write_audio's binary payload was read with a bare blocking recv and no type check: an orphan header consumed the NEXT JSON command as its payload (audio_write_failed: "a bytes-like object is required, not 'str'"), and a payload that never arrived blocked the recv loop forever, wedging the whole session. The read now has a 10 s timeout and a bytes type check; both failure modes answer audio_write_failed and keep the session alive.

The chunked-send half also applies to main (stem delivery freezes reads there too, e.g. against a swap upload in flight) and is worth cherry-picking independently of rt-input.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…d reads

Review fixes for #245:

- The post-swap source mirror in _serialize_swap_ready was still a plain ws.send of a full-length f16 buffer (tens of MB): the largest single payload on the wire and exactly the read-freezing sendall this PR exists to eliminate. It now goes through chunked_ws_send.
- The 10 s timeout + binary type check added for write_audio now covers set_timbre_source, set_structure_source, and the client-upload arm of swap_source via a shared _recv_binary_payload helper: same orphan-header wedge class, same graceful *_failed answer. The not-binary log includes a preview of the consumed frame so a dropped JSON command is traceable.
- The control-bus recv thunk accepts (and ignores) the timeout kwarg, so the TypeError fallback around recv_audio(timeout=10) is gone. It fired on every MCP-injected write_audio in production and could mask a genuine TypeError from inside ws.recv.
- chunked_ws_send: rename the _chunk param to chunk_size and annotate.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
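The timeout-plus-type-check read described in the review fixes can be sketched roughly as below. This is an illustration under assumptions, not the PR's `_recv_binary_payload`: the name `recv_binary_payload`, the `(payload, error)` return shape, and the error strings are all hypothetical. It assumes `recv` behaves like the websockets sync `recv(timeout=...)`, which raises `TimeoutError` when the timeout elapses and returns `str` for text frames and `bytes` for binary frames.

```python
from typing import Callable, Optional, Tuple

PAYLOAD_TIMEOUT_S = 10.0  # the 10 s bound quoted in the PR


def recv_binary_payload(
    recv: Callable[..., object],
    timeout: float = PAYLOAD_TIMEOUT_S,
) -> Tuple[Optional[bytes], Optional[str]]:
    """Read one frame and return (payload, None) or (None, error_message).

    Hypothetical helper: the caller would turn a non-None error into the
    matching *_failed reply instead of letting the recv loop block or crash.
    """
    try:
        frame = recv(timeout=timeout)
    except TimeoutError:
        # The payload never arrived: report it rather than block forever.
        return None, f"payload not received within {timeout:g} s"

    if not isinstance(frame, (bytes, bytearray)):
        # An orphan header swallowed a text frame (likely the next JSON
        # command). Include a short preview so the dropped command is traceable.
        preview = repr(frame)[:80]
        return None, f"expected binary payload, got {type(frame).__name__}: {preview}"

    return bytes(frame), None
```

Returning an error string instead of raising keeps the decision about which `*_failed` message to send with the command handler, which is one way to share the helper across several upload commands.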
leszko
approved these changes
Jun 11, 2026
Stacked on the rt-input branch (#235). Server half of the deadlock that made live sequencer splices dead end-to-end; the client half is rtmg-vst#33. Both were running as hot-patches on the :dev pod (40431735) during today's debugging with @gioelecerati and validated live (paint → splice → `write_audio_applied` → ack → audible, ~1.5 s + emergence); this PR makes them durable before the next bake wipes them.

1. Chunked (fragmented) bulk sends

websockets-sync holds `protocol_mutex` across `socket.sendall`, and `recv_events`, the thread that reads every inbound frame, needs that same mutex. One 11 MB stem send froze all reads until the peer drained it; against a VST mid `write_audio` upload (whose own reads were gated behind its sends, an ixwebsocket bug fixed in rtmg-vst#33) the two sides deadlocked permanently: params dead, splices never received, keepalive killed the session (`1011` via the CF tunnel).

Thread dump of the live wedge:

- `conn_handler`: `send_stem_payload` → `socket.sendall` holding `protocol_mutex`
- `recv_events`: blocked acquiring `protocol_mutex`
- `keepalive`: blocked acquiring `protocol_mutex`

Fix: stems, the post-swap source mirror, and slice frames go out as fragmented messages in ~256 KiB pieces (`chunked_ws_send` in `audio_codec.py`). The mutex releases between fragments, so reads always interleave. Fragmentation is invisible at the message layer; payload bytes are identical (verified with the web SDK and the VST's ixwebsocket).

Note: this half also applies to `main`, where stem delivery freezes reads too (e.g. against a swap upload in flight), and is worth cherry-picking independently of rt-input.

2. `write_audio` payload read hardening

The binary payload was read with a bare blocking `recv()` and no type check:

- an orphan `write_audio` header consumed the next JSON command as its payload (probe-reproduced: `audio_write_failed: a bytes-like object is required, not 'str'`),
- a payload that never arrived blocked the recv loop forever, wedging the whole session.

The read now has a 10 s timeout and a bytes type check; both failure modes answer `audio_write_failed` and keep the session alive.

Remaining latency levers (not in this PR)

Working end-to-end latency is ~1.5 s transport + 2–5 s emergence. The bar ships as f32 (1.49 MB); f16/s16/zstd would cut upload 2–10×, and #240's near-playhead repatch attacks the emergence delay.
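For reference, the chunked send from section 1 can be sketched as below. This is a minimal illustration, not the code in `audio_codec.py`: `iter_chunks`, `RecordingSocket`, and the small-payload shortcut are assumptions made for the example. It relies on the websockets library's documented behaviour that passing an iterable of bytes to `send()` emits one fragmented message; that the sync implementation releases `protocol_mutex` between fragments is the PR's observation, not something this sketch demonstrates.

```python
from typing import Iterator

CHUNK_SIZE = 256 * 1024  # ~256 KiB fragments, the size quoted in this PR


def iter_chunks(payload: bytes, chunk_size: int) -> Iterator[bytes]:
    """Yield consecutive slices of payload, each at most chunk_size bytes."""
    view = memoryview(payload)  # slice without copying the whole buffer
    for start in range(0, len(view), chunk_size):
        yield bytes(view[start:start + chunk_size])


def chunked_ws_send(ws, payload: bytes, chunk_size: int = CHUNK_SIZE) -> None:
    """Send payload as one message, fragmented when larger than chunk_size.

    `ws` is assumed to expose a websockets-style send() that treats an
    iterable of bytes as a single fragmented message.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(payload) <= chunk_size:
        # Small (or empty) payloads go out as a single unfragmented frame.
        ws.send(payload)
        return
    ws.send(iter_chunks(payload, chunk_size))


class RecordingSocket:
    """Hypothetical stand-in for a connection: records fragments per message."""

    def __init__(self) -> None:
        self.messages = []

    def send(self, message) -> None:
        if isinstance(message, (bytes, bytearray)):
            self.messages.append([bytes(message)])
        else:
            self.messages.append(list(message))


if __name__ == "__main__":
    sock = RecordingSocket()
    stem = bytes(range(256)) * 4096  # 1 MiB of test data
    chunked_ws_send(sock, stem)
    fragments = sock.messages[0]
    print(len(fragments), b"".join(fragments) == stem)
```

The receiver reassembles the fragments into one message, so callers on either side should not need to know the payload was split.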
🤖 Generated with Claude Code