Fix Mux SRT streaming and reconnect bugs #1090
Merged
Two bugs surfaced in production stats and logs after #1087 deployed:

- srtWrite retried the same Buffer after a reconnect, but @eyevinn/srt transfers the chunk's ArrayBuffer to its worker via postMessage on the first call, leaving it detached in our process. The retry threw "Cannot transfer object of unsupported type" and killed the party. Refactor srtWrite to take the persistent chunkBuf as a source and allocate a fresh allocUnsafeSlow + copy each loop iteration.

- 15-second pacing lead against a 1000ms SRTO_LATENCY meant libsrt was TLPKTDROP'ing ~50% of attempted packets from the very first stats snapshot (msSndBuf hovering at ~800ms with zero loss/retrans). Mux saw 26 seconds of gappy audio before dropping the connection. Convert PACING_LEAD_SEC=15 to PACING_LEAD_MS=900 so the lead stays comfortably under the SRT latency window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
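The srtWrite refactor described in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: writeWithRetry, flakyWrite, and the retry limit are hypothetical names. The structuredClone call with a transfer list stands in for the library's postMessage hand-off, which the commit message says detaches the chunk's ArrayBuffer.

```javascript
// Hypothetical sketch: copy from the persistent source into a fresh,
// non-pooled Buffer on every attempt, so a retry never reuses a chunk
// whose ArrayBuffer has already been transferred away.
function writeWithRetry(source, length, write, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const chunk = Buffer.allocUnsafeSlow(length); // own ArrayBuffer, not the shared pool
    source.copy(chunk, 0, 0, length);             // overwrite all of the uninitialized bytes
    if (write(chunk)) return attempt;
    // A real sender would reconnect here before the next attempt.
  }
  throw new Error(`write failed after ${maxAttempts} attempts`);
}

// Stand-in for the SRT write (assumption): transferring the ArrayBuffer
// detaches it in the caller, as postMessage with a transfer list does.
let calls = 0;
const delivered = [];
function flakyWrite(chunk) {
  calls += 1;
  const moved = structuredClone(chunk.buffer, { transfer: [chunk.buffer] });
  if (calls === 1) return false; // simulate a dropped connection after the hand-off
  delivered.push(Buffer.from(moved).toString("latin1"));
  return true;
}

const chunkBuf = Buffer.from("0123456789abcdef", "latin1"); // persistent source
const attempts = writeWithRetry(chunkBuf, 8, flakyWrite);
console.log(attempts, delivered[0], chunkBuf.length); // 2 01234567 16
```

Because only the per-attempt copy is handed to the writer, the persistent source keeps its full length, and the second attempt delivers the same bytes the first one lost.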
Summary
Two bugs that surfaced in production after #1087 went live:
- Reconnect path was broken: srtWrite retried the same Buffer after a reconnect, but @eyevinn/srt transfers chunk.buffer to its worker via postMessage on the first call, leaving it detached in our process. The retry threw "Cannot transfer object of unsupported type" and killed the party. Refactor srtWrite(source, length) to take the persistent chunkBuf and allocate a fresh allocUnsafeSlow + copy each loop iteration.
- 15-second pacing lead vs 1000ms SRT latency: libsrt was TLPKTDROP'ing ~50% of attempted packets from the very first stats snapshot. Sender stats showed pktSent ≈ pktSndDrop ≈ 40/2s, msSndBuf hovering near 800ms, with zero loss and zero retransmissions, the classic signature of packets aging out of the send buffer past their TSBPD deadline. Mux saw 26 seconds of gappy audio before closing the connection. Convert PACING_LEAD_SEC=15 → PACING_LEAD_MS=900 so the lead stays under the 1000ms SRTO_LATENCY window.

The pacing fix should prevent the disconnect we kept hitting. The buffer fix makes the reconnect safety net actually function if a disconnect does happen for some other reason.
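The figures in the pacing bullet can be checked with a little arithmetic. The sketch below is illustrative only: sendDropRatio and leadFitsLatency are hypothetical helpers, not part of this PR. It assumes that attempted packets = pktSent + pktSndDrop, which is one way to reconcile pktSent ≈ pktSndDrop with the ~50% drop figure.

```javascript
// Field names mirror the sender stats quoted above; the helpers are hypothetical.
function sendDropRatio(stats) {
  // Assumption: every attempted packet is counted as either sent or sender-dropped.
  const attempted = stats.pktSent + stats.pktSndDrop;
  return attempted === 0 ? 0 : stats.pktSndDrop / attempted;
}

// The pacing lead has to stay below the SRT latency window.
function leadFitsLatency(pacingLeadMs, srtLatencyMs) {
  return pacingLeadMs < srtLatencyMs;
}

const before = { pktSent: 40, pktSndDrop: 40, msSndBuf: 800 }; // per 2 s snapshot
console.log(sendDropRatio(before));            // 0.5
console.log(leadFitsLatency(15 * 1000, 1000)); // false (old PACING_LEAD_SEC=15)
console.log(leadFitsLatency(900, 1000));       // true  (new PACING_LEAD_MS=900)
```

A check like leadFitsLatency could run at startup so that a future change to either constant fails fast instead of showing up as silent sender-side drops.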
Test plan
- mise run lint clean
- mise run test: 178 tests passing
- [mux/srt] stats shows pktSndDrop ≈ 0 (was ~50% of pktSent before)

Not addressed in this PR
A third issue from the same incident, Node 24's V8 isolate teardown crashing on AsyncSRT.dispose() (Check failed: (array_buffer_allocator) != nullptr), is likely downstream of the orphaned transferred buffer from the bug we're fixing here. If it still reproduces after this lands, we'll address it separately (probably by skipping dispose() on the broken path and leaking the worker, or by pinning to Node 22).

🤖 Generated with Claude Code