Fix Mux SRT streaming and reconnect bugs #1090

Merged
fusion2004 merged 1 commit into main from fix-mux-streaming-and-reconnection
May 21, 2026

@fusion2004
Owner

Summary

Two bugs that surfaced in production after #1087 went live:

  • Reconnect path was broken: srtWrite retried the same Buffer after a reconnect, but @eyevinn/srt transfers chunk.buffer to its worker via postMessage on the first call, leaving it detached in our process. The retry threw "Cannot transfer object of unsupported type" and killed the party. Refactor srtWrite(source, length) to take the persistent chunkBuf as its source, allocate a fresh buffer with allocUnsafeSlow, and copy into it on each loop iteration.

  • 15-second pacing lead vs 1000ms SRT latency: libsrt was TLPKTDROP'ing ~50% of attempted packets from the very first stats snapshot. Sender stats showed pktSent ≈ pktSndDrop ≈ 40/2s and msSndBuf hovering near 800ms, with zero loss and zero retransmissions. That is the classic signature of packets aging out of the send buffer past their TSBPD deadline. Mux saw 26 seconds of gappy audio before closing the connection. Convert PACING_LEAD_SEC=15 to PACING_LEAD_MS=900 so the lead stays under the 1000ms SRTO_LATENCY window.
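
The fresh-copy-per-attempt idea from the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: freshChunk, writeWithRetry, and the synchronous SendFn callback are hypothetical names, and the real srtWrite presumably awaits an async write instead.

```typescript
import { Buffer } from "node:buffer";

// Hypothetical transport callback: returns true on success, false if the
// write failed and should be retried (for example after a reconnect).
type SendFn = (chunk: Buffer) => boolean;

// Copy the first `length` bytes of `source` into a Buffer that owns its own
// ArrayBuffer. Buffer.allocUnsafeSlow never draws from Node's shared pool,
// so the chunk's backing store holds exactly these bytes and nothing else.
function freshChunk(source: Buffer, length: number): Buffer {
  const chunk = Buffer.allocUnsafeSlow(length);
  source.copy(chunk, 0, 0, length);
  return chunk;
}

// Retry loop that never reuses a chunk: a previous attempt may have handed
// its ArrayBuffer to another thread, leaving that chunk detached.
function writeWithRetry(
  source: Buffer,
  length: number,
  send: SendFn,
  maxAttempts = 3,
): boolean {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (send(freshChunk(source, length))) return true;
  }
  return false;
}
```

The persistent source buffer is only ever read, so it stays valid no matter what the transport does with the copies.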

The pacing fix should prevent the disconnect we kept hitting. The buffer fix makes the reconnect safety net actually function if a disconnect does happen for some other reason.
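
To make the pacing arithmetic concrete, here is a small sketch of how a lead cap could gate sends. The constant values come from the description above; delayBeforeSend and its parameters are hypothetical and are not taken from the actual pacer in this repository.

```typescript
// Values from the PR description.
const SRT_LATENCY_MS = 1000;
const PACING_LEAD_MS = 900;

// Hypothetical helper: given how much media (in ms) has already been handed
// to the socket and how much wall-clock time has elapsed since the stream
// started, return how long to wait before sending the next chunk so the
// sender never runs more than `leadMs` ahead of real time.
function delayBeforeSend(
  mediaMsSent: number,
  elapsedMs: number,
  leadMs: number = PACING_LEAD_MS,
): number {
  const aheadMs = mediaMsSent - elapsedMs;
  return Math.max(0, aheadMs - leadMs);
}

// With a 900ms lead, a sender that is 1000ms ahead waits 100ms.
const waitNew = delayBeforeSend(1500, 500);
// With the old 15s lead, a sender could queue 15s of media with no wait,
// far beyond what a 1000ms latency window can hold.
const waitOld = delayBeforeSend(15000, 0, 15000);
```

Under this model, any lead larger than SRT_LATENCY_MS lets queued packets outlive their delivery deadline, which matches the drop pattern described above.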

Test plan

  • mise run lint clean
  • mise run test — 178 tests passing
  • Run a real listening party against Mux and confirm [mux/srt] stats shows pktSndDrop ≈ 0 (was ~50% of pktSent before)
  • Confirm party completes intro → song → outro without the SRT connection breaking
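
For the stats check, one way to turn a snapshot into a single number is sketched below. The field names pktSent and pktSndDrop are the ones quoted above; the SenderStats shape and sendDropRatio are hypothetical, and treating "attempted" as sent plus dropped is an assumption about how the ~50% figure was computed.

```typescript
// Hypothetical subset of a sender stats snapshot.
interface SenderStats {
  pktSent: number;
  pktSndDrop: number;
}

// Fraction of attempted packets that were dropped before being sent.
// Assumes attempted = sent + dropped; returns 0 for an empty snapshot.
function sendDropRatio(stats: SenderStats): number {
  const attempted = stats.pktSent + stats.pktSndDrop;
  return attempted === 0 ? 0 : stats.pktSndDrop / attempted;
}

// A snapshot like the one from the incident: roughly equal sent and dropped.
const incidentRatio = sendDropRatio({ pktSent: 40, pktSndDrop: 40 });
```

A healthy run should keep this ratio at or near 0 across snapshots.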

Not addressed in this PR

A third issue from the same incident — Node 24's V8 isolate teardown crashing on AsyncSRT.dispose() (Check failed: (array_buffer_allocator) != nullptr) — is likely downstream of the orphaned transferred buffer from the bug we're fixing here. If it still reproduces after this lands, we'll address separately (probably by skipping dispose() on the broken path and leaking the worker, or by pinning to Node 22).

🤖 Generated with Claude Code

Two bugs surfaced in production stats and logs after #1087 deployed:

- srtWrite retried the same Buffer after a reconnect, but @eyevinn/srt
  transfers the chunk's ArrayBuffer to its worker via postMessage on
  the first call, leaving it detached in our process. The retry threw
  "Cannot transfer object of unsupported type" and killed the party.
  Refactor srtWrite to take the persistent chunkBuf as a source and
  allocate a fresh allocUnsafeSlow + copy each loop iteration.

- 15-second pacing lead against a 1000ms SRTO_LATENCY meant libsrt was
  TLPKTDROP'ing ~50% of attempted packets from the very first stats
  snapshot (msSndBuf hovering at ~800ms with zero loss/retrans). Mux
  saw 26 seconds of gappy audio before dropping the connection.
  Convert PACING_LEAD_SEC=15 to PACING_LEAD_MS=900 so the lead stays
  comfortably under the SRT latency window.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@fusion2004 fusion2004 merged commit f9ccb7a into main May 21, 2026
6 checks passed
@fusion2004 fusion2004 deleted the fix-mux-streaming-and-reconnection branch May 21, 2026 17:44