fix(0.6.1): Realtime firstMessage interruption on adopted path#95
Open
nicolotognoni wants to merge 1 commit into
Open
Conversation
With ``agent.prewarm=true`` (default) the OpenAI Realtime WebSocket is parked, primed, and adopted at call pickup with ``source=adopted ms=0``. The audio bridge is live the instant the callee answers, and the caller's "Hi" / "Hello?" reliably reaches OpenAI in the ~250-450 ms before the firstMessage audio starts streaming back. OpenAI's server-VAD treats that early caller audio as a barge-in and silently cancels the in-flight ``response.create``, so the configured ``first_message`` is never delivered. The cold ``connect()`` path masked the bug because the WS handshake naturally buffered ~300 ms of caller silence. Fix: ``send_first_message`` / ``sendFirstMessage`` now arm a one-shot server-VAD lockout. A ``session.update`` with ``turn_detection: null`` (OpenAI-documented: disables server-VAD entirely, no audio-driven response cancellation) is sent immediately before ``response.create``, then the receive loop / message listener restores the original ``turn_detection`` block (snapshotted from the configured ``vad_type`` / ``silence_duration_ms`` / ``threshold`` / ``prefix_padding_ms``) on the firstMessage ``response.done`` so barge-in works normally for every subsequent turn. The lockout is strictly one-shot. ``turn_detection: null`` was chosen over a temporary high ``silence_duration_ms`` because it is fully OpenAI-documented and guarantees zero server-side cancellation (timer-based fallbacks remain sensitive to clock skew on multi-second response.done windows). Complements the client-side ``firstAudioSentAt`` guard from PR #92 which prevents the local audio bridge from clearing the playout buffer on caller speech — this closes the same gap on the *server* side. Coverage: 3 new Python tests + 4 new TypeScript tests in the ``OpenAIRealtimeAdapter`` IO suites, covering lockout sequence, custom ``silence_duration_ms`` / ``vad_type`` restore, one-shot semantics, and no-ws no-op. Files: libraries/python/getpatter/providers/openai_realtime.py, libraries/typescript/src/providers/openai-realtime.ts, libraries/python/tests/unit/test_providers_io_unit.py, libraries/typescript/tests/unit/openai-realtime.test.ts, CHANGELOG.md.
4 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
agent.prewarm=true(default) the OpenAI Realtime WebSocket is opened, primed, and adopted at call pickup withsource=adopted ms=0. The audio bridge between Twilio/Telnyx and OpenAI is live the instant the callee answers, so the caller's "Hi" / "Hello?" reliably reaches OpenAI in the ~250-450 ms before the firstMessage audio starts streaming back. OpenAI's server-VAD treats that early caller audio as a barge-in and silently cancels the in-flightresponse.create— the configuredfirst_messageis never delivered and the caller hears the agent respond to their hello instead of the scripted opening. The coldconnect()path masked this because the WS handshake naturally buffered ~300 ms of caller silence.send_first_message/sendFirstMessagenow arm a one-shot server-VAD lockout by sendingsession.updatewithturn_detection: nullimmediately beforeresponse.create, then restore the originalturn_detectionblock on the firstMessageresponse.done. Subsequent turns barge in normally. Complements the client-sidefirstAudioSentAt/_first_audio_sent_atguard from PR fix(0.6.1): dashboard live merge + firstMessage barge-in + drain marks (re-base of #89) #92 — that prevents the local audio bridge from clearing the playout buffer; this prevents the server from cancelling the response.Implementation
turn_detection: nulland not a temporary highsilence_duration_ms?turn_detection: nullis fully OpenAI-documented and disables server-VAD entirely with zero edge cases. A high-silence fallback relies on a server-side timer that is sensitive to delivery jitter across theresponse.donewindow (variable 1-3 s on long greetings). Null is byte-cheap, documented, and deterministic._first_message_protection_pending/firstMessageProtectionPendingand_saved_turn_detection/savedTurnDetectionsnapshot. Set onsend_first_message, consumed on the nextresponse.doneinside the existing receive loop / message listener. Strictly one-shot — a laterresponse.donedoes not re-trigger the restore.turn_detectionwe never disabled. A failed restore leaves the session VAD-disabled (degraded barge-in but call still completes) — the next configuration-touchingsession.updatewould rearm.libraries/python/getpatter/providers/openai_realtime.py,libraries/typescript/src/providers/openai-realtime.ts, the matching_unit.py/unit/*.test.tstest files, andCHANGELOG.md.Breaking change?
No.
send_first_message/sendFirstMessagekeep the same signature and external contract. The only observable difference is two extrasession.updateframes on the wire during the firstMessage turn — both within the documented OpenAI Realtime schema and billing-safe (session.updatedoes not invoke the model).Test plan
pytest tests/unit/test_providers_io_unit.py::TestOpenAIRealtimeAdapterIO -x— 21 passed (was 18, +3 new)pytest tests/— 1844 passed, 7 skippednpm test -- --run— 1520 passed across 85 files (was 1516, +4 new)npm run lint— cleannpm run build— cleanagent.prewarm=true, OpenAI Realtime provider,first_message="Hello! Can you hear me?". Verify call-log transcript starts with the agent firstMessage, not the caller's "Hi.".Docs updates
N/A — no public surface change, fix is entirely internal to the adapter.