fix(0.6.1): Realtime firstMessage interruption on adopted path#95

Open
nicolotognoni wants to merge 1 commit into feat/observability-otel-attrs-0.6.1 from fix/0.6.1-realtime-firstmessage-adopted-race

@nicolotognoni
Collaborator

Summary

  • With agent.prewarm=true (default) the OpenAI Realtime WebSocket is opened, primed, and adopted at call pickup with source=adopted ms=0. The audio bridge between Twilio/Telnyx and OpenAI is live the instant the callee answers, so the caller's "Hi" / "Hello?" reliably reaches OpenAI in the ~250-450 ms before the firstMessage audio starts streaming back. OpenAI's server-VAD treats that early caller audio as a barge-in and silently cancels the in-flight response.create — the configured first_message is never delivered and the caller hears the agent respond to their hello instead of the scripted opening. The cold connect() path masked this because the WS handshake naturally buffered ~300 ms of caller silence.
  • Fix: send_first_message / sendFirstMessage now arm a one-shot server-VAD lockout by sending session.update with turn_detection: null immediately before response.create, then restore the original turn_detection block on the firstMessage response.done. Subsequent turns barge in normally. Complements the client-side firstAudioSentAt / _first_audio_sent_at guard from PR #92 (fix(0.6.1): dashboard live merge + firstMessage barge-in + drain marks, re-base of #89): that guard prevents the local audio bridge from clearing the playout buffer; this change prevents the server from cancelling the response.
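
The wire order described in the fix can be sketched as follows. This is a minimal illustration, not the adapter's code: the helper name build_first_message_frames and the instructions-based response.create payload are assumptions made for the example, and the real adapter may phrase the greeting differently.

```python
import json


def build_first_message_frames(first_message: str) -> list[str]:
    """Return the two client frames in the order they would be sent.

    Hypothetical helper: only the ordering (lockout first, then
    response.create) is taken from the PR description.
    """
    # 1. Disable server-VAD so early caller audio cannot cancel the response.
    lockout = {
        "type": "session.update",
        "session": {"turn_detection": None},  # serialized as JSON null
    }
    # 2. Ask the model to speak the greeting (payload shape is an assumption).
    create = {
        "type": "response.create",
        "response": {"instructions": f"Say exactly: {first_message}"},
    }
    return [json.dumps(lockout), json.dumps(create)]


if __name__ == "__main__":
    for frame in build_first_message_frames("Hello! Can you hear me?"):
        print(frame)
```

The restore step is not shown here because it happens later, when the firstMessage response.done arrives.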

Implementation

  • Why turn_detection: null and not a temporary high silence_duration_ms? turn_detection: null is fully OpenAI-documented and disables server-VAD entirely with zero edge cases. A high-silence fallback relies on a server-side timer that is sensitive to delivery jitter across the response.done window (variable 1-3 s on long greetings). Null is byte-cheap, documented, and deterministic.
  • New adapter state: _first_message_protection_pending / firstMessageProtectionPending and _saved_turn_detection / savedTurnDetection snapshot. Set on send_first_message, consumed on the next response.done inside the existing receive loop / message listener. Strictly one-shot — a later response.done does not re-trigger the restore.
  • Best-effort failure handling: a failed lockout send clears the pending flag so we don't try to restore a turn_detection we never disabled. A failed restore leaves the session VAD-disabled (degraded barge-in, but the call still completes); the next configuration-touching session.update would re-arm server-VAD.
  • Parity respected: behaviour and wire shape identical across Python and TypeScript.
  • Files touched: libraries/python/getpatter/providers/openai_realtime.py, libraries/typescript/src/providers/openai-realtime.ts, the matching _unit.py / unit/*.test.ts test files, and CHANGELOG.md.
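
The one-shot state and the best-effort failure handling listed above could look roughly like this. It is a sketch under stated assumptions, not the code in openai_realtime.py: FirstMessageGuard, its send callable, and the pending / saved attribute names are hypothetical stand-ins for the adapter's own fields.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable, Optional


class FirstMessageGuard:
    """Hypothetical one-shot server-VAD lockout around the first response."""

    def __init__(
        self,
        send: Callable[[str], Awaitable[None]],  # e.g. a WebSocket send
        turn_detection: dict[str, Any],          # configured VAD block
    ) -> None:
        self._send = send
        self._turn_detection = turn_detection
        self.pending = False
        self.saved: Optional[dict[str, Any]] = None

    async def arm(self) -> None:
        """Snapshot the VAD block and disable server-VAD."""
        self.saved = dict(self._turn_detection)
        self.pending = True
        frame = {"type": "session.update", "session": {"turn_detection": None}}
        try:
            await self._send(json.dumps(frame))
        except Exception:
            # The lockout never reached the server, so there is nothing to
            # restore later: clear the flag and the snapshot.
            self.pending = False
            self.saved = None

    async def on_event(self, event: dict[str, Any]) -> None:
        """Restore the VAD block on the first response.done only."""
        if event.get("type") != "response.done" or not self.pending:
            return
        # Consume the flag before sending so a later response.done is a no-op
        # even if the restore itself fails.
        self.pending = False
        saved, self.saved = self.saved, None
        frame = {"type": "session.update", "session": {"turn_detection": saved}}
        try:
            await self._send(json.dumps(frame))
        except Exception:
            # Session stays VAD-disabled; the call continues without barge-in.
            pass
```

Clearing the flag before the restore send is what makes the guard strictly one-shot in this sketch: a second response.done finds pending already False and returns immediately.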

Breaking change?

No. send_first_message / sendFirstMessage keep the same signature and external contract. The only observable difference is two extra session.update frames on the wire during the firstMessage turn — both within the documented OpenAI Realtime schema and billing-safe (session.update does not invoke the model).

Test plan

  • Python: pytest tests/unit/test_providers_io_unit.py::TestOpenAIRealtimeAdapterIO -x — 21 passed (was 18, +3 new)
  • Python: full pytest tests/ — 1844 passed, 7 skipped
  • TypeScript: npm test -- --run — 1520 passed across 85 files (was 1516, +4 new)
  • TypeScript: npm run lint — clean
  • TypeScript: npm run build — clean
  • Manual smoke: outbound call on Twilio with agent.prewarm=true, OpenAI Realtime provider, first_message="Hello! Can you hear me?". Verify call-log transcript starts with the agent firstMessage, not the caller's "Hi.".

Docs updates

N/A — no public surface change, fix is entirely internal to the adapter.

With ``agent.prewarm=true`` (default) the OpenAI Realtime WebSocket is
parked, primed, and adopted at call pickup with ``source=adopted ms=0``.
The audio bridge is live the instant the callee answers, and the
caller's "Hi" / "Hello?" reliably reaches OpenAI in the ~250-450 ms
before the firstMessage audio starts streaming back. OpenAI's server-VAD
treats that early caller audio as a barge-in and silently cancels the
in-flight ``response.create``, so the configured ``first_message`` is
never delivered. The cold ``connect()`` path masked the bug because the
WS handshake naturally buffered ~300 ms of caller silence.

Fix: ``send_first_message`` / ``sendFirstMessage`` now arm a one-shot
server-VAD lockout. A ``session.update`` with ``turn_detection: null``
(OpenAI-documented: disables server-VAD entirely, no audio-driven
response cancellation) is sent immediately before ``response.create``,
then the receive loop / message listener restores the original
``turn_detection`` block (snapshotted from the configured ``vad_type``
/ ``silence_duration_ms`` / ``threshold`` / ``prefix_padding_ms``) on
the firstMessage ``response.done`` so barge-in works normally for every
subsequent turn. The lockout is strictly one-shot.

``turn_detection: null`` was chosen over a temporary high
``silence_duration_ms`` because it is fully OpenAI-documented and
guarantees zero server-side cancellation (timer-based fallbacks remain
sensitive to clock skew on multi-second response.done windows).

Complements the client-side ``firstAudioSentAt`` guard from PR #92
which prevents the local audio bridge from clearing the playout buffer
on caller speech — this closes the same gap on the *server* side.

Coverage: 3 new Python tests + 4 new TypeScript tests in the
``OpenAIRealtimeAdapter`` IO suites, covering lockout sequence, custom
``silence_duration_ms`` / ``vad_type`` restore, one-shot semantics, and
no-ws no-op.

Files: libraries/python/getpatter/providers/openai_realtime.py,
libraries/typescript/src/providers/openai-realtime.ts,
libraries/python/tests/unit/test_providers_io_unit.py,
libraries/typescript/tests/unit/openai-realtime.test.ts,
CHANGELOG.md.