8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,14 @@

## 0.6.1 (2026-05-12)

### Fixed — OpenAI Realtime firstMessage silently cancelled on prewarm-adopted sessions

With `agent.prewarm=true` (the default), the OpenAI Realtime WebSocket is opened, primed (`session.created` → `session.update` → `session.updated`), and parked during the carrier ringing window; `StreamHandler` then adopts it at call pickup with `source=adopted ms=0`. The audio bridge between the Twilio/Telnyx stream and the upstream OpenAI session is therefore live the instant the callee answers. OpenAI's server-VAD treats any caller audio that arrives before the assistant's first audio frame as a barge-in and cancels the in-flight `response.create`. In practice the caller's "Hi" / "Hello?" reliably reached OpenAI in the ~250-450 ms before the firstMessage audio started streaming back, so the configured `first_message` was *silently cancelled* and the caller heard the agent respond to their hello instead of delivering the scripted opening. The cold-path `connect()` masked this because its WS handshake naturally buffered ~300 ms of caller silence.

Fix: `send_first_message` (Py) / `sendFirstMessage` (TS) now arm a one-shot server-VAD lockout immediately before issuing `response.create` for the firstMessage turn. A `session.update` with `turn_detection: null` (an OpenAI-documented value that disables server-VAD entirely, so no audio-driven response cancellation occurs) is sent first, then the `response.create`. The receive loop / message listener watches for the firstMessage `response.done` and re-issues a `session.update` restoring the original `turn_detection` block (snapshotted from `vad_type` / `silence_duration_ms` / `threshold` / `prefix_padding_ms`) so barge-in works normally for every subsequent turn. The lockout is strictly one-shot: later `response.done` events from subsequent turns do not re-send the restore. Both sends are best-effort and neither failure breaks the call: if the lockout send fails, the call falls back to the pre-fix behaviour; if the restore send fails, the session stays without server-VAD, which degrades barge-in. This complements the client-side `_first_audio_sent_at` / `firstAudioSentAt` guard added in PR #92: that guard prevents the local audio bridge from clearing the playout buffer on caller speech, while this one prevents the *server* from cancelling the response.
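The arm/restore sequence can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not the adapter's code: `FakeWebSocket`, `FirstMessageLockout`, and `on_event` are hypothetical names, the `turn_detection` values are sample numbers, and the `response.create` payload is left bare because the real one is not shown in this entry.

```python
import asyncio
import json
from typing import Optional


class FakeWebSocket:
    """Hypothetical stand-in for the upstream socket; records decoded frames."""

    def __init__(self) -> None:
        self.sent: list = []

    async def send(self, payload: str) -> None:
        self.sent.append(json.loads(payload))


class FirstMessageLockout:
    """Illustrative only: mirrors the arm/restore flow, not the real adapter."""

    def __init__(self, ws: FakeWebSocket, turn_detection: dict) -> None:
        self._ws = ws
        self._configured = turn_detection
        self._saved: Optional[dict] = None
        self._pending = False

    async def send_first_message(self) -> None:
        # Snapshot the configured block, then disable server-VAD.
        self._saved = dict(self._configured)
        self._pending = True
        try:
            await self._ws.send(
                json.dumps({"type": "session.update", "session": {"turn_detection": None}})
            )
        except Exception:
            # Lockout never took effect, so there is nothing to restore later.
            self._pending = False
            self._saved = None
        # Real payload omitted; only the event type matters for this sketch.
        await self._ws.send(json.dumps({"type": "response.create"}))

    async def on_event(self, event: dict) -> None:
        if event.get("type") != "response.done":
            return
        if self._pending and self._saved is not None:
            try:
                await self._ws.send(
                    json.dumps({"type": "session.update", "session": {"turn_detection": self._saved}})
                )
            except Exception:
                pass  # best-effort restore
            # One-shot: clear the state so later response.done events are ignored.
            self._pending = False
            self._saved = None


async def demo() -> list:
    ws = FakeWebSocket()
    vad = {"type": "server_vad", "threshold": 0.5, "prefix_padding_ms": 300, "silence_duration_ms": 700}
    lockout = FirstMessageLockout(ws, vad)
    await lockout.send_first_message()
    await lockout.on_event({"type": "response.done"})  # firstMessage turn: restores VAD
    await lockout.on_event({"type": "response.done"})  # later turn: no extra restore
    return ws.sent


frames = asyncio.run(demo())
print([frame["type"] for frame in frames])
```

In this sketch the second `response.done` sends nothing, which is the one-shot behaviour the entry describes; the restore frame carries the snapshotted block, including the custom `silence_duration_ms`.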

Files: `libraries/python/getpatter/providers/openai_realtime.py`, `libraries/typescript/src/providers/openai-realtime.ts`. Coverage: `libraries/python/tests/unit/test_providers_io_unit.py` (3 new tests covering lockout sequence, custom `silence_duration_ms` restore, and one-shot semantics) + `libraries/typescript/tests/unit/openai-realtime.test.ts` (4 new tests covering the same behaviours plus the no-ws no-op).

### Changed — `StreamHandler` adopt-capability check now uses duck typing

The TS realtime adopt branch in `stream-handler.ts` previously relied on `this.adapter instanceof OpenAIRealtimeAdapter` to gate the prewarm-handoff path. Switched to a duck-type check (`typeof adapter.adoptWebSocket === 'function'`) so the generic stream-handler module stays provider-agnostic on this hot path and matches the Python handler's `getattr(self._adapter, "adopt_websocket", None)` shape. Files: `libraries/typescript/src/stream-handler.ts`.
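The Python-side capability check mentioned above can be sketched like this. It is a rough illustration only: `AdoptingAdapter`, `PlainAdapter`, and `try_adopt` are hypothetical names, and `adopt_websocket` is assumed to be a plain synchronous method here, which may differ from the real adapter.

```python
from typing import Any


class AdoptingAdapter:
    """Hypothetical adapter that supports the prewarm handoff."""

    def __init__(self) -> None:
        self.adopted: Any = None

    def adopt_websocket(self, ws: Any) -> None:
        self.adopted = ws


class PlainAdapter:
    """Hypothetical adapter with no adopt support."""


def try_adopt(adapter: Any, parked_ws: Any) -> bool:
    # Duck-type check: look for the capability rather than a concrete class.
    adopt = getattr(adapter, "adopt_websocket", None)
    if not callable(adopt):
        return False
    adopt(parked_ws)
    return True


adopting = AdoptingAdapter()
print(try_adopt(adopting, "parked-ws"), try_adopt(PlainAdapter(), "parked-ws"))
```

Because the check only asks whether the method exists, any provider adapter that later gains an `adopt_websocket` method would take the handoff path without the handler importing that provider's class.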
96 changes: 96 additions & 0 deletions libraries/python/getpatter/providers/openai_realtime.py
@@ -164,6 +164,16 @@ def __init__(
        import time as _time

        self._session_start_monotonic: float = _time.monotonic()
        # ``send_first_message`` arms a one-shot server-VAD lockout so the
        # firstMessage turn cannot be interrupted by the caller's first audio
        # frames (which happens reliably on prewarm-adopted sessions where the
        # adopt+response.create races the caller's "hello?" by a few hundred
        # ms). The flag is consumed inside ``receive_events`` when the
        # firstMessage ``response.done`` arrives, at which point we re-issue
        # ``session.update`` to restore the original ``turn_detection`` block
        # captured here. See ``send_first_message`` for the full rationale.
        self._first_message_protection_pending: bool = False
        self._saved_turn_detection: dict | None = None

    def record_session_end(self) -> None:
        """Emit ``patter.cost.realtime_minutes`` for the elapsed session duration."""
@@ -603,6 +613,38 @@ async def _iter_raw():
                    self._current_response_item_id = None
                    self._current_response_audio_ms = 0
                    self._current_response_first_audio_at = None
                    # If ``send_first_message`` armed the server-VAD lockout
                    # for the firstMessage turn, this ``response.done``
                    # signals the firstMessage finished streaming and it is
                    # safe to restore the original ``turn_detection`` so
                    # barge-in works for the rest of the call. Best-effort:
                    # a failed send leaves the session without VAD, which
                    # degrades barge-in but does not break the call; the
                    # next ``session.update`` (e.g. on a tool turn) would
                    # also rearm. See ``send_first_message`` for the
                    # full rationale.
                    if (
                        self._first_message_protection_pending
                        and self._saved_turn_detection is not None
                    ):
                        try:
                            await self._ws.send(
                                json.dumps(
                                    {
                                        "type": "session.update",
                                        "session": {
                                            "turn_detection": self._saved_turn_detection,
                                        },
                                    }
                                )
                            )
                        except Exception as exc:  # noqa: BLE001
                            logger.debug(
                                "first_message: turn_detection restore failed: %s",
                                exc,
                            )
                        self._first_message_protection_pending = False
                        self._saved_turn_detection = None
                    yield ("response_done", data.get("response", {}))

                elif event_type == "error":
@@ -704,9 +746,63 @@ async def send_first_message(self, text: str) -> None:
        producing role-confused openings (e.g. a receptionist agent
        responding "I'd like to schedule a haircut" because it took its own
        first_message as a customer cue).

        Server-VAD lockout during firstMessage
        --------------------------------------

        OpenAI Realtime server-VAD treats any caller audio that arrives
        before the assistant's first audio frame as a barge-in and cancels
        the in-flight ``response.create``. On the prewarm-adopted path
        (``source=adopted ms=0``) the WS→audio bridge opens immediately at
        call pickup; the caller's "Hi" / "Hello?" reliably reaches OpenAI in
        the ~250-450 ms before the firstMessage audio starts streaming back,
        so the configured ``first_message`` is *silently cancelled* and the
        caller hears the agent respond to their hello instead of delivering
        the scripted opening.

        Fix: send a ``session.update`` that sets ``turn_detection`` to
        ``None`` (OpenAI-documented: disables server-VAD entirely, no
        audio-driven response cancellation), then ``response.create`` the
        firstMessage. ``receive_events`` re-arms ``turn_detection`` from the
        saved snapshot the moment ``response.done`` arrives for the
        firstMessage turn, restoring normal barge-in for every subsequent
        turn. The complementary client-side guard (``_first_audio_sent_at``
        in ``stream_handler``) already prevents the local audio bridge from
        clearing the playout buffer on caller speech; this lockout closes
        the gap on the *server* side.

        Best-effort: if ``session.update`` raises we still proceed with
        ``response.create``. The fallback behaviour matches the pre-fix
        state (a higher likelihood of first-message preemption) and is never
        worse, so the call still completes.
        """
        if self._ws is None:
            return
        # Snapshot the original turn_detection block so ``receive_events``
        # can restore it after the firstMessage ``response.done``. We build
        # it from the same configured fields ``_build_session_config`` uses
        # so the restore is byte-identical to the cold connect path.
        self._saved_turn_detection = {
            "type": self.vad_type,
            "threshold": 0.5,
            "prefix_padding_ms": 300,
            "silence_duration_ms": self.silence_duration_ms,
        }
        self._first_message_protection_pending = True
        try:
            await self._ws.send(
                json.dumps(
                    {
                        "type": "session.update",
                        "session": {"turn_detection": None},
                    }
                )
            )
        except Exception as exc:  # noqa: BLE001 - best-effort lockout
            logger.debug("send_first_message: turn_detection lockout failed: %s", exc)
            # Clear protection state so receive_events doesn't restore a
            # turn_detection we never actually disabled.
            self._first_message_protection_pending = False
            self._saved_turn_detection = None
        await self._ws.send(
            json.dumps(
                {