20 changes: 20 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,23 @@
## Unreleased

### Changed

- **`StreamHandler` adopt-capability check now uses duck typing.** The TS realtime adopt branch in `stream-handler.ts:2229` previously relied on `this.adapter instanceof OpenAIRealtimeAdapter` to gate the prewarm-handoff path. Switched to a duck-type check (`typeof adapter.adoptWebSocket === 'function'`) so the generic stream-handler module stays provider-agnostic on this hot path and matches the Python handler's `getattr(self._adapter, "adopt_websocket", None)` shape. Files: `libraries/typescript/src/stream-handler.ts:2229`.
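
A minimal Python sketch of the duck-type check described in this entry. The classes `AdoptingAdapter` and `PlainAdapter` and the helper `supports_adoption` are hypothetical names introduced only for illustration; only the `getattr(..., "adopt_websocket", None)` shape comes from the entry itself.

```python
from typing import Any


class AdoptingAdapter:
    """Hypothetical adapter that can take over an already-open WebSocket."""

    def adopt_websocket(self, ws: Any) -> None:
        self.ws = ws


class PlainAdapter:
    """Hypothetical adapter with no adoption support."""


def supports_adoption(adapter: Any) -> bool:
    # Capability check by attribute, not by class: any adapter exposing a
    # callable ``adopt_websocket`` qualifies, whatever its provider.
    return callable(getattr(adapter, "adopt_websocket", None))
```

Because the check looks only at the method, the generic handler needs no import of a provider-specific class to decide whether the prewarm handoff applies.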

### Fixed

- **Adapter state leak after a failed parked-session adoption.** When `adopt_websocket` / `adoptWebSocket` raised mid-adoption, the partially-adopted `OpenAIRealtimeAdapter` was in an inconsistent state — `_running` / `messageListenerAttached` was already `true`, the heartbeat task may have started, `_current_response_item_id` / `currentResponseItemId` may have carried leaked state from the parked session, and the `_ws` / `ws` reference pointed at a now-closed socket. Falling through to `connect()` on that carcass raced `session.created` against stale state and corrupted the live call. Fix: handler now re-instantiates the adapter before the cold connect path, guaranteeing a clean slate. Files: `libraries/python/getpatter/stream_handler.py:998`, `libraries/typescript/src/stream-handler.ts:2229`.

- **Eliminated double WebSocket handshake on outbound OpenAI Realtime calls.** `_spawn_provider_warmup` (Py) / `spawnProviderWarmup` (TS) and `_park_provider_connections` / `parkProviderConnections` each built a transient `OpenAIRealtimeAdapter` and opened its own WS against `api.openai.com` during the ringing window, so every call paid for two handshakes where one suffices. The warmup-only handshake is a strict subset of what park performs (open WS → `session.created` → `session.update` → `session.updated`), and park keeps the socket open for adoption, so warmup's WS was opened, primed, and immediately discarded. This wasted 150–400 ms of ringing-window budget and doubled the rate-limit pressure against OpenAI. Fix: `_spawn_provider_warmup` no longer builds the Realtime adapter; park is the sole Realtime warm path. Pipeline-mode STT/TTS/LLM warmup is unchanged. Files: `libraries/python/getpatter/client.py:732`, `libraries/typescript/src/client.ts:982`.

- **Built-in tools (`transfer_call` / `end_call`) now land in the primed Realtime session.** `_build_realtime_warmup_adapter` (Py) / `buildRealtimeWarmupAdapter` (TS) constructed the transient `OpenAIRealtimeAdapter` without a `tools=` argument, so the `session.update` sent during ringing carried an empty tool list. When `StreamHandler.start()` adopted that parked WebSocket it skipped a fresh `session.update`, leaving the upstream session permanently unaware that the two Patter built-ins existed — `transfer_call` and `end_call` silently no-op'd on every hit-prewarm call (~80% of outbound calls when prewarm is enabled). Added a shared `build_realtime_tools(...)` helper in `stream_handler.py` and `buildRealtimeTools(...)` in `server.ts` so both the live and warmup paths build the canonical tool list byte-for-byte. Files: `libraries/python/getpatter/stream_handler.py:91`, `libraries/python/getpatter/client.py:790`, `libraries/typescript/src/server.ts:62`, `libraries/typescript/src/client.ts:1030`.

- **OpenAI Realtime warmup now runs during the ringing window.** The `warmup()` method on `OpenAIRealtimeAdapter` (defined in both SDKs) was unreachable from `Patter.call()` — the provider warmup framework only iterated `agent.stt` / `agent.tts` / `agent.llm`, but OpenAI Realtime is an all-in-one provider that's server-instantiated at `StreamHandler.start()` time. `_spawn_provider_warmup` (Py) / `spawnProviderWarmup` (TS) now builds a transient `OpenAIRealtimeAdapter` from the resolved Agent + configured `openai_key` when `agent.provider == "openai_realtime"` and calls `warmup()` in parallel with the carrier `initiate_call`. Saves 150–400 ms of TLS + WebSocket handshake + `session.created` round-trip on the first turn. Files: `libraries/python/getpatter/client.py:732`, `libraries/typescript/src/client.ts:940`.
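
The built-in tools fix above relies on one helper producing the tool list for both the live and the warmup path. The sketch below illustrates that idea only: `build_tools_sketch`, the `BUILTIN_TOOLS` schemas, and the user-tools-first ordering are assumptions, not the actual `build_realtime_tools(...)` implementation.

```python
from typing import Any, Optional

# Assumed, heavily simplified shape of the two built-ins; the real
# schemas sent in ``session.update`` may differ.
BUILTIN_TOOLS: list[dict[str, Any]] = [
    {"type": "function", "name": "transfer_call"},
    {"type": "function", "name": "end_call"},
]


def build_tools_sketch(
    user_tools: Optional[list[dict[str, Any]]],
) -> list[dict[str, Any]]:
    """Return user-defined tools followed by the built-ins, as fresh copies."""
    tools = [dict(tool) for tool in (user_tools or [])]
    tools.extend(dict(tool) for tool in BUILTIN_TOOLS)
    return tools


# Both call sites go through the same function, so the lists cannot drift.
live_tools = build_tools_sketch([{"type": "function", "name": "lookup_order"}])
warmup_tools = build_tools_sketch([{"type": "function", "name": "lookup_order"}])
```

Routing both paths through a single function is what keeps the primed session and a cold session configured identically; an agent with no user tools still receives the two built-ins.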

### Added

- **Parked OpenAI Realtime session adoption — sustained first-turn latency win across consecutive calls.** `Patter._park_provider_connections` (Py) / `Patter.parkProviderConnections` (TS) now also park a fully primed (`session.created` → `session.update` → `session.updated`) OpenAI Realtime WebSocket during the carrier ringing window when `agent.provider == "openai_realtime"`. `OpenAIRealtimeStreamHandler` (Py) and the realtime branch of `StreamHandler.initRealtimeAdapter` (TS) consult the parked slot on `start()` and call `adopt_websocket(...)` / `adoptWebSocket(...)` on the configured adapter instead of paying the cold `connect()` round-trip again — saving ~250–450 ms on the first-turn audio. Best-effort: a dead parked WS, missing OpenAI key, or `open_parked_connection` failure all fall through transparently to the cold connect path. Files: `libraries/python/getpatter/client.py:866`, `libraries/python/getpatter/stream_handler.py:724,950`, `libraries/python/getpatter/telephony/twilio.py:498`, `libraries/python/getpatter/telephony/telnyx.py:605`, `libraries/typescript/src/client.ts:863`, `libraries/typescript/src/stream-handler.ts:2229`.
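
The adopt-or-fall-through flow, together with the re-instantiation fix listed under "Fixed", can be sketched roughly as follows. `FakeAdapter` and `start_adapter` are hypothetical stand-ins, and the string handles replace real WebSocket objects; this is an illustration under those assumptions, not the handler's actual code.

```python
import asyncio
from typing import Any, Callable, Optional


class FakeAdapter:
    """Hypothetical stand-in for a realtime adapter."""

    def __init__(self) -> None:
        self.running = False
        self.connected_cold = False

    async def adopt_websocket(self, ws: Any) -> None:
        self.running = True  # state is mutated before the failure below
        if ws == "dead":
            raise ConnectionError("parked socket is closed")

    async def connect(self) -> None:
        self.connected_cold = True


async def start_adapter(
    make_adapter: Callable[[], FakeAdapter], parked_ws: Optional[Any]
) -> FakeAdapter:
    adapter = make_adapter()
    adopt = getattr(adapter, "adopt_websocket", None)
    if parked_ws is not None and callable(adopt):
        try:
            await adopt(parked_ws)
            return adapter  # warm path: no cold handshake needed
        except Exception:
            # The failed adoption may have left partial state behind, so
            # discard the instance instead of reusing it.
            adapter = make_adapter()
    await adapter.connect()
    return adapter


warm = asyncio.run(start_adapter(FakeAdapter, "live"))
cold = asyncio.run(start_adapter(FakeAdapter, "dead"))
```

In the failure case the adapter that reaches `connect()` is a new instance, so none of the state touched during the aborted adoption carries over.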

## 0.6.1 (2026-05-09)

### Fixed — Barge-in bug bundle: 6.8s latency outliers, double-talk dispatch, stale anchors, firstMessage uninterruptible (Python + TypeScript parity)
118 changes: 115 additions & 3 deletions libraries/python/getpatter/client.py
@@ -733,13 +733,26 @@ def _spawn_provider_warmup(self, agent: Agent) -> None:
         """Spawn a fire-and-forget task that warms up STT / TTS / LLM in
         parallel with the carrier-side ``initiate_call``.

+        Pipeline-mode providers (``agent.stt`` / ``agent.tts`` / ``agent.llm``)
+        are picked up via the optional ``warmup()`` method on each instance.
+
+        For ``openai_realtime`` mode the warmup-only handshake is a
+        strict subset of what :meth:`_park_provider_connections` already
+        performs (open WS → ``session.created`` → ``session.update`` →
+        ``session.updated``) — and park keeps the socket open for adoption.
+        Running both creates a double WebSocket handshake against
+        ``api.openai.com`` per call, wastes 150-400 ms of ringing-window
+        budget, and doubles the rate-limit pressure for no benefit. So
+        when ``agent.provider == "openai_realtime"`` we let park do all
+        the Realtime-side work and skip the warmup-only adapter here.
+
         Best-effort: each provider's ``warmup()`` is wrapped in
         ``asyncio.gather(..., return_exceptions=True)`` so a slow or
         failing endpoint cannot block the others. The default
         ``warmup()`` on the abstract base classes is a no-op, so providers
         that don't override it contribute nothing to call latency.
         """
-        targets = []
+        targets: list[Any] = []
         for provider in (
             getattr(agent, "stt", None),
             getattr(agent, "tts", None),
@@ -752,6 +765,14 @@ def _spawn_provider_warmup(self, agent: Agent) -> None:
                 continue
             targets.append(provider)

+        # ``_build_realtime_warmup_adapter`` only fires for
+        # ``openai_realtime`` agents, and for those we defer 100% of the
+        # Realtime-side warm work to :meth:`_park_provider_connections`
+        # (which runs under the same ``agent.prewarm`` gate on every
+        # outbound call). The warmup-only handshake is a strict subset of
+        # what park performs, so running both makes two WS handshakes
+        # against ``api.openai.com`` per call instead of one.
+
         if not targets:
             return
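
The best-effort behaviour the docstring attributes to `asyncio.gather(..., return_exceptions=True)` can be seen in isolation in the sketch below. The coroutines `ok_warmup` and `failing_warmup` are hypothetical placeholders for provider `warmup()` methods.

```python
import asyncio


async def ok_warmup() -> str:
    await asyncio.sleep(0)
    return "warm"


async def failing_warmup() -> str:
    raise TimeoutError("endpoint too slow")


async def run_all() -> list:
    # return_exceptions=True turns a raised exception into a result value,
    # so one failing warmup does not cancel or mask the others.
    return await asyncio.gather(
        ok_warmup(), failing_warmup(), return_exceptions=True
    )


results = asyncio.run(run_all())
```

Here `results` holds the successful return value alongside the exception object, in the order the coroutines were passed; nothing propagates to the caller.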

@@ -774,6 +795,64 @@ async def _run_all() -> None:
         self._prewarm_tasks.add(task)
         task.add_done_callback(self._prewarm_tasks.discard)

+    def _build_realtime_warmup_adapter(self, agent: Agent) -> Any | None:
+        """Build a transient :class:`OpenAIRealtimeAdapter` configured
+        identically to the one ``StreamHandler.start()`` will instantiate,
+        suitable for a single :py:meth:`warmup` call.
+
+        Returns ``None`` when warmup is not applicable: the agent is not
+        in ``openai_realtime`` mode, the OpenAI key is missing, or the
+        adapter import fails.
+        """
+        if getattr(agent, "provider", None) != "openai_realtime":
+            return None
+        api_key = getattr(self._local_config, "openai_key", None)
+        if not api_key:
+            return None
+        try:
+            from getpatter.providers.openai_realtime import (
+                OpenAIRealtimeAdapter,  # type: ignore[import]
+            )
+        except Exception as exc:  # noqa: BLE001 - best-effort
+            logger.debug("Realtime warmup unavailable: %s", exc)
+            return None
+
+        # Build the same tools list (user-defined + ``transfer_call`` /
+        # ``end_call``) that ``OpenAIRealtimeStreamHandler.start()`` would
+        # apply on a cold ``connect()``. Without this the primed
+        # ``session.update`` carries an empty tool list and an adopted
+        # parked session is silently incapable of calling the built-ins —
+        # ``transfer_call`` / ``end_call`` no-op until the next cold
+        # session.update (which never happens for adopted calls).
+        from getpatter.stream_handler import build_realtime_tools
+
+        adapter_kwargs: dict[str, Any] = {
+            "api_key": api_key,
+            "model": agent.model,
+            "voice": agent.voice,
+            "instructions": agent.system_prompt,
+            "language": agent.language,
+            "tools": build_realtime_tools(getattr(agent, "tools", None)),
+            # Twilio + Telnyx both bridge to OpenAI Realtime over
+            # ``g711_ulaw`` (see ``telephony/twilio.py`` / ``telnyx.py``);
+            # match that here so the primed session config aligns with
+            # the production call.
+            "audio_format": "g711_ulaw",
+        }
+        reasoning_effort = getattr(agent, "openai_realtime_reasoning_effort", None)
+        if reasoning_effort is not None:
+            adapter_kwargs["reasoning_effort"] = reasoning_effort
+        transcription_model = getattr(
+            agent, "openai_realtime_input_audio_transcription_model", None
+        )
+        if transcription_model is not None:
+            adapter_kwargs["input_audio_transcription_model"] = transcription_model
+        try:
+            return OpenAIRealtimeAdapter(**adapter_kwargs)
+        except Exception as exc:  # noqa: BLE001 - best-effort
+            logger.debug("Realtime warmup adapter build failed: %s", exc)
+            return None
+
     def pop_prewarmed_connections(self, call_id: str) -> dict[str, Any] | None:
         """Pop and return the parked provider WS handles for ``call_id``,
         or ``None`` when no parked connections exist.
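
The guard-then-build shape of `_build_realtime_warmup_adapter` above (return `None` when not applicable, forward optional settings only when set) can be sketched on its own. `build_adapter_kwargs`, the reduced field set, and the `"example-model"` value are illustrative assumptions, not the project's code.

```python
from types import SimpleNamespace
from typing import Any, Optional


def build_adapter_kwargs(
    agent: Any, api_key: Optional[str]
) -> Optional[dict[str, Any]]:
    """Return constructor kwargs, or None when warmup does not apply."""
    if getattr(agent, "provider", None) != "openai_realtime" or not api_key:
        return None
    kwargs: dict[str, Any] = {"api_key": api_key, "model": agent.model}
    # Optional settings are forwarded only when set, so the adapter's own
    # defaults apply otherwise.
    effort = getattr(agent, "openai_realtime_reasoning_effort", None)
    if effort is not None:
        kwargs["reasoning_effort"] = effort
    return kwargs


agent = SimpleNamespace(provider="openai_realtime", model="example-model")
tuned = SimpleNamespace(
    provider="openai_realtime",
    model="example-model",
    openai_realtime_reasoning_effort="low",
)
pipeline = SimpleNamespace(provider="pipeline", model="example-model")
```

Returning `None` instead of raising lets the caller treat "not applicable" and "failed to build" the same way: skip the warm path and continue.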
@@ -819,12 +898,26 @@ def _park_provider_connections(self, agent: Agent, call_id: str) -> None:
         ``asyncio.gather(..., return_exceptions=True)`` so a slow or
         failing endpoint cannot block the others. Providers without
         ``open_parked_connection`` contribute nothing.
+
+        For ``openai_realtime`` mode the Realtime adapter is server-side
+        ephemeral, so a transient adapter is built from the resolved
+        Agent + the configured OpenAI key here and its
+        ``open_parked_connection`` opens a fully primed
+        ``session.updated`` WS that ``OpenAIRealtimeStreamHandler``
+        adopts at ``start`` time instead of paying the
+        ``session.created`` + ``session.update`` round-trip again.
         """
         stt = getattr(agent, "stt", None)
         tts = getattr(agent, "tts", None)
         stt_open = getattr(stt, "open_parked_connection", None) if stt else None
         tts_open = getattr(tts, "open_parked_connection", None) if tts else None
-        if stt_open is None and tts_open is None:
+        realtime_adapter = self._build_realtime_warmup_adapter(agent)
+        realtime_open = (
+            getattr(realtime_adapter, "open_parked_connection", None)
+            if realtime_adapter is not None
+            else None
+        )
+        if stt_open is None and tts_open is None and realtime_open is None:
             return

         slot: dict[str, Any] = {}
@@ -867,8 +960,27 @@ async def _park_tts() -> None:
             except Exception as exc:  # noqa: BLE001 - best-effort
                 logger.debug("Park TTS failed for %s: %s", call_id, exc)

+        async def _park_realtime() -> None:
+            if realtime_open is None:
+                return
+            try:
+                handle = await realtime_open()
+                if self._prewarmed_connections.get(call_id) is not slot:
+                    await _safe_close_handle(handle)
+                    return
+                slot["openai_realtime"] = handle
+                logger.info(
+                    "[PREWARM] callId=%s provider=openai_realtime ms=%d",
+                    call_id,
+                    int((time.monotonic() - started_at) * 1000),
+                )
+            except Exception as exc:  # noqa: BLE001 - best-effort
+                logger.debug("Park Realtime failed for %s: %s", call_id, exc)
+
         async def _run_all() -> None:
-            await asyncio.gather(_park_stt(), _park_tts(), return_exceptions=True)
+            await asyncio.gather(
+                _park_stt(), _park_tts(), _park_realtime(), return_exceptions=True
+            )

         task = asyncio.create_task(_run_all())
         self._prewarm_tasks.add(task)
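
The `is not slot` identity check in `_park_realtime` guards against a handshake that finishes after its slot has been popped or replaced. A self-contained sketch of that guard follows; the module-level `prewarmed` dict, the `closed` list, and string handles are hypothetical simplifications standing in for the client's state and `_safe_close_handle`.

```python
import asyncio
from typing import Any

prewarmed: dict[str, dict[str, Any]] = {}
closed: list[str] = []


async def park(call_id: str, slot: dict[str, Any], handle: str, delay: float) -> None:
    await asyncio.sleep(delay)  # stands in for the WebSocket handshake
    if prewarmed.get(call_id) is not slot:
        # The slot was popped or replaced while the handshake was in
        # flight; nobody will adopt this handle, so close it.
        closed.append(handle)
        return
    slot["openai_realtime"] = handle


async def demo() -> dict[str, Any]:
    slot: dict[str, Any] = {}
    prewarmed["call-1"] = slot
    task = asyncio.create_task(park("call-1", slot, "ws-A", 0.01))
    prewarmed.pop("call-1")  # the call is answered before parking finishes
    await task
    return slot


slot_after = asyncio.run(demo())
```

Comparing by identity rather than by key presence also covers the case where a new slot was registered under the same call id in the meantime.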