24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,30 @@

## 0.6.1 (2026-05-12)

### Changed — `StreamHandler` adopt-capability check now uses duck typing

The TS realtime adopt branch in `stream-handler.ts` previously relied on `this.adapter instanceof OpenAIRealtimeAdapter` to gate the prewarm-handoff path. Switched to a duck-type check (`typeof adapter.adoptWebSocket === 'function'`) so the generic stream-handler module stays provider-agnostic on this hot path and matches the Python handler's `getattr(self._adapter, "adopt_websocket", None)` shape. Files: `libraries/typescript/src/stream-handler.ts`.
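
The Python-side shape of this check can be sketched as below. This is a minimal illustration, not the handler's actual code: `AdoptingAdapter`, `PlainAdapter`, and `try_adopt` are hypothetical names, and only the `getattr(..., "adopt_websocket", None)` lookup comes from the entry above.

```python
import asyncio


class AdoptingAdapter:
    """Hypothetical adapter that can take over an already-open socket."""

    def __init__(self) -> None:
        self.adopted = None

    async def adopt_websocket(self, ws: object) -> None:
        self.adopted = ws


class PlainAdapter:
    """Hypothetical adapter with no adoption support."""


async def try_adopt(adapter: object, parked_ws: object) -> bool:
    """Adopt ``parked_ws`` only if the adapter exposes ``adopt_websocket``."""
    adopt = getattr(adapter, "adopt_websocket", None)
    if not callable(adopt):
        return False  # caller falls back to the cold connect path
    await adopt(parked_ws)
    return True
```

Because the check looks at the capability rather than the class, any future adapter that implements `adopt_websocket` would take the same path without the handler importing it.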

### Fixed — Adapter state leak after a failed parked-session adoption

When `adopt_websocket` / `adoptWebSocket` raised mid-adoption, the partially-adopted `OpenAIRealtimeAdapter` was in an inconsistent state — `_running` / `messageListenerAttached` was already `true`, the heartbeat task may have started, `_current_response_item_id` / `currentResponseItemId` may have carried leaked state from the parked session, and the `_ws` / `ws` reference pointed at a now-closed socket. Falling through to `connect()` on that carcass raced `session.created` against stale state and corrupted the live call. Fix: handler now re-instantiates the adapter before the cold connect path, guaranteeing a clean slate. Files: `libraries/python/getpatter/stream_handler.py`, `libraries/typescript/src/stream-handler.ts`.
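
A rough sketch of the recovery flow described above, assuming a hypothetical `FakeAdapter` and a factory callable (neither is part of the SDK). The point it illustrates is that the half-adopted instance is dropped and a fresh one performs the cold connect.

```python
from __future__ import annotations

import asyncio
from typing import Callable


class FakeAdapter:
    """Hypothetical stand-in for a realtime adapter."""

    def __init__(self) -> None:
        self.running = False
        self.connected = False

    async def adopt_websocket(self, ws: object) -> None:
        self.running = True  # state is mutated before the failure surfaces
        raise ConnectionError("parked socket already closed")

    async def connect(self) -> None:
        self.running = True
        self.connected = True


async def start(
    make_adapter: Callable[[], FakeAdapter], parked_ws: object | None
) -> FakeAdapter:
    adapter = make_adapter()
    if parked_ws is not None:
        try:
            await adapter.adopt_websocket(parked_ws)
            return adapter
        except Exception:
            # Discard the half-adopted instance instead of reusing it.
            adapter = make_adapter()
    await adapter.connect()
    return adapter
```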

### Fixed — Eliminated double WebSocket handshake on outbound OpenAI Realtime calls

`_spawn_provider_warmup` (Py) / `spawnProviderWarmup` (TS) and `_park_provider_connections` / `parkProviderConnections` each built a transient `OpenAIRealtimeAdapter` and opened its own WS against `api.openai.com` during the ringing window: two handshakes per call where one suffices. The warmup-only handshake is a strict subset of what park performs (open WS → `session.created` → `session.update` → `session.updated`), and park keeps the socket open for adoption, so warmup's WS was opened, primed, and immediately discarded. This wasted 150–400 ms of ringing-window budget and doubled the rate-limit pressure against OpenAI. Fix: `_spawn_provider_warmup` / `spawnProviderWarmup` no longer builds the Realtime adapter; park is the sole Realtime warm path. Pipeline-mode STT/TTS/LLM warmup is unchanged. Files: `libraries/python/getpatter/client.py`, `libraries/typescript/src/client.ts`.

### Fixed — Built-in tools (`transfer_call` / `end_call`) now land in the primed Realtime session

`_build_realtime_warmup_adapter` (Py) / `buildRealtimeWarmupAdapter` (TS) constructed the transient `OpenAIRealtimeAdapter` without a `tools=` argument, so the `session.update` sent during ringing carried an empty tool list. When `StreamHandler.start()` adopted that parked WebSocket it skipped a fresh `session.update`, leaving the upstream session permanently unaware that the two Patter built-ins existed. As a result, `transfer_call` and `end_call` silently no-op'd on every call that hit the prewarm path (~80% of outbound calls when prewarm is enabled). Added a shared `build_realtime_tools(...)` helper in `stream_handler.py` and `buildRealtimeTools(...)` in `server.ts` so that the live and warmup paths build byte-for-byte identical tool lists. Files: `libraries/python/getpatter/stream_handler.py`, `libraries/python/getpatter/client.py`, `libraries/typescript/src/server.ts`, `libraries/typescript/src/client.ts`.
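
The idea of a single shared builder can be illustrated as follows. The body of the real `build_realtime_tools` is not shown in this diff, so everything here is an assumption: `build_tools_sketch` and the `BUILTIN_TOOLS` entries are hypothetical placeholders, not the SDK's schemas.

```python
from __future__ import annotations

from typing import Any

# Hypothetical built-in definitions, for illustration only.
BUILTIN_TOOLS: list[dict[str, Any]] = [
    {"name": "transfer_call", "description": "Transfer the call to another number."},
    {"name": "end_call", "description": "Hang up the current call."},
]


def build_tools_sketch(user_tools: list[dict[str, Any]] | None) -> list[dict[str, Any]]:
    """Return copies of the user-defined tools followed by the built-ins."""
    tools = [dict(tool) for tool in (user_tools or [])]
    tools.extend(dict(tool) for tool in BUILTIN_TOOLS)
    return tools
```

When the live path and the warmup path both go through one function like this, the primed `session.update` and a cold `session.update` cannot drift apart.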

### Fixed — OpenAI Realtime warmup now runs during the ringing window

The `warmup()` method on `OpenAIRealtimeAdapter` (defined in both SDKs) was unreachable from `Patter.call()` — the provider warmup framework only iterated `agent.stt` / `agent.tts` / `agent.llm`, but OpenAI Realtime is an all-in-one provider that's server-instantiated at `StreamHandler.start()` time. `_spawn_provider_warmup` (Py) / `spawnProviderWarmup` (TS) now builds a transient `OpenAIRealtimeAdapter` from the resolved Agent + configured `openai_key` when `agent.provider == "openai_realtime"` and calls `warmup()` in parallel with the carrier `initiate_call`. Saves 150–400 ms of TLS + WebSocket handshake + `session.created` round-trip on the first turn. Files: `libraries/python/getpatter/client.py`, `libraries/typescript/src/client.ts`.
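
Running warmup in parallel with call initiation can be sketched with plain `asyncio`. The provider classes, `initiate_call`, and `place_call` below are hypothetical stand-ins; only the pattern (a background task plus `asyncio.gather(..., return_exceptions=True)`) mirrors what the SDK describes.

```python
import asyncio


class OkProvider:
    async def warmup(self) -> str:
        return "warm"


class BrokenProvider:
    async def warmup(self) -> str:
        raise TimeoutError("slow endpoint")


async def warm_all(providers: list) -> list:
    """Warm every provider concurrently; failures are returned, not raised."""
    return await asyncio.gather(
        *(p.warmup() for p in providers), return_exceptions=True
    )


async def initiate_call() -> str:
    """Hypothetical carrier request."""
    await asyncio.sleep(0)
    return "CA-demo"


async def place_call(providers: list) -> tuple:
    background = set()
    task = asyncio.create_task(warm_all(providers))
    background.add(task)  # hold a strong reference until the task finishes
    task.add_done_callback(background.discard)
    call_id = await initiate_call()  # overlaps with the warmup task
    return call_id, await task
```

In a real fire-and-forget setup the task would not be awaited at the end; it is awaited here only so the sketch returns something observable.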

### Added — Parked OpenAI Realtime session adoption (sustained first-turn latency win across consecutive calls)

`Patter._park_provider_connections` (Py) / `Patter.parkProviderConnections` (TS) now also park a fully primed (`session.created` → `session.update` → `session.updated`) OpenAI Realtime WebSocket during the carrier ringing window when `agent.provider == "openai_realtime"`. `OpenAIRealtimeStreamHandler` (Py) and the realtime branch of `StreamHandler.initRealtimeAdapter` (TS) consult the parked slot on `start()` and call `adopt_websocket(...)` / `adoptWebSocket(...)` on the configured adapter instead of paying the cold `connect()` round-trip again — saving ~250–450 ms on the first-turn audio. Best-effort: a dead parked WS, missing OpenAI key, or `open_parked_connection` failure all fall through transparently to the cold connect path. Files: `libraries/python/getpatter/client.py`, `libraries/python/getpatter/stream_handler.py`, `libraries/python/getpatter/telephony/twilio.py`, `libraries/python/getpatter/telephony/telnyx.py`, `libraries/typescript/src/client.ts`, `libraries/typescript/src/stream-handler.ts`.
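
The park-then-pop lifecycle, including the guard against a slot that disappears while the handshake is still in flight, can be sketched like this. `ParkingLot` and its members are hypothetical names; the real code closes a stale handle, which the sketch records in a list instead.

```python
from __future__ import annotations

import asyncio
from typing import Any, Awaitable, Callable


class ParkingLot:
    """Hypothetical holder for sockets opened during the ringing window."""

    def __init__(self) -> None:
        self._slots: dict[str, dict[str, Any]] = {}
        self.closed: list[Any] = []  # handles discarded as stale

    async def park(
        self, call_id: str, open_conn: Callable[[], Awaitable[Any]]
    ) -> None:
        slot: dict[str, Any] = {}
        self._slots[call_id] = slot
        try:
            handle = await open_conn()
        except Exception:
            return  # best-effort: the call later takes the cold connect path
        if self._slots.get(call_id) is not slot:
            # The slot was popped or replaced while the handshake was in
            # flight, so nobody will ever adopt this handle.
            self.closed.append(handle)
            return
        slot["openai_realtime"] = handle

    def pop(self, call_id: str) -> dict[str, Any] | None:
        return self._slots.pop(call_id, None)
```

A stream handler would call `pop(call_id)` on start and fall back to a cold connect whenever the result is `None` or lacks the `openai_realtime` key.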

### Fixed — TypeScript `onMark` clobbered `lastConfirmedMark` with stale/unknown mark names (parity with Python)

`StreamHandler.onMark` in `libraries/typescript/src/stream-handler.ts`
118 changes: 115 additions & 3 deletions libraries/python/getpatter/client.py
@@ -733,13 +733,26 @@ def _spawn_provider_warmup(self, agent: Agent) -> None:
"""Spawn a fire-and-forget task that warms up STT / TTS / LLM in
parallel with the carrier-side ``initiate_call``.

Pipeline-mode providers (``agent.stt`` / ``agent.tts`` / ``agent.llm``)
are picked up via the optional ``warmup()`` method on each instance.

For ``openai_realtime`` mode the warmup-only handshake is a
strict subset of what :meth:`_park_provider_connections` already
performs (open WS → ``session.created`` → ``session.update`` →
``session.updated``) — and park keeps the socket open for adoption.
Running both creates a double WebSocket handshake against
``api.openai.com`` per call, wastes 150-400 ms of ringing-window
budget, and doubles the rate-limit pressure for no benefit. So
when ``agent.provider == "openai_realtime"`` we let park do all
the Realtime-side work and skip the warmup-only adapter here.

Best-effort: each provider's ``warmup()`` is wrapped in
``asyncio.gather(..., return_exceptions=True)`` so a slow or
failing endpoint cannot block the others. The default
``warmup()`` on the abstract base classes is a no-op, so providers
that don't override it contribute nothing to call latency.
"""
targets = []
targets: list[Any] = []
for provider in (
getattr(agent, "stt", None),
getattr(agent, "tts", None),
@@ -752,6 +765,14 @@ def _spawn_provider_warmup(self, agent: Agent) -> None:
continue
targets.append(provider)

# No Realtime adapter is added to ``targets`` here. For
# ``openai_realtime`` agents all Realtime-side warm work is deferred
# to :meth:`_park_provider_connections`, which runs under the same
# ``agent.prewarm`` gate on every outbound call. The warmup-only
# handshake is a strict subset of what park performs, so running
# both would open two WS handshakes against ``api.openai.com`` per
# call instead of one.

if not targets:
return

@@ -774,6 +795,64 @@ async def _run_all() -> None:
self._prewarm_tasks.add(task)
task.add_done_callback(self._prewarm_tasks.discard)

    def _build_realtime_warmup_adapter(self, agent: Agent) -> Any | None:
        """Build a transient :class:`OpenAIRealtimeAdapter` configured
        identically to the one ``StreamHandler.start()`` will instantiate,
        suitable for a single :py:meth:`warmup` or
        ``open_parked_connection`` call.

        Returns ``None`` when the adapter cannot be built: the agent is not
        in ``openai_realtime`` mode, the OpenAI key is missing, or the
        adapter import or construction fails.
        """
        if getattr(agent, "provider", None) != "openai_realtime":
            return None
        api_key = getattr(self._local_config, "openai_key", None)
        if not api_key:
            return None
        try:
            from getpatter.providers.openai_realtime import (
                OpenAIRealtimeAdapter,  # type: ignore[import]
            )
        except Exception as exc:  # noqa: BLE001 - best-effort
            logger.debug("Realtime warmup unavailable: %s", exc)
            return None

        # Build the same tools list (user-defined + ``transfer_call`` /
        # ``end_call``) that ``OpenAIRealtimeStreamHandler.start()`` would
        # apply on a cold ``connect()``. Without this the primed
        # ``session.update`` carries an empty tool list, and an adopted
        # parked session is silently incapable of calling the built-ins:
        # ``transfer_call`` / ``end_call`` no-op until the next cold
        # ``session.update`` (which never happens for adopted calls).
        from getpatter.stream_handler import build_realtime_tools

        adapter_kwargs: dict[str, Any] = {
            "api_key": api_key,
            "model": agent.model,
            "voice": agent.voice,
            "instructions": agent.system_prompt,
            "language": agent.language,
            "tools": build_realtime_tools(getattr(agent, "tools", None)),
            # Twilio + Telnyx both bridge to OpenAI Realtime over
            # ``g711_ulaw`` (see ``telephony/twilio.py`` / ``telnyx.py``);
            # match that here so the primed session config aligns with
            # the production call.
            "audio_format": "g711_ulaw",
        }
        reasoning_effort = getattr(agent, "openai_realtime_reasoning_effort", None)
        if reasoning_effort is not None:
            adapter_kwargs["reasoning_effort"] = reasoning_effort
        transcription_model = getattr(
            agent, "openai_realtime_input_audio_transcription_model", None
        )
        if transcription_model is not None:
            adapter_kwargs["input_audio_transcription_model"] = transcription_model
        try:
            return OpenAIRealtimeAdapter(**adapter_kwargs)
        except Exception as exc:  # noqa: BLE001 - best-effort
            logger.debug("Realtime warmup adapter build failed: %s", exc)
            return None

def pop_prewarmed_connections(self, call_id: str) -> dict[str, Any] | None:
"""Pop and return the parked provider WS handles for ``call_id``,
or ``None`` when no parked connections exist.
@@ -819,12 +898,26 @@ def _park_provider_connections(self, agent: Agent, call_id: str) -> None:
``asyncio.gather(..., return_exceptions=True)`` so a slow or
failing endpoint cannot block the others. Providers without
``open_parked_connection`` contribute nothing.

For ``openai_realtime`` mode no long-lived adapter exists on the
agent (it is instantiated at ``StreamHandler.start()`` time), so a
transient adapter is built here from the resolved Agent and the
configured OpenAI key. Its ``open_parked_connection`` opens a
fully primed (``session.updated``) WS that
``OpenAIRealtimeStreamHandler`` adopts at ``start`` time instead
of paying the ``session.created`` + ``session.update`` round-trip
again.
"""
stt = getattr(agent, "stt", None)
tts = getattr(agent, "tts", None)
stt_open = getattr(stt, "open_parked_connection", None) if stt else None
tts_open = getattr(tts, "open_parked_connection", None) if tts else None
if stt_open is None and tts_open is None:
realtime_adapter = self._build_realtime_warmup_adapter(agent)
realtime_open = (
getattr(realtime_adapter, "open_parked_connection", None)
if realtime_adapter is not None
else None
)
if stt_open is None and tts_open is None and realtime_open is None:
return

slot: dict[str, Any] = {}
@@ -867,8 +960,27 @@ async def _park_tts() -> None:
except Exception as exc: # noqa: BLE001 - best-effort
logger.debug("Park TTS failed for %s: %s", call_id, exc)

        async def _park_realtime() -> None:
            if realtime_open is None:
                return
            try:
                handle = await realtime_open()
                # Drop the handle if the slot was popped or replaced while
                # the handshake was in flight.
                if self._prewarmed_connections.get(call_id) is not slot:
                    await _safe_close_handle(handle)
                    return
                slot["openai_realtime"] = handle
                logger.info(
                    "[PREWARM] callId=%s provider=openai_realtime ms=%d",
                    call_id,
                    int((time.monotonic() - started_at) * 1000),
                )
            except Exception as exc:  # noqa: BLE001 - best-effort
                logger.debug("Park Realtime failed for %s: %s", call_id, exc)

async def _run_all() -> None:
await asyncio.gather(_park_stt(), _park_tts(), return_exceptions=True)
await asyncio.gather(
_park_stt(), _park_tts(), _park_realtime(), return_exceptions=True
)

task = asyncio.create_task(_run_all())
self._prewarm_tasks.add(task)