PatterAI · nicolotognoni · May 12, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,23 @@
+## Unreleased
+
+### Changed
+
+- **`StreamHandler` adopt-capability check now uses duck typing.** The TS realtime adopt branch in `stream-handler.ts:2229` previously relied on `this.adapter instanceof OpenAIRealtimeAdapter` to gate the prewarm-handoff path. Switched to a duck-type check (`typeof adapter.adoptWebSocket === 'function'`) so the generic stream-handler module stays provider-agnostic on this hot path and matches the Python handler's `getattr(self._adapter, "adopt_websocket", None)` shape. Files: `libraries/typescript/src/stream-handler.ts:2229`.
+
+### Fixed
+
+- **Adapter state leak after a failed parked-session adoption.** When `adopt_websocket` / `adoptWebSocket` raised mid-adoption, the partially-adopted `OpenAIRealtimeAdapter` was in an inconsistent state — `_running` / `messageListenerAttached` was already `true`, the heartbeat task may have started, `_current_response_item_id` / `currentResponseItemId` may have carried leaked state from the parked session, and the `_ws` / `ws` reference pointed at a now-closed socket. Falling through to `connect()` on that carcass raced `session.created` against stale state and corrupted the live call. Fix: handler now re-instantiates the adapter before the cold connect path, guaranteeing a clean slate. Files: `libraries/python/getpatter/stream_handler.py:998`, `libraries/typescript/src/stream-handler.ts:2229`.
+
+- **Eliminated double WebSocket handshake on outbound OpenAI Realtime calls.** `_spawn_provider_warmup` (Py) / `spawnProviderWarmup` (TS) and `_park_provider_connections` / `parkProviderConnections` each built a transient `OpenAIRealtimeAdapter` and opened its own WS against `api.openai.com` during the ringing window, so every call paid two handshakes where one suffices. The warmup-only handshake is a strict subset of what park performs (open WS → `session.created` → `session.update` → `session.updated`) and park keeps the socket open for adoption, so warmup's WS was opened, primed, and immediately discarded. This wasted 150–400 ms of ringing-window budget and doubled the rate-limit pressure against OpenAI. Fix: `_spawn_provider_warmup` (Py) / `spawnProviderWarmup` (TS) no longer builds the Realtime adapter; park is the sole Realtime warm path. Pipeline-mode STT/TTS/LLM warmup is unchanged. Files: `libraries/python/getpatter/client.py:732`, `libraries/typescript/src/client.ts:982`.
+
+- **Built-in tools (`transfer_call` / `end_call`) now land in the primed Realtime session.** `_build_realtime_warmup_adapter` (Py) / `buildRealtimeWarmupAdapter` (TS) constructed the transient `OpenAIRealtimeAdapter` without a `tools=` argument, so the `session.update` sent during ringing carried an empty tool list. When `StreamHandler.start()` adopted that parked WebSocket it skipped a fresh `session.update`, leaving the upstream session permanently unaware that the two Patter built-ins existed — `transfer_call` and `end_call` silently no-op'd on every hit-prewarm call (~80% of outbound calls when prewarm is enabled). Added a shared `build_realtime_tools(...)` helper in `stream_handler.py` and `buildRealtimeTools(...)` in `server.ts` so both the live and warmup paths build the canonical tool list byte-for-byte. Files: `libraries/python/getpatter/stream_handler.py:91`, `libraries/python/getpatter/client.py:790`, `libraries/typescript/src/server.ts:62`, `libraries/typescript/src/client.ts:1030`.
+
+- **OpenAI Realtime warmup now runs during the ringing window.** The `warmup()` method on `OpenAIRealtimeAdapter` (defined in both SDKs) was unreachable from `Patter.call()`: the provider warmup framework only iterated `agent.stt` / `agent.tts` / `agent.llm`, but OpenAI Realtime is an all-in-one provider that's server-instantiated at `StreamHandler.start()` time. `_spawn_provider_warmup` (Py) / `spawnProviderWarmup` (TS) now builds a transient `OpenAIRealtimeAdapter` from the resolved Agent + configured `openai_key` when `agent.provider == "openai_realtime"` and calls `warmup()` in parallel with the carrier `initiate_call`. Saves 150–400 ms of TLS + WebSocket handshake + `session.created` round-trip on the first turn. Superseded within this release by the double-handshake fix above: park is now the sole Realtime warm path. Files: `libraries/python/getpatter/client.py:732`, `libraries/typescript/src/client.ts:940`.
+
+### Added
+
+- **Parked OpenAI Realtime session adoption — sustained first-turn latency win across consecutive calls.** `Patter._park_provider_connections` (Py) / `Patter.parkProviderConnections` (TS) now also park a fully primed (`session.created` → `session.update` → `session.updated`) OpenAI Realtime WebSocket during the carrier ringing window when `agent.provider == "openai_realtime"`. `OpenAIRealtimeStreamHandler` (Py) and the realtime branch of `StreamHandler.initRealtimeAdapter` (TS) consult the parked slot on `start()` and call `adopt_websocket(...)` / `adoptWebSocket(...)` on the configured adapter instead of paying the cold `connect()` round-trip again — saving ~250–450 ms on the first-turn audio. Best-effort: a dead parked WS, missing OpenAI key, or `open_parked_connection` failure all fall through transparently to the cold connect path. Files: `libraries/python/getpatter/client.py:866`, `libraries/python/getpatter/stream_handler.py:724,950`, `libraries/python/getpatter/telephony/twilio.py:498`, `libraries/python/getpatter/telephony/telnyx.py:605`, `libraries/typescript/src/client.ts:863`, `libraries/typescript/src/stream-handler.ts:2229`.
+
 ## 0.6.1 (2026-05-09)
 
 ### Fixed — Barge-in bug bundle: 6.8s latency outliers, double-talk dispatch, stale anchors, firstMessage uninterruptible (Python + TypeScript parity)
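
The adopt-or-fall-back behaviour described in the Unreleased entries above (duck-typed capability check, then a fresh adapter before the cold connect when adoption fails) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the getpatter implementation: `FakeRealtimeAdapter`, `start_adapter`, `make_adapter`, and `demo` are hypothetical names invented for the example, and the "WebSocket" is just a string.

```python
import asyncio
from typing import Any, Callable, Optional


class FakeRealtimeAdapter:
    """Hypothetical stand-in for a Realtime adapter (not the getpatter class)."""

    def __init__(self, fail_adopt: bool = False) -> None:
        self.fail_adopt = fail_adopt
        self.running = False
        self.ws: Optional[str] = None

    async def adopt_websocket(self, ws: str) -> None:
        # Mutate state first, then fail, to mimic a half-finished adoption.
        self.running = True
        self.ws = ws
        if self.fail_adopt:
            raise ConnectionError("parked socket closed mid-adoption")

    async def connect(self) -> None:
        # A cold connect on a partially adopted adapter is the bug scenario.
        if self.running:
            raise RuntimeError("connect() called on a dirty adapter")
        self.running = True
        self.ws = "cold-ws"


async def start_adapter(
    make_adapter: Callable[[], Any], parked_ws: Optional[str]
) -> Any:
    adapter = make_adapter()
    # Duck-typed capability check: any adapter exposing adopt_websocket qualifies.
    adopt = getattr(adapter, "adopt_websocket", None)
    if parked_ws is not None and callable(adopt):
        try:
            await adopt(parked_ws)
            return adapter
        except Exception:
            # Adoption failed part-way: discard the adapter and build a new
            # one so the cold path starts from a clean slate.
            adapter = make_adapter()
    await adapter.connect()
    return adapter


def demo() -> tuple:
    # First adapter fails adoption; the factory then hands out a clean one.
    adapters = iter([FakeRealtimeAdapter(fail_adopt=True), FakeRealtimeAdapter()])
    recovered = asyncio.run(start_adapter(lambda: next(adapters), "parked-ws"))
    adopted = asyncio.run(start_adapter(FakeRealtimeAdapter, "parked-ws"))
    return recovered.ws, adopted.ws
```

Passing a factory rather than an instance is what makes the fallback cheap in this sketch: the handler never has to reset individual fields on a half-adopted adapter, it simply asks for a new one.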

diff --git a/libraries/python/getpatter/client.py b/libraries/python/getpatter/client.py
@@ -733,13 +733,26 @@ def _spawn_provider_warmup(self, agent: Agent) -> None:
         """Spawn a fire-and-forget task that warms up STT / TTS / LLM in
         parallel with the carrier-side ``initiate_call``.
 
+        Pipeline-mode providers (``agent.stt`` / ``agent.tts`` / ``agent.llm``)
+        are picked up via the optional ``warmup()`` method on each instance.
+
+        For ``openai_realtime`` mode the warmup-only handshake is a
+        strict subset of what :meth:`_park_provider_connections` already
+        performs (open WS → ``session.created`` → ``session.update`` →
+        ``session.updated``) — and park keeps the socket open for adoption.
+        Running both creates a double WebSocket handshake against
+        ``api.openai.com`` per call, wastes 150-400 ms of ringing-window
+        budget, and doubles the rate-limit pressure for no benefit. So
+        when ``agent.provider == "openai_realtime"`` we let park do all
+        the Realtime-side work and skip the warmup-only adapter here.
+
         Best-effort: each provider's ``warmup()`` is wrapped in
         ``asyncio.gather(..., return_exceptions=True)`` so a slow or
         failing endpoint cannot block the others. The default
         ``warmup()`` on the abstract base classes is a no-op, so providers
         that don't override it contribute nothing to call latency.
         """
-        targets = []
+        targets: list[Any] = []
         for provider in (
             getattr(agent, "stt", None),
             getattr(agent, "tts", None),
@@ -752,6 +765,14 @@ def _spawn_provider_warmup(self, agent: Agent) -> None:
                 continue
             targets.append(provider)
 
+        # ``_build_realtime_warmup_adapter`` only fires for
+        # ``openai_realtime`` agents, and for those we defer 100% of the
+        # Realtime-side warm work to :meth:`_park_provider_connections`
+        # (which runs under the same ``agent.prewarm`` gate on every
+        # outbound call). The warmup-only handshake is a strict subset of
+        # what park performs, so running both makes two WS handshakes
+        # against ``api.openai.com`` per call instead of one.
+
         if not targets:
             return
 
@@ -774,6 +795,64 @@ async def _run_all() -> None:
         self._prewarm_tasks.add(task)
         task.add_done_callback(self._prewarm_tasks.discard)
 
+    def _build_realtime_warmup_adapter(self, agent: Agent) -> Any | None:
+        """Build a transient :class:`OpenAIRealtimeAdapter` configured
+        identically to the one ``StreamHandler.start()`` will instantiate,
+        suitable for a single ``open_parked_connection`` call.
+
+        Returns ``None`` when warmup is not applicable: the agent is not
+        in ``openai_realtime`` mode, the OpenAI key is missing, or the
+        adapter import fails.
+        """
+        if getattr(agent, "provider", None) != "openai_realtime":
+            return None
+        api_key = getattr(self._local_config, "openai_key", None)
+        if not api_key:
+            return None
+        try:
+            from getpatter.providers.openai_realtime import (
+                OpenAIRealtimeAdapter,  # type: ignore[import]
+            )
+        except Exception as exc:  # noqa: BLE001 - best-effort
+            logger.debug("Realtime warmup unavailable: %s", exc)
+            return None
+
+        # Build the same tools list (user-defined + ``transfer_call`` /
+        # ``end_call``) that ``OpenAIRealtimeStreamHandler.start()`` would
+        # apply on a cold ``connect()``. Without this the primed
+        # ``session.update`` carries an empty tool list and an adopted
+        # parked session is silently incapable of calling the built-ins —
+        # ``transfer_call`` / ``end_call`` no-op until the next cold
+        # session.update (which never happens for adopted calls).
+        from getpatter.stream_handler import build_realtime_tools
+
+        adapter_kwargs: dict[str, Any] = {
+            "api_key": api_key,
+            "model": agent.model,
+            "voice": agent.voice,
+            "instructions": agent.system_prompt,
+            "language": agent.language,
+            "tools": build_realtime_tools(getattr(agent, "tools", None)),
+            # Twilio + Telnyx both bridge to OpenAI Realtime over
+            # ``g711_ulaw`` (see ``telephony/twilio.py`` / ``telnyx.py``);
+            # match that here so the primed session config aligns with
+            # the production call.
+            "audio_format": "g711_ulaw",
+        }
+        reasoning_effort = getattr(agent, "openai_realtime_reasoning_effort", None)
+        if reasoning_effort is not None:
+            adapter_kwargs["reasoning_effort"] = reasoning_effort
+        transcription_model = getattr(
+            agent, "openai_realtime_input_audio_transcription_model", None
+        )
+        if transcription_model is not None:
+            adapter_kwargs["input_audio_transcription_model"] = transcription_model
+        try:
+            return OpenAIRealtimeAdapter(**adapter_kwargs)
+        except Exception as exc:  # noqa: BLE001 - best-effort
+            logger.debug("Realtime warmup adapter build failed: %s", exc)
+            return None
+
     def pop_prewarmed_connections(self, call_id: str) -> dict[str, Any] | None:
         """Pop and return the parked provider WS handles for ``call_id``,
         or ``None`` when no parked connections exist.
@@ -819,12 +898,26 @@ def _park_provider_connections(self, agent: Agent, call_id: str) -> None:
         ``asyncio.gather(..., return_exceptions=True)`` so a slow or
         failing endpoint cannot block the others. Providers without
         ``open_parked_connection`` contribute nothing.
+
+        For ``openai_realtime`` mode there is no adapter on the Agent (it
+        is instantiated at ``StreamHandler.start()`` time), so a transient
+        adapter is built here from the resolved Agent + the configured
+        OpenAI key. Its ``open_parked_connection`` opens a WS primed
+        through ``session.updated``, which ``OpenAIRealtimeStreamHandler``
+        adopts at ``start`` time instead of paying the
+        ``session.created`` + ``session.update`` round-trip again.
         """
         stt = getattr(agent, "stt", None)
         tts = getattr(agent, "tts", None)
         stt_open = getattr(stt, "open_parked_connection", None) if stt else None
         tts_open = getattr(tts, "open_parked_connection", None) if tts else None
-        if stt_open is None and tts_open is None:
+        realtime_adapter = self._build_realtime_warmup_adapter(agent)
+        realtime_open = (
+            getattr(realtime_adapter, "open_parked_connection", None)
+            if realtime_adapter is not None
+            else None
+        )
+        if stt_open is None and tts_open is None and realtime_open is None:
             return
 
         slot: dict[str, Any] = {}
@@ -867,8 +960,27 @@ async def _park_tts() -> None:
             except Exception as exc:  # noqa: BLE001 - best-effort
                 logger.debug("Park TTS failed for %s: %s", call_id, exc)
 
+        async def _park_realtime() -> None:
+            if realtime_open is None:
+                return
+            try:
+                handle = await realtime_open()
+                if self._prewarmed_connections.get(call_id) is not slot:
+                    await _safe_close_handle(handle)
+                    return
+                slot["openai_realtime"] = handle
+                logger.info(
+                    "[PREWARM] callId=%s provider=openai_realtime ms=%d",
+                    call_id,
+                    int((time.monotonic() - started_at) * 1000),
+                )
+            except Exception as exc:  # noqa: BLE001 - best-effort
+                logger.debug("Park Realtime failed for %s: %s", call_id, exc)
+
         async def _run_all() -> None:
-            await asyncio.gather(_park_stt(), _park_tts(), return_exceptions=True)
+            await asyncio.gather(
+                _park_stt(), _park_tts(), _park_realtime(), return_exceptions=True
+            )
 
         task = asyncio.create_task(_run_all())
         self._prewarm_tasks.add(task)