feat(realtime): OpenAI Realtime prewarm wired + cross-call session adoption#88
Closed
nicolotognoni wants to merge 6 commits into
…ramework

The `warmup()` method on `OpenAIRealtimeAdapter` (Python + TS) was defined but unreachable from `Patter.call()`: the prewarm framework only iterated `agent.stt` / `agent.tts` / `agent.llm`, but OpenAI Realtime is an all-in-one provider that's server-instantiated at `StreamHandler.start()` time and therefore not stored on the Agent.

`_spawn_provider_warmup` (Py) / `spawnProviderWarmup` (TS) now construct a transient `OpenAIRealtimeAdapter` from the resolved Agent + the configured `openai_key` when `agent.provider == "openai_realtime"` and run `warmup()` in parallel with the carrier `initiate_call`. The transient adapter is configured identically to the production one (model, voice, instructions, language, audio format = g711_ulaw for both Twilio and Telnyx, plus optional reasoning_effort / input_audio_transcription_model knobs from the engine marker) so the upstream `session.update` primes the same session state that the live call will use.

Saves 150-400 ms of TLS + WebSocket handshake + `session.created` round-trip on the first turn. Best-effort: failures during warmup adapter build or `warmup()` itself are logged at DEBUG and never abort the call.
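The best-effort, run-in-parallel behaviour described in this commit can be sketched roughly as follows. This is a minimal illustration, not the SDK's code: `FakeRealtimeAdapter`, `best_effort_warmup`, `initiate_call`, and `place_call` are hypothetical names, and the sleeps stand in for network round-trips.

```python
import asyncio
import logging

logger = logging.getLogger("prewarm_sketch")


class FakeRealtimeAdapter:
    """Hypothetical stand-in for a transient warmup adapter."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.warmed = False

    async def warmup(self) -> None:
        # Stands in for the TLS + WebSocket handshake and session priming.
        await asyncio.sleep(0.01)
        if self.fail:
            raise ConnectionError("simulated handshake failure")
        self.warmed = True


async def best_effort_warmup(adapter: FakeRealtimeAdapter) -> None:
    """Run warmup; log failures at DEBUG and never propagate them."""
    try:
        await adapter.warmup()
    except Exception as exc:
        logger.debug("realtime warmup failed: %s", exc)


async def initiate_call() -> str:
    # Stands in for the carrier request that starts the ringing window.
    await asyncio.sleep(0.02)
    return "call-123"


async def place_call(adapter: FakeRealtimeAdapter) -> str:
    # Schedule warmup first so it overlaps with the carrier round-trip.
    warm_task = asyncio.create_task(best_effort_warmup(adapter))
    call_id = await initiate_call()
    # Awaited here only to keep the sketch deterministic; it cannot raise
    # because best_effort_warmup swallows ordinary exceptions.
    await warm_task
    return call_id
```

With this shape, a warmup failure costs nothing beyond a DEBUG log line: `place_call` returns the call id whether or not the adapter was warmed.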
…call boundary

Builds on the previous warmup wiring. The transient warmup adapter closes its WS after a session.update / session.updated round-trip, so the live call still pays a fresh ``new WebSocket`` + handshake. This change parks the primed Realtime WS instead, the same pattern the SDK already uses for STT (Cartesia) and TTS (ElevenLabs WS).

`_park_provider_connections` (Py) / `parkProviderConnections` (TS) now build a transient `OpenAIRealtimeAdapter` when `agent.provider == "openai_realtime"`, call its `open_parked_connection` to keep the `session.updated` WS OPEN, and stash it under the `openai_realtime` slot key alongside the existing `stt` / `tts` parked handles.

`OpenAIRealtimeStreamHandler` (Py) accepts a new `pop_prewarmed_connections` callback (wired through the Twilio and Telnyx telephony adapters). `StreamHandler.start()` consults the parked slot before calling `connect()` and calls `adapter.adopt_websocket(...)` when a live WS is available, saving ~250-450 ms of cold handshake on the first turn. TS mirrors the same flow in `StreamHandler.initRealtimeAdapter` for both Twilio and Telnyx bridges.

All failure modes (missing OpenAI key, dead parked WS, park-task exception, adoption error) fall through transparently to the cold `connect()` path. The existing 36-test TS handoff/prewarm suite and 45-test Python suite are all green after the change.
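The park-then-adopt handoff can be pictured with a small sketch. It assumes a hypothetical in-memory slot store and fake socket/adapter classes (`ParkedSlots`, `FakeSocket`, `FakeAdapter`, `start`); only the `openai_realtime` slot key and the `adopt_websocket` / `connect` method names come from the commit message.

```python
import asyncio
from typing import Dict, Optional


class FakeSocket:
    """Hypothetical WebSocket stand-in that only tracks open/closed."""

    def __init__(self, open_: bool = True) -> None:
        self.open = open_


class FakeAdapter:
    """Hypothetical adapter exposing a cold path and an adopt path."""

    def __init__(self) -> None:
        self.ws: Optional[FakeSocket] = None
        self.path = ""

    async def connect(self) -> None:
        # Cold path: new socket plus a full handshake.
        self.ws = FakeSocket()
        self.path = "cold"

    async def adopt_websocket(self, ws: FakeSocket) -> None:
        if not ws.open:
            raise ConnectionError("parked socket is closed")
        self.ws = ws
        self.path = "adopted"


class ParkedSlots:
    """Holds pre-opened connections keyed by call id, then slot name."""

    def __init__(self) -> None:
        self._slots: Dict[str, Dict[str, FakeSocket]] = {}

    def park(self, call_id: str, key: str, ws: FakeSocket) -> None:
        self._slots.setdefault(call_id, {})[key] = ws

    def pop(self, call_id: str) -> Dict[str, FakeSocket]:
        # Popping ensures a parked socket is handed out at most once.
        return self._slots.pop(call_id, {})


async def start(adapter: FakeAdapter, slots: ParkedSlots, call_id: str) -> None:
    parked = slots.pop(call_id).get("openai_realtime")
    if parked is not None and parked.open:
        try:
            await adapter.adopt_websocket(parked)
            return
        except Exception:
            pass  # any adoption error falls through to the cold path
    await adapter.connect()
```

Popping rather than peeking is a deliberate choice in this sketch: a second handler for the same call id finds the slot empty and simply connects cold.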
The prewarm path built the transient OpenAIRealtimeAdapter without a ``tools=`` argument, so the ``session.update`` sent during ringing carried an empty tool list. When ``StreamHandler.start()`` adopted that parked WebSocket it skipped a fresh ``session.update``, leaving the upstream session permanently unaware that the two Patter built-ins (``transfer_call`` / ``end_call``) existed. They silently no-op'd on every hit-prewarm call (~80% of outbound calls when prewarm is enabled).

Extracted the canonical tool-list construction (user tools + ``transfer_call`` + ``end_call``) into a shared helper, ``build_realtime_tools()`` in Python and ``buildRealtimeTools()`` in TypeScript, and call it from both the live ``buildAIAdapter`` / ``StreamHandler.start()`` path and the warmup-side ``_build_realtime_warmup_adapter`` / ``buildRealtimeWarmupAdapter`` path so the two ``session.update`` bodies match byte-for-byte.

Tests: 4 new regression tests (2 Py + 2 TS) verifying that the warmup adapter carries user-defined tools plus both built-ins, and that the built-ins are still injected when the agent declares no user tools.
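A shared tool-list helper of the kind described above might look like the sketch below. The real `build_realtime_tools()` signature is not shown in this PR text, so the function name `build_realtime_tools_sketch`, the tool dictionaries, and the descriptions are assumptions made for illustration.

```python
import json
from typing import Any, Dict, List, Optional

Tool = Dict[str, Any]


def _builtin(name: str, description: str) -> Tool:
    # Assumed function-tool shape; the descriptions are placeholders.
    return {
        "type": "function",
        "name": name,
        "description": description,
        "parameters": {"type": "object", "properties": {}},
    }


BUILTIN_TOOLS: List[Tool] = [
    _builtin("transfer_call", "Transfer the caller to another number."),
    _builtin("end_call", "Hang up the current call."),
]


def build_realtime_tools_sketch(user_tools: Optional[List[Tool]] = None) -> List[Tool]:
    """Return user tools followed by the two built-ins, in a stable order."""
    tools = [dict(t) for t in (user_tools or [])]
    tools.extend(dict(t) for t in BUILTIN_TOOLS)
    return tools


def session_update_body(tools: List[Tool]) -> str:
    # Serialise deterministically so two call sites can be compared exactly.
    event = {"type": "session.update", "session": {"tools": tools}}
    return json.dumps(event, sort_keys=True)
```

Because both the live path and the warmup path would call the same helper, their serialised `session.update` bodies are identical by construction rather than by convention.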
…oes warmup work)

Both ``_spawn_provider_warmup`` and ``_park_provider_connections`` built a transient ``OpenAIRealtimeAdapter`` and opened its own WebSocket against ``api.openai.com`` during the ringing window: two handshakes per outbound call where one suffices. The warmup-only handshake is a strict subset of what park performs (open WS → ``session.created`` → ``session.update`` → ``session.updated``) and park keeps the socket open for adoption. The warmup-side WS was opened, primed, and immediately discarded, a pure waste of 150-400 ms of ringing-window budget, plus doubled rate-limit pressure against OpenAI for no benefit.

Fix: ``_spawn_provider_warmup`` no longer builds the Realtime adapter at all; park is now the sole Realtime warm path on outbound calls. Pipeline-mode STT / TTS / LLM ``warmup()`` calls are unchanged.

Tests: 2 new regression tests verify (1) ``_spawn_provider_warmup`` does not construct a Realtime adapter, and (2) end-to-end warmup+park together construct exactly one adapter (the one park uses). Updated 3 existing tests that asserted the old double-build behaviour.
When ``adopt_websocket`` / ``adoptWebSocket`` raised mid-adoption, the partially-adopted ``OpenAIRealtimeAdapter`` was left in an inconsistent state: ``_running`` / ``messageListenerAttached`` was already true, the heartbeat task may have started, ``_current_response_item_id`` / ``currentResponseItemId`` may have carried leaked state from the parked session, and the ``_ws`` / ``ws`` reference pointed at a now-closed socket. Falling through to ``connect()`` on that carcass raced ``session.created`` against stale state, ran two heartbeat timers, and sometimes attached a second message listener to the new socket, silently corrupting every adopt-failed call.

Fix: when adopt raises, re-instantiate the adapter (via the existing ``adapter_kwargs`` in Python, ``deps.buildAIAdapter`` in TS) before the cold ``connect()`` path runs, guaranteeing a clean slate.

Tests: regression test in each SDK constructs an adapter whose ``adopt_websocket`` throws, then asserts (a) a second adapter instance was created, (b) ``connect()`` ran on the fresh adapter, (c) the handler's adapter reference points at the fresh instance.
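The recreate-on-failure fix can be illustrated with a small sketch. `FlakyAdapter` and `Handler` are hypothetical classes; the only details taken from the commit message are that adoption can fail after mutating adapter state and that a fresh adapter is built from the stored constructor kwargs before `connect()` runs.

```python
import asyncio
from typing import Any, Dict, Optional


class FlakyAdapter:
    """Hypothetical adapter whose adoption fails after mutating state."""

    instances = 0

    def __init__(self, **kwargs: Any) -> None:
        FlakyAdapter.instances += 1
        self.kwargs = kwargs
        self.running = False
        self.connected = False

    async def adopt_websocket(self, ws: object) -> None:
        self.running = True  # state is already dirty when the failure hits
        raise ConnectionError("parked socket died mid-adoption")

    async def connect(self) -> None:
        if self.running:
            raise RuntimeError("connect() called on a half-adopted adapter")
        self.running = True
        self.connected = True


class Handler:
    """Hypothetical stream handler that owns one adapter at a time."""

    def __init__(self, adapter_cls: type, adapter_kwargs: Dict[str, Any]) -> None:
        self._adapter_cls = adapter_cls
        self._adapter_kwargs = adapter_kwargs
        self.adapter = adapter_cls(**adapter_kwargs)

    async def start(self, parked_ws: Optional[object] = None) -> None:
        if parked_ws is not None:
            try:
                await self.adapter.adopt_websocket(parked_ws)
                return
            except Exception:
                # Discard the half-adopted instance and build a clean one.
                self.adapter = self._adapter_cls(**self._adapter_kwargs)
        await self.adapter.connect()
```

In this sketch, reusing the original instance would trip the guard in `connect()`; swapping in a fresh instance is what lets the cold path start from a clean slate.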
…nstanceof)

The TS realtime adopt branch in ``stream-handler.ts:initRealtimeAdapter`` previously gated the prewarm-handoff path with two ``this.adapter instanceof OpenAIRealtimeAdapter`` checks. Switched both to a single duck-type check (``typeof adoptWebSocket === 'function'``) so:

1. The generic ``stream-handler`` module stays provider-agnostic on this hot path. Pipeline-only users still get the symbol resolved at module load (the import is used elsewhere in this file for legitimate provider-specific behaviour), but the adopt-handoff gate no longer demands a concrete class identity.
2. The check mirrors the Python handler's ``getattr(self._adapter, "adopt_websocket", None)`` shape: both SDKs now use capability-based detection rather than identity.
3. Future Realtime-like adapters (e.g. a different vendor's all-in-one provider that also exposes ``adoptWebSocket``) can opt into the adopt flow simply by implementing the method, with no SDK change needed.

No behaviour change: the same WS-adopt path runs for the same adapter class. Existing adopt-handoff tests cover the behaviour and continue to pass.
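For comparison, the capability-based check on the Python side, which the commit message cites as `getattr(self._adapter, "adopt_websocket", None)`, can be sketched like this. The adapter classes and the `try_adopt` wrapper are hypothetical and exist only to show the gate.

```python
from typing import Any, Optional


class RealtimeLike:
    """Hypothetical adapter that supports adoption."""

    def adopt_websocket(self, ws: Any) -> str:
        return "adopted"


class OtherVendorRealtime:
    """Unrelated class that opts in just by defining the method."""

    def adopt_websocket(self, ws: Any) -> str:
        return "adopted-by-other-vendor"


class PipelineOnly:
    """Hypothetical adapter with no adoption capability."""


def try_adopt(adapter: Any, ws: Any) -> Optional[str]:
    # Capability check: look for the method, not for a concrete class.
    adopt = getattr(adapter, "adopt_websocket", None)
    if not callable(adopt):
        return None  # caller would fall back to the cold connect path
    return adopt(ws)
```

No `isinstance` test appears anywhere in the gate, so a new adapter class joins the adopt flow without the handler importing it.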
Superseded by #93 (rebased on |
nicolotognoni added a commit that referenced this pull request May 12, 2026
… duck-type adopt (re-base of #88) (#93)

* feat(realtime): wire OpenAI Realtime warmup() into provider prewarm framework
* feat(realtime): persist primed Realtime session across warmup → live call boundary
* fix(realtime): include agent tools + built-ins in primed warmup session
* fix(realtime): eliminate double-handshake on outbound prewarm (park does warmup work)
* fix(realtime): recreate adapter on adopt failure to avoid stale state
* refactor(stream-handler): duck-type adoptWebSocket capability (drop instanceof)
Summary

Commit 1: `feat(realtime): wire OpenAI Realtime warmup() into provider prewarm framework`. Connects the previously orphaned `OpenAIRealtimeAdapter.warmup()` method to the per-call `_spawn_provider_warmup` / `spawnProviderWarmup` pipeline. The framework only iterated `agent.stt` / `agent.tts` / `agent.llm`; OpenAI Realtime is an all-in-one provider that's server-instantiated at `StreamHandler.start()` time, so it was unreachable. A transient adapter is now built from the resolved Agent + the configured `openai_key` when `agent.provider == "openai_realtime"`, and `warmup()` runs in parallel with the carrier `initiate_call`. Saves 150–400 ms of TLS + WebSocket handshake + `session.created` round-trip.

Commit 2: `feat(realtime): persist primed Realtime session across warmup → live call boundary`. Builds on commit 1 by parking the primed `session.updated` WebSocket instead of closing it. Same pattern the SDK already uses for STT (Cartesia) and TTS (ElevenLabs WS). `OpenAIRealtimeStreamHandler.start()` consults the parked slot and calls `adopt_websocket(...)` instead of paying a cold `connect()` round-trip again, which saves ~250–450 ms on first-turn audio. Best-effort: missing key, dead WS, or park failures all fall through transparently to cold connect.

Implementation

* `client.py:732`: `_spawn_provider_warmup` now picks up Realtime via a new `_build_realtime_warmup_adapter` helper.
* `client.py:866`: `_park_provider_connections` parks the Realtime WS under the `openai_realtime` slot key.
* `stream_handler.py:724`: `OpenAIRealtimeStreamHandler.__init__` accepts `pop_prewarmed_connections`.
* `stream_handler.py:950`: `start()` adopts the parked WS when alive; falls back to `connect()` otherwise.
* `telephony/twilio.py:498`, `telephony/telnyx.py:605`: pass `pop_prewarmed_connections=` to the Realtime handler.
* `client.ts:940`: `spawnProviderWarmup` mirrors the Python branch via `buildRealtimeWarmupAdapter`.
* `client.ts:863`: `parkProviderConnections` parks the Realtime WS into `slot.openaiRealtime`.
* `stream-handler.ts:2229`: `initRealtimeAdapter` consults `popPrewarmedConnections` and calls `adoptWebSocket(...)` when the parked WS is OPEN.

No dependencies added or removed. All file changes are additive (new code paths, optional kwargs), so existing pipeline / ConvAI flows are untouched.

Breaking change?

No. `pop_prewarmed_connections` is an optional kwarg with a safe default of `None`. Realtime mode without an OpenAI key (impossible via the `Patter.agent()` guard, but handled defensively in the warmup helper) is a no-op.

Test plan

* `pytest tests/ -m "not soak" -q` → 1825 passed
* `npm test` → 1493 passed
* `npm run lint` → clean
* `npm run build` → success
* `tests/test_prewarm.py` + `tests/test_prewarm_handoff.py` cover warmup wiring (5 cases) and parking + adoption (6 cases)
* `tests/unit/prewarm.test.ts` + `tests/unit/prewarm-handoff.test.ts` cover the same behaviour (4 + 3 cases)

Docs updates

* `CHANGELOG.md`, Unreleased section: one `### Fixed` entry (commit 1) + one `### Added` entry (commit 2).