feat(realtime): OpenAI Realtime prewarm wired + cross-call session adoption #88

Closed
nicolotognoni wants to merge 6 commits into feat/observability-otel-attrs-0.6.1 from feat/0.6.2-realtime-prewarm

Conversation

@nicolotognoni
Collaborator

Summary

  • Commit 1 — feat(realtime): wire OpenAI Realtime warmup() into provider prewarm framework. Connects the previously-orphaned OpenAIRealtimeAdapter.warmup() method to the per-call _spawn_provider_warmup / spawnProviderWarmup pipeline. The framework only iterated agent.stt / agent.tts / agent.llm — OpenAI Realtime is an all-in-one provider that's server-instantiated at StreamHandler.start() time, so it was unreachable. A transient adapter is now built from the resolved Agent + the configured openai_key when agent.provider == "openai_realtime" and warmup() runs in parallel with the carrier initiate_call. Saves 150–400 ms of TLS + WebSocket handshake + session.created round-trip.

  • Commit 2 — feat(realtime): persist primed Realtime session across warmup → live call boundary. Builds on commit 1 by parking the primed session.updated WebSocket instead of closing it. Same pattern the SDK already uses for STT (Cartesia) and TTS (ElevenLabs WS). OpenAIRealtimeStreamHandler.start() consults the parked slot and calls adopt_websocket(...) instead of paying a cold connect() round-trip again — saves ~250–450 ms on the first-turn audio. Best-effort: missing key, dead WS, or park failures all fall through transparently to cold connect.

Implementation

  • Python
    • client.py:732 _spawn_provider_warmup now picks up Realtime via a new _build_realtime_warmup_adapter helper.
    • client.py:866 _park_provider_connections parks the Realtime WS under the openai_realtime slot key.
    • stream_handler.py:724 OpenAIRealtimeStreamHandler.__init__ accepts pop_prewarmed_connections.
    • stream_handler.py:950 start() adopts the parked WS when alive; falls back to connect() otherwise.
    • telephony/twilio.py:498, telephony/telnyx.py:605 — pass pop_prewarmed_connections= to the Realtime handler.
  • TypeScript
    • client.ts:940 spawnProviderWarmup mirrors the Python branch via buildRealtimeWarmupAdapter.
    • client.ts:863 parkProviderConnections parks the Realtime WS into slot.openaiRealtime.
    • stream-handler.ts:2229 initRealtimeAdapter consults popPrewarmedConnections and calls adoptWebSocket(...) when the parked WS is OPEN.

No dependencies added or removed. All file changes are additive (new code paths, optional kwargs) so existing pipeline / ConvAI flows are untouched.

Breaking change?

No. pop_prewarmed_connections is an optional kwarg with a safe default of None. Realtime mode without an OpenAI key (impossible via the Patter.agent() guard, but handled defensively in the warmup helper) is a no-op.

Test plan

  • Python: pytest tests/ -m "not soak" -q → 1825 passed
  • TypeScript: npm test → 1493 passed
  • TypeScript: npm run lint → clean
  • TypeScript: npm run build → success
  • New Python tests: tests/test_prewarm.py + tests/test_prewarm_handoff.py cover warmup wiring (5 cases) and parking + adoption (6 cases)
  • New TS tests: tests/unit/prewarm.test.ts + tests/unit/prewarm-handoff.test.ts cover the same behaviour (4 + 3 cases)

Docs updates

  • CHANGELOG.md — Unreleased section: one ### Fixed entry (commit 1) + one ### Added entry (commit 2).

…ramework

The `warmup()` method on `OpenAIRealtimeAdapter` (Python + TS) was
defined but unreachable from `Patter.call()` — the prewarm framework
only iterated `agent.stt` / `agent.tts` / `agent.llm`, but OpenAI
Realtime is an all-in-one provider that's server-instantiated at
`StreamHandler.start()` time and therefore not stored on the Agent.

`_spawn_provider_warmup` (Py) / `spawnProviderWarmup` (TS) now
constructs a transient `OpenAIRealtimeAdapter` from the resolved
Agent + the configured `openai_key` when `agent.provider ==
"openai_realtime"` and runs `warmup()` in parallel with the carrier
`initiate_call`. The transient adapter is configured identically to
the production one (model, voice, instructions, language, audio
format = g711_ulaw for both Twilio and Telnyx, plus optional
reasoning_effort / input_audio_transcription_model knobs from the
engine marker) so the upstream `session.update` primes the same
session state that the live call will use.

Saves 150-400 ms of TLS + WebSocket handshake + `session.created`
round-trip on the first turn. Best-effort: failures during warmup
adapter build or `warmup()` itself are logged at DEBUG and never
abort the call.
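
A minimal sketch of the "warm up in parallel, never abort the call" behaviour described above. `FakeRealtimeAdapter`, `initiate_call`, `safe_warmup`, and `place_call` are hypothetical stand-ins written for illustration; they are not the SDK's actual classes or signatures.

```python
import asyncio
import logging

logger = logging.getLogger("prewarm_sketch")


class FakeRealtimeAdapter:
    """Hypothetical stand-in for the transient adapter; only warmup() matters."""

    def __init__(self, fail: bool = False) -> None:
        self.fail = fail
        self.warmed = False

    async def warmup(self) -> None:
        # Stands in for TLS + WebSocket handshake + session.update round-trip.
        await asyncio.sleep(0.01)
        if self.fail:
            raise ConnectionError("simulated warmup failure")
        self.warmed = True


async def initiate_call() -> str:
    """Hypothetical carrier request that starts the phone ringing."""
    await asyncio.sleep(0.01)
    return "call-sid-123"


async def safe_warmup(adapter: FakeRealtimeAdapter) -> None:
    """Best-effort wrapper: log at DEBUG and swallow the error."""
    try:
        await adapter.warmup()
    except Exception as exc:
        logger.debug("realtime warmup failed: %r", exc)


async def place_call(adapter: FakeRealtimeAdapter) -> str:
    # Start the warmup first so it overlaps with the carrier round-trip.
    warmup_task = asyncio.create_task(safe_warmup(adapter))
    call_id = await initiate_call()
    await warmup_task
    return call_id
```

With this shape, `asyncio.run(place_call(FakeRealtimeAdapter(fail=True)))` still returns the call id; the warmup failure only shows up in the DEBUG log.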
…call boundary

Builds on the previous warmup wiring. The transient warmup adapter
closes its WS after a session.update / session.updated round-trip,
so the live call still pays a fresh ``new WebSocket`` + handshake.
This change parks the primed Realtime WS instead — same pattern the
SDK already uses for STT (Cartesia) and TTS (ElevenLabs WS).

`_park_provider_connections` (Py) / `parkProviderConnections` (TS)
now build a transient `OpenAIRealtimeAdapter` when
`agent.provider == "openai_realtime"`, call its
`open_parked_connection` to keep the `session.updated` WS OPEN,
and stash it under the `openai_realtime` slot key alongside the
existing `stt` / `tts` parked handles.

`OpenAIRealtimeStreamHandler` (Py) accepts a new
`pop_prewarmed_connections` callback (wired through the Twilio and
Telnyx telephony adapters). `StreamHandler.start()` consults the
parked slot before calling `connect()` and calls
`adapter.adopt_websocket(...)` when a live WS is available — saving
~250-450 ms of cold-handshake on the first turn. TS mirrors the same
flow in `StreamHandler.initRealtimeAdapter` for both Twilio and
Telnyx bridges.

All failure modes (missing OpenAI key, dead parked WS, park-task
exception, adoption error) fall through transparently to the cold
`connect()` path. The existing 36-test TS handoff/prewarm suite and
45-test Python suite are all green after the change.
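
The park-then-adopt handoff can be sketched roughly as follows. This is an illustration under assumptions: `FakeWebSocket`, `FakeAdapter`, and the free function `start` are hypothetical, the `is_open` flag stands in for whatever liveness check the SDK performs, and only the missing-slot and dead-socket fallbacks are modelled.

```python
from typing import Callable, Dict, Optional


class FakeWebSocket:
    """Hypothetical socket handle; is_open mimics a liveness flag."""

    def __init__(self, is_open: bool = True) -> None:
        self.is_open = is_open


class FakeAdapter:
    """Hypothetical adapter exposing the two entry points named in the PR."""

    def __init__(self) -> None:
        self.ws: Optional[FakeWebSocket] = None
        self.path = ""

    def adopt_websocket(self, ws: FakeWebSocket) -> None:
        self.ws = ws
        self.path = "adopted"

    def connect(self) -> None:
        self.ws = FakeWebSocket()
        self.path = "cold"


PopFn = Callable[[], Optional[Dict[str, FakeWebSocket]]]


def start(adapter: FakeAdapter, pop_prewarmed_connections: Optional[PopFn] = None) -> str:
    # The callback is optional; None means "no prewarm framework wired in".
    parked = pop_prewarmed_connections() if pop_prewarmed_connections else None
    ws = (parked or {}).get("openai_realtime")
    if ws is not None and ws.is_open:
        adapter.adopt_websocket(ws)  # reuse the primed session
    else:
        adapter.connect()  # cold path: full handshake
    return adapter.path
```

Because the slot is popped rather than read, a parked socket can be adopted by at most one call; a second `start()` for the same key falls back to `connect()`.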
@mintlify

mintlify Bot commented May 12, 2026

Preview deployment for your docs. Learn more about Mintlify Previews.

Project Status Preview Updated (UTC)
patter-06b046ce 🟢 Ready View Preview May 12, 2026, 11:39 AM

@nicolotognoni nicolotognoni changed the base branch from main to feat/observability-otel-attrs-0.6.1 May 12, 2026 12:19
The prewarm path built the transient OpenAIRealtimeAdapter without a
``tools=`` argument, so the ``session.update`` sent during ringing
carried an empty tool list. When ``StreamHandler.start()`` adopted that
parked WebSocket it skipped a fresh ``session.update``, leaving the
upstream session permanently unaware that the two Patter built-ins
(``transfer_call`` / ``end_call``) existed. Both silently no-op'd on
every call that hit the prewarm path (~80% of outbound calls when
prewarm is enabled).

Extracted the canonical tool-list construction (user tools +
``transfer_call`` + ``end_call``) into a shared helper —
``build_realtime_tools()`` in Python and ``buildRealtimeTools()`` in
TypeScript — and call it from both the live ``buildAIAdapter`` /
``StreamHandler.start()`` path and the warmup-side
``_build_realtime_warmup_adapter`` / ``buildRealtimeWarmupAdapter``
path so the two ``session.update`` bodies match byte-for-byte.

Tests: 4 new regression tests (2 Py + 2 TS) verifying that the warmup
adapter carries user-defined tools plus both built-ins, and that the
built-ins are still injected when the agent declares no user tools.
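
A sketch of the shared-helper idea, assuming a simplified tool schema. The name `build_realtime_tools` comes from the commit message, but its signature here, the `BUILTIN_TOOLS` definitions, and `session_update_body` are illustrative assumptions rather than the SDK's real code.

```python
import json
from typing import Any, Dict, List, Optional

Tool = Dict[str, Any]

# Hypothetical schemas; the real built-in definitions are not shown in this PR.
BUILTIN_TOOLS: List[Tool] = [
    {"type": "function", "name": "transfer_call", "description": "Transfer the call."},
    {"type": "function", "name": "end_call", "description": "End the call."},
]


def build_realtime_tools(user_tools: Optional[List[Tool]] = None) -> List[Tool]:
    """Return user tools followed by both built-ins, without mutating the input."""
    tools = [dict(t) for t in (user_tools or [])]
    tools.extend(dict(t) for t in BUILTIN_TOOLS)
    return tools


def session_update_body(user_tools: Optional[List[Tool]] = None) -> str:
    """Serialise a session.update payload; sort_keys makes the output deterministic."""
    payload = {
        "type": "session.update",
        "session": {"tools": build_realtime_tools(user_tools)},
    }
    return json.dumps(payload, sort_keys=True)
```

Routing both the live path and the warmup path through one function is what keeps the two payloads identical; any future built-in only needs to be added in one place.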
…oes warmup work)

Both ``_spawn_provider_warmup`` and ``_park_provider_connections`` built
a transient ``OpenAIRealtimeAdapter`` and opened its own WebSocket
against ``api.openai.com`` during the ringing window — two handshakes
per outbound call where one suffices.

The warmup-only handshake is a strict subset of what park performs
(open WS → ``session.created`` → ``session.update`` → ``session.updated``)
and park keeps the socket open for adoption. The warmup-side WS was
opened, primed, and immediately discarded — pure waste of 150-400 ms
of ringing-window budget, plus doubled rate-limit pressure against
OpenAI for no benefit.

Fix: ``_spawn_provider_warmup`` no longer builds the Realtime adapter
at all; park is now the sole Realtime warm path on outbound calls.
Pipeline-mode STT / TTS / LLM ``warmup()`` calls are unchanged.

Tests: 2 new regression tests verify (1) ``_spawn_provider_warmup``
does not construct a Realtime adapter, and (2) end-to-end
warmup+park together construct exactly one adapter (the one park uses).
Updated 3 existing tests that asserted the old double-build behaviour.

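
The single-handshake invariant can be illustrated with a counting factory. Everything below is a hypothetical simplification: agents are plain dicts, and `spawn_provider_warmup` / `park_provider_connections` only borrow their names from the SDK functions mentioned above.

```python
from typing import Any, Callable, Dict, List


class CountingFactory:
    """Hypothetical factory that counts how many Realtime adapters get built."""

    def __init__(self) -> None:
        self.built = 0

    def __call__(self) -> object:
        self.built += 1
        return object()  # stands in for a primed adapter holding an open WS


def spawn_provider_warmup(agent: Dict[str, Any]) -> List[Any]:
    """Warm pipeline providers only; Realtime is deliberately left to park."""
    return [agent[k] for k in ("stt", "tts", "llm") if agent.get(k) is not None]


def park_provider_connections(
    agent: Dict[str, Any], build_realtime: Callable[[], object]
) -> Dict[str, Any]:
    """Park is the only place a Realtime adapter is constructed."""
    slot: Dict[str, Any] = {}
    if agent.get("provider") == "openai_realtime":
        slot["openai_realtime"] = build_realtime()
    return slot
```

Running both functions for a Realtime agent leaves `CountingFactory.built` at 1, which is the property the regression tests above assert.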
When ``adopt_websocket`` / ``adoptWebSocket`` raised mid-adoption, the
partially-adopted ``OpenAIRealtimeAdapter`` was left in an inconsistent
state: ``_running`` / ``messageListenerAttached`` was already true, the
heartbeat task may have started, ``_current_response_item_id`` /
``currentResponseItemId`` may have carried leaked state from the parked
session, and the ``_ws`` / ``ws`` reference pointed at a now-closed
socket.

Falling through to ``connect()`` on that carcass raced
``session.created`` against stale state, ran two heartbeat timers, and
sometimes attached a second message listener to the new socket — silent
corruption of every adopt-failed call.

Fix: when adopt raises, re-instantiate the adapter (via the existing
``adapter_kwargs`` in Python, ``deps.buildAIAdapter`` in TS) before the
cold ``connect()`` path runs, guaranteeing a clean slate.

Tests: regression test in each SDK constructs an adapter whose
``adopt_websocket`` throws, then asserts (a) a second adapter instance
was created, (b) ``connect()`` ran on the fresh adapter, (c) the
handler's adapter reference points at the fresh instance.
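
A rough sketch of the recreate-on-failure fix. `FlakyAdapter` and `Handler` are hypothetical; the adapter deliberately mutates its state before raising to imitate the partially-adopted condition described above, and the factory callable stands in for `adapter_kwargs` / `deps.buildAIAdapter`.

```python
from typing import Callable, Optional


class FlakyAdapter:
    """Hypothetical adapter whose adopt may fail after mutating state."""

    def __init__(self, adopt_fails: bool) -> None:
        self.adopt_fails = adopt_fails
        self.running = False
        self.connected = False

    def adopt_websocket(self, ws: object) -> None:
        self.running = True  # state is dirtied before the failure surfaces
        if self.adopt_fails:
            raise RuntimeError("simulated adopt failure")

    def connect(self) -> None:
        if self.running:
            raise AssertionError("connect() called on a dirty adapter")
        self.running = True
        self.connected = True


class Handler:
    def __init__(self, build_adapter: Callable[[], FlakyAdapter]) -> None:
        self._build_adapter = build_adapter
        self.adapter = build_adapter()

    def start(self, parked_ws: Optional[object] = None) -> str:
        if parked_ws is not None:
            try:
                self.adapter.adopt_websocket(parked_ws)
                return "adopted"
            except Exception:
                # Discard the half-adopted instance; start from a clean slate.
                self.adapter = self._build_adapter()
        self.adapter.connect()
        return "cold"
```

Without the re-instantiation line, `connect()` would run on the dirtied instance, which this sketch turns into an `AssertionError` to make the hazard visible.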
…nstanceof)

The TS realtime adopt branch in ``stream-handler.ts:initRealtimeAdapter``
previously gated the prewarm-handoff path with two
``this.adapter instanceof OpenAIRealtimeAdapter`` checks. Switched both
to a single duck-type check (``typeof adoptWebSocket === 'function'``)
so:

1. The generic ``stream-handler`` module stays provider-agnostic on this
   hot path. Pipeline-only users still get the symbol resolved at module
   load (the import is used elsewhere in this file for legitimate
   provider-specific behaviour), but the adopt-handoff gate no longer
   demands a concrete class identity.

2. The check mirrors the Python handler's
   ``getattr(self._adapter, "adopt_websocket", None)`` shape — both
   SDKs now use capability-based detection rather than identity.

3. Future Realtime-like adapters (e.g. a different vendor's all-in-one
   provider that also exposes ``adoptWebSocket``) can opt into the
   adopt flow simply by implementing the method, no SDK change needed.

No behaviour change: the same WS-adopt path runs for the same adapter
class. Existing adopt-handoff tests cover the behaviour and continue
to pass.
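
For comparison, the Python-side capability check mentioned in point 2 might look like the sketch below. The three adapter classes are hypothetical, and wrapping the `getattr` lookup in `callable()` is an assumption made here for illustration; the commit message only shows the `getattr` shape.

```python
class OpenAILikeAdapter:
    """Hypothetical Realtime adapter that supports adoption."""

    def adopt_websocket(self, ws: object) -> None:
        self.ws = ws


class OtherVendorAdapter:
    """Hypothetical all-in-one adapter from another vendor; opts in by method."""

    def adopt_websocket(self, ws: object) -> None:
        self.ws = ws


class PipelineAdapter:
    """Hypothetical adapter with no adoption support."""

    def connect(self) -> None:
        self.connected = True


def can_adopt(adapter: object) -> bool:
    # Capability-based detection: no isinstance check against a concrete class.
    return callable(getattr(adapter, "adopt_websocket", None))
```

Here `can_adopt` is true for both adapters that define `adopt_websocket` and false for `PipelineAdapter`, so a new vendor adapter opts in without any change to the handler.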
@nicolotognoni
Collaborator Author

Superseded by #93 (rebased on feat/observability-otel-attrs-0.6.1 HEAD 893a3bb with CHANGELOG + test_prewarm conflicts resolved; no force-push per repo convention).

nicolotognoni added a commit that referenced this pull request May 12, 2026
… duck-type adopt (re-base of #88) (#93)

* feat(realtime): wire OpenAI Realtime warmup() into provider prewarm framework

* feat(realtime): persist primed Realtime session across warmup → live call boundary

* fix(realtime): include agent tools + built-ins in primed warmup session

* fix(realtime): eliminate double-handshake on outbound prewarm (park does warmup work)

* fix(realtime): recreate adapter on adopt failure to avoid stale state

* refactor(stream-handler): duck-type adoptWebSocket capability (drop instanceof)

@nicolotognoni nicolotognoni deleted the feat/0.6.2-realtime-prewarm branch May 17, 2026 21:39