chore(release): 0.6.1#102
Merged
…6.1 release

Ports the observability work from the now-closed PR #82 onto the post-refactor `libraries/python/` layout. PR #82 was authored against the legacy `sdk-py/` paths and was consolidated into the 0.6.0 release branch; this commit lands the actual implementation against the new layout for 0.6.1.

What it adds:

- `getpatter.observability.attributes`: three new helpers, `record_patter_attrs(attrs)`, the `patter_call_scope(call_id, side)` context manager, and `attach_span_exporter(patter, exporter, side)`. Lazy-OTel-guarded; no-op when the `[tracing]` extra is not installed. Two ContextVars (`patter.call_id`, `patter.side`) propagate through the asyncio task tree so spans emitted by deeply nested provider code inherit the active call's identity automatically.
- `Patter._attach_span_exporter(exporter, *, side="uut")`: public-but-underscore hook for tools that observe Patter from outside (e.g. an out-of-process agent runner).
- Per-provider cost emission across 19 surfaces: `patter.cost.{telephony_minutes, stt_seconds, tts_chars, llm_input_tokens, llm_output_tokens, realtime_minutes}` stamped on the active span. Provider tag emitted alongside as `patter.{telephony,stt,tts,llm,realtime}.provider`. All call sites wrapped in defensive try/except so observability cannot kill a live call.
- Per-turn latency: `patter.latency.{ttfb_ms, turn_ms}` stamped from `StreamHandler._emit_turn_metrics` via a new `PipelineHookExecutor.record_turn_latency(*, ttfb_ms, turn_ms)`.
- Bridge-level `patter_call_scope` entry on Twilio + Telnyx: the entire WebSocket bridge lifetime (incl. hangup/cleanup) is bound to the call identity via `contextlib.ExitStack`.
- `TwilioAdapter.record_call_end_cost` / `TelnyxAdapter.record_call_end_cost`: adapter helpers used by the bridge to emit `patter.cost.telephony_minutes` once wall-clock duration is known.

Versions bumped 0.6.0 → 0.6.1 in `__init__.py`, `pyproject.toml`, `package.json`. CHANGELOG entry added under a new `## 0.6.1 (2026-05-09)` block; the existing `## 0.6.0 (2026-05-08)` block is preserved verbatim because it reflects exactly what was published to PyPI and npm at that tag.

⚠️ TS parity gap: Python only. TypeScript follow-up tracked separately. This is a known time-boxed exception per `.claude/rules/sdk-parity.md`.

5 new unit tests in `libraries/python/tests/unit/test_observability_attributes_unit.py` exercise the helper module's public surface (`patter_call_scope`, `record_patter_attrs` no-op, `attach_span_exporter` side stamping). Full Python suite: 1719 passed, 7 skipped (green).

Refs: closed PR #82.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
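The ContextVar propagation described in this commit can be illustrated with a minimal sketch. This is not the code in `getpatter.observability.attributes`: the names `call_scope` and `current_call_attrs` are hypothetical, and the sketch only shows how two ContextVars set in a context manager become visible inside a child asyncio task.

```python
import asyncio
import contextlib
from contextvars import ContextVar
from typing import Dict, Iterator, Optional

# Hypothetical stand-ins for the two ContextVars named in the commit message.
_call_id: ContextVar[Optional[str]] = ContextVar("patter.call_id", default=None)
_side: ContextVar[Optional[str]] = ContextVar("patter.side", default=None)


@contextlib.contextmanager
def call_scope(call_id: str, side: str = "uut") -> Iterator[None]:
    """Bind the call identity for the duration of the block, then restore it."""
    id_token = _call_id.set(call_id)
    side_token = _side.set(side)
    try:
        yield
    finally:
        _side.reset(side_token)
        _call_id.reset(id_token)


def current_call_attrs() -> Dict[str, Optional[str]]:
    """Attributes a span-stamping helper could read at emit time."""
    return {"patter.call_id": _call_id.get(), "patter.side": _side.get()}


async def _nested_provider_work() -> Dict[str, Optional[str]]:
    # Runs in a child task; asyncio copies the current context when the
    # task is created, so the values set by call_scope are visible here.
    await asyncio.sleep(0)
    return current_call_attrs()


async def demo() -> Dict[str, Optional[str]]:
    with call_scope("CA123", side="uut"):
        return await asyncio.create_task(_nested_provider_work())
```

Because the context is copied at task creation, a task spawned before the scope is entered would not see the call identity; that is one reason to enter the scope at the bridge level, before any provider work starts.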
…in gate from first audio

Two bugs caught during 0.6.0 acceptance against `releases/0.6.0/typescript/matrix/outbound-cartesia-cerebras-elevenlabs.ts`:

1. **Dashboard hydrate schema mismatch**: `CallLogger.log_call_end` writes `cost`/`latency`/`duration_ms`/`telephony_provider` as top-level keys of `metadata.json`, but `MetricsStore.hydrate` looked for them under `meta.metrics.cost`/`meta.metrics.latency`. Every hydrated row landed with `metrics=null`, so cost/latency rendered as `$0.00`/`—` for all on-disk calls (only the in-flight call had real numbers). The fix synthesizes a `metrics` dict from the top-level fields when `meta.metrics` is absent, while leaving any explicit `meta.metrics` payload untouched.
2. **Early barge-in self-cancellation**: cloud TTS first-byte latency is 200–700 ms; the 250 ms anti-flicker gate (no-AEC PSTN default) was anchored on `_speaking_started_at`/`speakingStartedAt` and expired BEFORE TTS produced audio. VAD then picked up background noise and self-cancelled the agent's first turn: 0 bytes emitted, line silent. The fix anchors the gate on a new `_first_audio_sent_at`/`firstAudioSentAt`, set AFTER `bridge.sendAudio` / `audio_sender.send_audio` succeeds at the four pipeline emit sites (firstMessage, streaming, regular, WebSocket remote). `_can_barge_in`/`canBargeIn` returns false while the marker is null. Gate values (250 ms / 1000 ms) are unchanged; only the anchor moves.

Tests:

- Py 1717/1717, TS 1394/1394 green; lint clean.
- New regressions: `test_hydrate_lifts_top_level_cost_and_latency_into_metrics`, `test_hydrate_preserves_explicit_metrics_when_present`, `test_barge_in_suppressed_before_first_audio_emitted` (Py) + parity TS cases in `tests/dashboard-store.test.ts` and `tests/unit/stream-handler.test.ts`.
- Existing `_handle_barge_in`/`handleBargeIn` tests updated to set both timestamps for the new contract.
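The hydrate fallback from item 1 can be sketched as a pure function. This is an illustration under assumptions, not the SDK's `MetricsStore.hydrate`: the helper name `synthesize_metrics` is hypothetical, `meta` is assumed to be the parsed `metadata.json` dict, and a `null` `metrics` value is treated the same as a missing key.

```python
from typing import Any, Dict, Optional

# Top-level keys that the commit message says CallLogger.log_call_end writes.
_TOP_LEVEL_METRIC_KEYS = ("cost", "latency", "duration_ms", "telephony_provider")


def synthesize_metrics(meta: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the metrics payload for a hydrated call record.

    An explicit ``meta["metrics"]`` wins and is returned untouched.
    Otherwise the top-level fields are lifted into a new dict.
    """
    explicit = meta.get("metrics")
    if explicit is not None:
        return explicit
    lifted = {key: meta[key] for key in _TOP_LEVEL_METRIC_KEYS if key in meta}
    # No metric fields at all: keep the "no metrics" state rather than
    # inventing an empty dict the UI might render as zeros.
    return lifted or None
```

Returning the explicit payload by reference (rather than merging it with top-level keys) mirrors the "preserved untouched" wording above; whether the real implementation copies it is not stated in the source.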
Cloud TTS first-byte latency (200-700 ms) plus PSTN background noise
mean the legacy "any VAD speech_start cancels the agent" contract
produced frequent false-positive cancels — cough, click, HVAC, breath,
or a quick "okay" cut the agent mid-sentence and lost the
conversational thread.
This PR adds an opt-in two-stage confirmation pipeline. With the new
empty-tuple default, behaviour is unchanged. Configure
``Agent.barge_in_strategies`` / ``agent.bargeInStrategies`` to enable:
1. VAD speech_start during TTS marks the barge-in PENDING. TTS keeps
streaming naturally — the LLM stream stays alive.
2. Each STT transcript is evaluated by every configured strategy
(short-circuit OR; per-strategy errors are isolated).
3. First strategy that returns True confirms the cancel: runs the
existing send_clear + flush ring + LLM abort sequence.
4. If no strategy confirms within ``barge_in_confirm_ms``
(default 1500 ms) the pending state is dropped and the agent
finishes its sentence.
New module ``getpatter.services.barge_in_strategies`` exposes:
- ``BargeInStrategy`` Protocol (async ``evaluate`` + optional ``reset``)
- ``MinWordsStrategy`` — filters short backchannels by requiring N
words while the agent is speaking and letting any single word
through while the agent is silent (so the first user turn is
never delayed).
- ``evaluate_strategies`` / ``reset_strategies`` helpers.
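A minimal sketch of the strategy contract described above, under stated assumptions: the ``evaluate`` signature (a transcript plus an ``agent_speaking`` flag), the class name ``MinWordsSketch``, and the helper name ``evaluate_any`` are all hypothetical and are not taken from ``getpatter.services.barge_in_strategies``. The sketch shows the two documented behaviours: a word-count threshold that only applies while the agent is speaking, and short-circuit OR evaluation with per-strategy error isolation.

```python
import asyncio
from typing import Protocol, Sequence


class StrategySketch(Protocol):
    # Hypothetical signature; the real Protocol may differ.
    async def evaluate(self, transcript: str, *, agent_speaking: bool) -> bool: ...


class MinWordsSketch:
    """Require N words while the agent speaks, one word otherwise."""

    def __init__(self, min_words: int = 3) -> None:
        self.min_words = min_words

    async def evaluate(self, transcript: str, *, agent_speaking: bool) -> bool:
        required = self.min_words if agent_speaking else 1
        return len(transcript.split()) >= required


class AlwaysFails:
    """Stand-in for a buggy strategy, used to show error isolation."""

    async def evaluate(self, transcript: str, *, agent_speaking: bool) -> bool:
        raise RuntimeError("strategy bug")


async def evaluate_any(
    strategies: Sequence[StrategySketch], transcript: str, *, agent_speaking: bool
) -> bool:
    """Short-circuit OR: the first strategy returning True confirms."""
    for strategy in strategies:
        try:
            if await strategy.evaluate(transcript, agent_speaking=agent_speaking):
                return True
        except Exception:
            # One failing strategy must not block the others.
            continue
    return False
```

With this shape, a lone "okay" during agent speech leaves the barge-in pending, while a three-word interruption confirms it. The empty-strategies case never reaches this helper, since the legacy cancel-on-VAD path applies.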
TS parity in ``src/services/barge-in-strategies.ts`` with the same
public surface (``MinWordsStrategy``, ``BargeInStrategy`` interface,
``evaluateStrategies``/``resetStrategies``).
Wiring lives in stream_handler.py ``_handle_barge_in`` and
stream-handler.ts ``handleBargeIn`` — both keep the existing
canBargeIn gate (firstAudioSentAt anchor) and only add the strategy
check when at least one strategy is configured.
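The existing gate that both handlers keep can be sketched as follows. This is an illustration, not the handler code: the class ``BargeInGateSketch`` and its method names are hypothetical, timestamps are assumed to be monotonic seconds, and the 250 ms value is the no-AEC PSTN default mentioned earlier in this PR.

```python
import time
from typing import Optional


class BargeInGateSketch:
    """Anti-flicker gate anchored on the first audio actually sent."""

    def __init__(self, gate_ms: float = 250.0) -> None:
        self.gate_ms = gate_ms
        # None until a send of agent audio has succeeded for this turn.
        self.first_audio_sent_at: Optional[float] = None

    def mark_first_audio_sent(self, now: Optional[float] = None) -> None:
        # Only the first successful send anchors the gate.
        if self.first_audio_sent_at is None:
            self.first_audio_sent_at = time.monotonic() if now is None else now

    def can_barge_in(self, now: Optional[float] = None) -> bool:
        # No audio on the wire yet: there is nothing to interrupt, so any
        # VAD trigger is treated as noise.
        if self.first_audio_sent_at is None:
            return False
        now = time.monotonic() if now is None else now
        elapsed_ms = (now - self.first_audio_sent_at) * 1000.0
        return elapsed_ms >= self.gate_ms
```

The optional ``now`` parameter exists only so the gate can be exercised with fixed timestamps in tests.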
Tests:
- Py: 1741/1741 green; new ``test_barge_in_strategies.py`` (14) +
``test_barge_in_two_stage.py`` (10).
- TS: 1419/1419 green; new ``barge-in-strategies.test.ts`` (15) +
``barge-in-two-stage.test.ts`` (10). Lint clean.
- Existing barge-in regression suites still pass byte-for-byte:
empty strategies preserve legacy behaviour exactly.
CHANGELOG ``## Unreleased`` updated with full design + file list.
…n strategies

Bundle three changes from branch fix/dashboard-hydrate-schema-and-bargein-grace into the 0.6.1 release:

1. Dashboard MetricsStore.hydrate now lifts top-level cost/latency from CallLogger metadata.json into the synthesized metrics dict, so hydrated calls in the dashboard show real $/p95 instead of $0.00 / "—".
2. Barge-in gate anchored on firstAudioSentAt (not beginSpeaking) so ElevenLabs/Cartesia first-byte latency no longer lets background noise cancel the agent before any audio reaches the wire.
3. New opt-in barge-in confirmation pipeline with MinWordsStrategy reference implementation. Empty-tuple default preserves legacy cancel-on-VAD behaviour.

# Conflicts:
#	CHANGELOG.md
…ixes

Three user-visible features plus a hardening sweep from a 5-agent code review covering security, billing safety, race conditions, and resource leaks.

## Features

### Dashboard cost panel: STT and TTS as separate rows

The cost breakdown previously combined STT and TTS into one "STT / TTS" line, hiding which side dominated cost. Now rendered as two adjacent rows labelled with the actual provider name (e.g. "Cartesia STT" / "ElevenLabs TTS"), driven by ``record.metrics.stt_provider`` / ``tts_provider`` already exposed by the backend.

Files: ``dashboard-app/src/components/CostPanel.tsx``, ``dashboard-app/src/lib/mappers.ts``.

### stt_ms is now finalization-only (BREAKING semantic change)

Previously ``LatencyBreakdown.stt_ms`` measured ``stt_complete - turn_start``, which conflated user speech duration with STT processing. A 5 s utterance produced ``stt_ms ≈ 5000`` even when Cartesia/Deepgram finalized in 200 ms after end-of-speech. Industry benchmarks (Picovoice/Deepgram/Gladia/Speechmatics) all report STT latency as the finalization window: ``final_transcript - end_of_speech``. ``stt_ms`` now matches that definition. New optional field ``user_speech_duration_ms`` carries the displaced "how long did the user speak" number.

Files: ``libraries/python/getpatter/models.py``, ``libraries/python/getpatter/services/metrics.py``, ``libraries/typescript/src/metrics.ts``.

### Pre-warm services + pre-synth firstMessage

``Agent.prewarm: bool = True`` (default on) warms STT/TTS/LLM provider connections in parallel with carrier ``initiate_call`` so DNS, TLS, HTTP/2 / WebSocket handshakes are complete by the time the callee answers. Concrete ``warmup()`` overrides shipped on Deepgram / Cartesia / AssemblyAI STT, ElevenLabs WS / Cartesia / Inworld TTS, OpenAI Realtime.

``Agent.prewarm_first_message: bool = False`` (opt-in) pre-renders ``first_message`` to TTS bytes during ringing and streams the cached buffer instantly when the carrier emits ``start``. This eliminates 200-700 ms of TTS first-byte latency on the greeting at the cost of paying TTS even when the call isn't answered (logged at WARN level when wasted).

## Review fixes (12 issues from 5-agent multi-perspective review)

### Provider warmup correctness

- 🔴 OpenAI Realtime warmup uses ``session.update`` (not the non-spec ``response.create`` with ``generate:false`` which could silently bill tokens or return ``invalid_request_error``). Files: ``providers/openai_realtime.py``, ``providers/openai-realtime.ts``.
- 🟡 ElevenLabs WS warmup BOS frame now mirrors the live ``synthesize`` BOS byte-for-byte (``voice_settings`` + ``generation_config``). Shared helper ``_build_bos_frame`` / ``buildBosFrame``. Verified billing-safe via no ``flush:true``, no real text. Files: ``providers/elevenlabs_ws_tts.py``, ``providers/elevenlabs-ws-tts.ts``.
- 🟡 Inworld TTS warmup uses ``GET /tts/v1/voices`` instead of ``HEAD`` against POST-only stream endpoint (was returning 405 in audit logs).
- 🟡 Cartesia STT + AssemblyAI STT warmup error logs no longer leak the API key: the handler catches ``WSServerHandshakeError`` specifically and logs only the HTTP status code, never ``str(exc)`` (which embeds the URL).

### StreamHandler / barge-in correctness

- 🟠 Double ``record_overlap_start`` on strategy-confirmed barge-in fixed: VAD start path now stamps T1, the strategy-confirm path no longer overwrites with T2, so ``detection_delay_ms`` is now correct for every user opting into ``barge_in_strategies``. Files: ``stream_handler.py:_do_cancel_for_barge_in``, ``stream-handler.ts:runBargeInCancel``.
- 🟠 Pending barge-in task leak fixed: ``cleanup`` (Py) / ``handleStop`` + ``handleWsClose`` (TS) now call ``_clear_pending_barge_in`` so a call ending mid-pending no longer leaves an asyncio.Task / setTimeout firing on a finalized handler.
- 🟢 Pre-warm bytes now chunked (1280 B / 40 ms) before ``audio_sender.send_audio`` so barge-in mid-greeting can flush cleanly via the existing mark/clear bookkeeping.

### Patter client + cache hardening

- 🟠 Cache eviction on abnormal hangup: the Twilio status callback (``no-answer`` / ``busy`` / ``failed`` / ``canceled``) and the Telnyx ``call.hangup`` / AMD-machine paths now call ``_record_prewarm_waste`` so memory doesn't leak proportional to no-answer rate.
- 🟠 Race start-vs-prewarm fixed: a ``_prewarm_consumed`` set tracks consumed call_ids so a late-arriving prewarm task drops its bytes instead of orphaning them in the cache.
- 🟡 ``disconnect()`` now cancels in-flight prewarm tasks and clears the cache (no spend leak across serve/disconnect cycles).
- 🟡 ``prewarm_first_message=True`` on Realtime / ConvAI mode now logs a WARN and skips the spawn (was silently paying TTS for bytes the StreamHandler never consumed).
- 🟡 Prewarm cache bounded at 200 entries with TTL-based eviction (``ring_timeout + 5 s``), which caps memory under outbound flood scenarios.

### Documentation

- Docstring for ``Agent.barge_in_strategies`` corrected: TTS continues streaming naturally during pending state (was misleadingly described as "paused").

## Tests

47 new regression tests across 4 new files plus updates to existing suites. Verifies every fix above with authentic mocks at the network boundary only:

- ``libraries/python/tests/test_prewarm.py`` (new, 28 tests covering default flag values, no-op default ``warmup``, all-three-providers warmup invocation, opt-out, exception swallow, cache populate / skip / empty-message / timeout, one-shot pop, waste-warn log, StreamHandler cache-hit short-circuit + cache-miss live-TTS fallback, race orphan, disconnect cleanup, cap+TTL eviction, provider-mode validation, chunking).
- ``libraries/python/tests/unit/test_provider_warmup.py`` (new, 18 tests covering all 7 concrete ``warmup()`` overrides + billing-safety regressions + key-leak regressions).
- ``libraries/typescript/tests/unit/prewarm.test.ts`` (new, 23 TS twins).
- ``libraries/typescript/tests/unit/provider-warmup.mocked.test.ts`` (new, 19 TS twins).
- Updates to ``test_barge_in_two_stage.py`` (3 ``record_overlap_start`` tests + 2 cleanup tests), ``barge-in-two-stage.test.ts`` (4 TS twins), ``server-routes.test.ts`` (2 status-callback eviction tests).

## Verification

- Python: 1797 passed, 7 skipped, 0 failed (was 1707 + 14 prewarm + 76 inherited from new subclass collection-tests)
- TypeScript: 1467 passed across 83 files (was 1430 + 37 new)
- TypeScript ``tsc --noEmit`` (lint): clean
- TypeScript ``tsup build`` (ESM + CJS + dts + CLI): clean

## CHANGELOG

All entries under ``## 0.6.1 (2026-05-09)`` with file paths, line numbers, rationale, and test paths.
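The cache hardening items above (200-entry cap, TTL eviction, one-shot pop, and a consumed set that makes late prewarm results drop their bytes) can be sketched as a small data structure. This is a sketch under assumptions, not the client code: the class name ``PrewarmCacheSketch``, the method names, and the 30 s ``ring_timeout_s`` default are hypothetical; only the cap of 200 and the ``ring_timeout + 5 s`` TTL formula come from the commit message.

```python
import time
from collections import OrderedDict
from typing import Optional, Set, Tuple


class PrewarmCacheSketch:
    """Bounded, TTL-evicting cache of pre-rendered first-message audio."""

    def __init__(self, max_entries: int = 200, ring_timeout_s: float = 30.0) -> None:
        self.max_entries = max_entries
        self.ttl_s = ring_timeout_s + 5.0  # "ring_timeout + 5 s" from the commit
        self._entries: "OrderedDict[str, Tuple[float, bytes]]" = OrderedDict()
        # Call IDs whose start already happened. A real implementation
        # would also need to bound or clear this set.
        self._consumed: Set[str] = set()

    def _evict_expired(self, now: float) -> None:
        expired = [cid for cid, (ts, _) in self._entries.items() if now - ts > self.ttl_s]
        for cid in expired:
            del self._entries[cid]

    def put(self, call_id: str, audio: bytes, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        if call_id in self._consumed:
            return False  # late prewarm: the call already started, drop the bytes
        self._evict_expired(now)
        self._entries.pop(call_id, None)
        while len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry
        self._entries[call_id] = (now, audio)
        return True

    def pop(self, call_id: str, now: Optional[float] = None) -> Optional[bytes]:
        now = time.monotonic() if now is None else now
        self._consumed.add(call_id)
        self._evict_expired(now)
        entry = self._entries.pop(call_id, None)
        return entry[1] if entry is not None else None
```

Marking the call as consumed inside ``pop`` even on a cache miss is what closes the start-vs-prewarm race in this sketch: a prewarm task that finishes after the stream has started finds its ``put`` rejected.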
…atency

Live PSTN smoke tests against ``outbound-cartesia-cerebras-elevenlabs.ts`` exposed several issues in 0.6.1 that were not caught by the unit suite. This commit ships seven fixes plus three quick wins on top of the prewarm pipeline.

## Architectural: WebSocket handoff for prewarm (replaces open-then-close)

The 0.6.1 prewarm pipeline as previously shipped (commit ``c585f6d``) opened a streaming-STT and streaming-TTS WebSocket during the carrier ringing window, idled ~250 ms, and closed it. Investigation showed the strategy is structurally insufficient on Node: the ``ws`` package does not thread a TLS session ticket across separate ``new WebSocket(...)`` constructions, so every fresh ``connect()`` at call pickup pays full TCP+TLS+HTTP-101 upgrade. Net saved time was 50–250 ms (DNS cache only) versus 700–1500 ms of cold-start budget. Live test reported "several seconds" first-turn latency, p95 3048 ms.

The new strategy keeps the warmed WS open and hands it off to the ``StreamHandler`` at call pickup. New API surface:

- ``Patter._prewarmedConnections: Map<callId, ParkedProviderConnections>`` (TS) / ``self._prewarmed_connections: dict[str, ParkedProviderConnections]`` (Py): keyed by carrier-issued ``call_id``, populated during ringing, drained on call end or after a 30 s safety TTL.
- ``provider.openParkedConnection()`` / ``open_parked_connection()``: added to ``CartesiaSTT``, ``ElevenLabsWebSocketTTS``, ``OpenAIRealtimeAdapter``. Opens the WS, sends the same initial config the live ``connect()`` sends (STT: empty config; TTS: BOS frame matching ``synthesize`` BOS byte-for-byte; Realtime: ``session.update``), and returns a handle the caller parks.
- ``provider.adoptWebSocket(handle)`` / ``adopt_websocket(handle)``: added to the same three providers. Accepts a pre-opened WS, validates ``readyState === OPEN``, and proceeds with the live message loop. For ElevenLabs WS TTS the handle carries a ``bosAlreadySent: true`` flag so the first ``synthesizeStream`` iteration does not double-send BOS (which would be a protocol error).
- ``StreamHandler`` checks ``client.popPrewarmedConnections(callId)`` before falling back to fresh ``connect()``. On adopt, the path skips TCP+TLS+upgrade and the BOS round-trip: STT connects in 0 ms, TTS in 0 ms.

Cleanup wiring: the same status callback paths that already drain the prewarm-audio cache (FIX #91) now also close any parked WS for failed calls (no-answer / busy / failed / canceled / AMD-machine). The 30 s TTL covers the rare carrier path that emits neither ``start`` nor a status callback.

Live validation against ``outbound-cartesia-cerebras-elevenlabs.ts``: ``[PREWARM] callId=… provider=stt ms=769`` followed by ``[CONNECT] callId=… provider=stt source=adopted ms=0``. STT connect went from 150–400 ms to 0 ms. First-turn greeting wire-time dropped from "several seconds" to **990 ms**.

Files: ``libraries/typescript/src/client.ts`` (cache + ``parkProviderConnections``, ``popPrewarmedConnections``, ``closePrewarmedConnections``, ``ParkedProviderConnections`` interface, ``closeParkedConnections`` helper); ``libraries/typescript/src/server.ts`` (forwards ``popPrewarmedConnections`` into ``StreamHandlerDeps``); ``libraries/typescript/src/stream-handler.ts`` (adopt-or-connect logic); ``libraries/typescript/src/providers/{cartesia-stt,elevenlabs-ws-tts,openai-realtime}.ts`` (park + adopt API surface). Python parity in ``libraries/python/getpatter/{client,server,stream_handler,telephony/twilio,telephony/telnyx}.py`` and ``libraries/python/getpatter/providers/{cartesia_stt,elevenlabs_ws_tts,openai_realtime}.py``.

Realtime mode has the API surface but the ``OpenAIRealtimeStreamHandler`` adoption is deferred to a follow-up, since pipeline mode dominates the affected use case.

## Quick wins (parallel to WS handoff, smaller individual savings)

- **Eager AEC import on ``Patter.serve()``** (gated on ``agent.echo_cancellation=true``). Was previously a lazy ``await import('./audio/aec')`` on first ``start`` event, paying 150–400 ms JIT on the first call. Files: ``libraries/typescript/src/client.ts``, ``libraries/python/getpatter/client.py``.
- **Parallel ``stt.connect()`` and TTS-firstMessage kickoff**. Previously the StreamHandler awaited STT before TTS firstMessage, but STT does not need to be ready to send firstMessage out, only to receive caller audio. Now both kick off concurrently. Saves 200–400 ms on the first turn. Files: ``libraries/typescript/src/stream-handler.ts``, ``libraries/python/getpatter/stream_handler.py``.
- **Timing instrumentation**: new ``[PREWARM]`` and ``[CONNECT]`` INFO logs in the prewarm spawn and provider connect paths, with elapsed-ms per provider. Lets us A/B-test future prewarm changes with numerical evidence rather than perceptual reports.

## Dashboard fixes (third pass: issues found during the round-2 PSTN test)

### Live transcript shows only one turn at a time (BUG #102)

``MetricsStore.recordTurn`` correctly accumulated turns into ``active.turns[]`` but the frontend ``toUiTranscript`` mapper had two paths: a primary keyed on ``record.transcript.length > 0`` (used for completed calls) and a fallback that derived rows from ``record.turns``. For an in-flight call the primary always returned empty (active records never carried ``transcript[]``) and only the fallback rendered, so the two paths diverged. Each ``recordTurn`` now mirrors the round-trip into a flat ``active.transcript`` array (one user entry + one assistant entry per turn, filtering empty ``user_text`` and the ``[interrupted]`` agent sentinel), so the primary path sees the same accumulating ``user → assistant → user → assistant → …`` history live calls and completed calls both expose.

Files: ``libraries/typescript/src/dashboard/store.ts``, ``libraries/typescript/tests/dashboard-store.test.ts`` (5 new authentic tests).

### Transcript disappears after call end (BUG #101)

The Twilio status callback for ``CallStatus=completed`` fires a beat before the WS ``stop`` frame, so ``MetricsStore.updateCallStatus`` moved the active record into the completed buffer **without preserving ``turns[]`` or ``transcript[]``**. The subsequent ``recordCallEnd`` overwrote that completed entry, but in the gap any ``useTranscript`` fetch returned a record with no transcript and the live pane went blank. Three-point fix: (a) ``updateCallStatus`` terminal branch now copies ``active.turns`` and ``active.transcript`` into the new completed entry; (b) ``recordCallEnd`` falls back to active/existing transcript when ``data.transcript`` is empty; (c) the ``useTranscript`` hook subscribes to ``call_end`` SSE events (independent of ``isLive``) so the pane refetches the moment ``recordCallEnd`` lands the SDK-authoritative ``history.entries``.

Files: ``libraries/typescript/src/dashboard/store.ts``, ``dashboard-app/src/hooks/useTranscript.ts``.

### Sparkline tooltip generic / wrong metric (BUG #104)

The metric-tile sparkline tooltip rendered ``"N call(s)"`` plus a per-call sample list regardless of which card it was attached to, so the latency and spend cards showed the same headline as the calls card. New ``MetricKind`` prop (``'count' | 'latency' | 'spend'``) threaded through ``Metric`` → ``SparkBar`` → ``SparkTooltip``, with a pure ``bucketHeadline(bucket, kind)`` helper that computes per-card aggregates: ``TOTAL COST $X.XXX`` (sum of per-call cost), ``AVG LATENCY <p95-mean> ms`` (mean of per-call P95), or ``N CALL(S)``. Headline label uppercased, monospace, styled to match the existing time-range header on the same tooltip.

Files: ``dashboard-app/src/App.tsx``, ``dashboard-app/src/components/Metric.tsx``, ``dashboard-app/src/styles/dashboard.css``.

### caller / callee never persisted to metadata.json (BUG B from the second pass)

Every persisted ``metadata.json`` showed ``"caller": ""``, ``"callee": ""`` for completed calls; only the in-memory ``MetricsStore`` had the right values. The persist layer received empty strings because the ``CallLogger.log_call_end`` data shape was built from agent options rather than the live record. ``server.ts`` ``wrappedStart`` now resolves ``caller``/``callee`` from the active store record before persisting; Python ``record_call_start`` parity fix stops clobbering caller/callee with empty strings on the upgrade-from-initiated path (TS already had the right pattern).

### Call disappears from dashboard after end (BUG C from the second pass)

Race-induced duplicate row: Twilio's status callback for ``CallStatus=completed`` fires ~50–200 ms before the WS ``stop`` frame. ``updateCallStatus`` moved the row out of ``activeCalls`` into ``calls[]`` correctly, then the WS ``stop`` drove ``recordCallEnd``, ``activeCalls.get(callId)`` returned undefined, and a duplicate entry was pushed with ``started_at = 0`` and empty caller/callee. The duplicate masked the well-formed earlier row and the 24h window filter excluded it. ``recordCallEnd`` / ``record_call_end`` now searches ``calls[]`` for the existing entry when active is gone and **updates in place**, preserving caller/callee/started_at and merging in the just-collected metrics.

## Tests

47 new regression tests across 6 files (TS + Py parity):

- ``libraries/python/tests/test_prewarm_handoff.py`` (new, 6 tests)
- ``libraries/typescript/tests/unit/prewarm-handoff.test.ts`` (new, 6 tests)
- ``libraries/python/tests/unit/test_dashboard_store_unit.py`` (+4 dedup + active-accessor tests)
- ``libraries/python/tests/unit/test_server_unit.py`` (+1 caller/callee persist test)
- ``libraries/typescript/tests/dashboard-store.test.ts`` (+7 dedup + transcript accumulate + accessor tests)
- ``libraries/typescript/tests/server.test.ts`` (+1 caller/callee persist test using real ``CallLogger``)

## Verification

- Python: ``pytest -q`` → 1808 passed, 7 skipped (was 1797 + 11 new)
- TypeScript: ``npm test`` → 1481 passed (was 1467 + 14 new)
- TypeScript ``tsc --noEmit`` (lint): clean
- TypeScript ``tsup build`` (esm + cjs + dts + cli): clean
- Dashboard SPA build (``cd dashboard-app && npm run build``): clean (204.93 kB / 63.47 kB gz)
- Dashboard sync: both ``libraries/{python,typescript}/.../dashboard/ui.html`` refreshed
- Live PSTN smoke test (``outbound-cartesia-cerebras-elevenlabs.ts``): WS handoff log fired, first-turn greeting 990 ms, transcript live and post-end render OK, sparkline tooltip per-card OK
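The adopt-or-connect decision at call pickup can be sketched independently of any real WebSocket library. Everything named here is hypothetical (``ParkedHandle``, ``ParkedRegistry``, ``adopt_or_connect``); the sketch models only the behaviours the commit message states: a registry keyed by call ID, a 30 s safety TTL, a check that the parked socket is still open, and a fallback to a fresh connect.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class ParkedHandle:
    """Hypothetical handle for a socket opened during ringing."""
    provider: str
    is_open: bool = True
    bos_already_sent: bool = False  # lets a TTS adopter skip a second BOS
    parked_at: float = 0.0


class ParkedRegistry:
    def __init__(self, ttl_s: float = 30.0) -> None:
        self.ttl_s = ttl_s
        self._by_call: Dict[str, ParkedHandle] = {}

    def park(self, call_id: str, handle: ParkedHandle, now: Optional[float] = None) -> None:
        handle.parked_at = time.monotonic() if now is None else now
        self._by_call[call_id] = handle

    def pop(self, call_id: str, now: Optional[float] = None) -> Optional[ParkedHandle]:
        now = time.monotonic() if now is None else now
        handle = self._by_call.pop(call_id, None)
        if handle is None:
            return None
        if not handle.is_open or now - handle.parked_at > self.ttl_s:
            handle.is_open = False  # stand-in for closing a stale socket
            return None
        return handle


def adopt_or_connect(
    registry: ParkedRegistry,
    call_id: str,
    connect: Callable[[], ParkedHandle],
    now: Optional[float] = None,
) -> Tuple[ParkedHandle, str]:
    """Return (handle, source), where source is 'adopted' or 'fresh'."""
    handle = registry.pop(call_id, now=now)
    if handle is not None:
        return handle, "adopted"
    return connect(), "fresh"
```

The ``source`` string mirrors the ``source=adopted`` field in the ``[CONNECT]`` log line quoted above; the value ``"fresh"`` for the fallback path is an assumption.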
…ffold

Headline changes since cbe1886:

* Rolled back the 400 ms STT-final → LLM dispatch debounce introduced earlier in 0.6.1 (`_scheduleTurnCommit` / `_runDeferredTurnCommit` in TS, `_schedule_turn_commit` / `_delayed_turn_commit` in Python). The partial-transcript reschedule branch was overwriting the dispatched FINAL text with the latest partial, causing entire user turns to be dropped during slow-LLM windows. Verified on real PSTN (round 10k with gpt-5-nano dropped 3 of 5 user turns). Dispatch is now synchronous on `is_final` again. The original double-talk symptom is re-opened with a better fix path documented internally.
* Kept beneficial 0.6.1 work: `beginSpeaking` stamps `firstAudioSentAt = Date.now()` on every turn so the `canBargeIn()` anti-flicker gate runs in parallel with LLM TTFT + TTS TTFB; VAD `speech_start` calls `anchorUserSpeechStart()` and skips on phantom-during-warmup-gate; commit-drop path re-anchors; WARN log when pipeline has no `llm` / `onMessage` handler; char/4 fallback billing for providers that don't emit a usage chunk; `OpenAILLMProvider.providerKey` static; firstMessage TTS char billing; persist full latency breakdown per percentile in metadata.json; dashboard hydrate reads `transcript.jsonl`; ElevenLabs default flipped to WS.
* Lowered dashboard percentile threshold 5 → 2 turns so the detail pane no longer shows `—` for p50/p95 on typical 4-7 turn PSTN calls while the list column already shows a real number via avg fallback.
* Added Krisp VIVA noise-suppression scaffold for the TypeScript SDK at `libraries/typescript/src/providers/krisp-filter.ts` for cross-SDK parity with the existing Python `KrispVivaFilter`. Throws at construction time because Krisp does not publish an official Node SDK as of 2026-05; users supply SDK + `.kef` model + license. New top-level exports: `KrispVivaFilter`, `KrispVivaFilterOptions`, `KrispSampleRate`, `KrispFrameDuration`, `DeepFilterNetFilter`, `DeepFilterNetOptions`.
* CHANGELOG 0.6.1 section revised to reflect the rollback narrative honestly (debounce attempted, rolled back before release) and to document the new entries.
* Scrubbed competitor-name references from source files (Pipecat, LiveKit) per project rule `.claude/rules/no-competitor-references.md`; replaced with "industry-standard pattern" wording. Source files affected: `stream-handler.ts`, `stream_handler.py`, `metrics.ts`, `services/metrics.py`, `silero_vad.py`.
* Krisp Python wrapper unchanged.

Tests: TS lint clean, vitest 1486/1486 pass; Python pytest unit 1252 pass, 5 skip.

Validated on real PSTN: post-rollback p95 wait 1844 ms over 4 clean sequential turns (no drops) on cellular hotspot, versus a catastrophic 8521 ms with 3 dropped turns pre-rollback.
Keep ElevenLabsTTS backed by HTTP REST (original cbe1886 state). The WS default caused pipeline latency regression and prewarm lifecycle bugs. ElevenLabsWebSocketTTS remains available as opt-in via direct import.
…s (re-base of #89) (#92) * fix(dashboard): preserve existing calls when new call arrives in SSE stream `mergeCallPreserving` in `dashboard-app/src/hooks/useDashboardData.ts` rebuilt the calls array from the server snapshot via `next.map(...)`, so any call present in the previous UI state but missing from the next payload was silently dropped. With back-to-back calls, the SSE `call_start` refresh occasionally landed before the prior call propagated to `/api/dashboard/calls` and the row vanished from the SPA — regression reported as #124. The merge is now a true upsert: rows present in `prev` but absent from `next` are appended, so prior calls stay visible until the server snapshot stabilises. Server-side eviction (ring buffer of 500) bounds long-running sessions. Pure merge helpers extracted to `dashboard-app/src/hooks/mergeCalls.ts` and exercised by `dashboard-app/src/hooks/mergeCalls.test.ts` (added Vitest to the SPA so the helpers can be tested in isolation without a React harness). Refs #124. * fix(barge-in): firstMessage interruptible via per-chunk mark gating The firstMessage TTS chunks were pushed into the carrier WebSocket as fast as the provider yielded them. Twilio's outbound buffer ended up several seconds deep, and a barge-in's sendClear was queued behind the already-enqueued media frames — the agent kept talking on the user's earpiece for up to ~2 s after the user spoke (#128). The firstMessage send path is now a paced loop: * Twilio: every chunk is followed by a unique mark; the loop waits for the oldest unconfirmed mark once FIRST_MESSAGE_MARK_WINDOW (3 chunks ≈ 120 ms) are in flight. ``onMark`` drains the FIFO on echo so the next chunk goes out. ``cancelSpeaking`` (Py: ``_run_barge_in_cancel``) resolves every pending mark waiter so the loop exits on the next tick and ``sendClear`` lands on a near-empty carrier buffer. 
* Telnyx (no mark concept): the loop falls back to a playout-duration- based sleep so the buffer can't out-run a clear by more than one chunk. Both SDKs stay in parity: TS ``sendPacedFirstMessageBytes`` mirrors Py ``_send_paced_first_message_bytes`` and both ``streamPrewarmBytes`` / ``_stream_prewarm_bytes`` delegate to the new helper. The existing prewarm chunking test was updated to echo marks via the mock bridge so it interoperates with the new pacing. Coverage: * libraries/typescript/tests/unit/stream-handler.test.ts — ``firstMessage mark-gated pacing`` (3 cases: window cap + barge-in, mark echo slides window, Telnyx playout pacing). * libraries/python/tests/unit/test_first_message_pacing.py — 4 cases including FIFO mark resolution. Refs #128. * fix(barge-in): drain pending marks on call cleanup/stop/ws-close The firstMessage paced sender accumulates one mark waiter (asyncio.Future on Python / Promise on TS) per chunk in _pending_marks / pendingMarks while audio is streaming to the carrier. The barge-in cancel path already drained these, but a call that ended without going through cancel — carrier WebSocket drop, hangup mid firstMessage, stop event arriving before the paced sender finished — left every queued future unresolved. The send loop was awaiting them, so the orphan futures leaked until the handler itself was garbage-collected. Fix: PipelineStreamHandler.cleanup (Py) now invokes _drain_pending_marks before tearing down adapters; the TS handleStop and handleWsClose do the equivalent via drainPendingMarks(). Idempotent and safe when the queue is already empty. 
Added regression coverage: - libraries/python/tests/unit/test_first_message_pacing.py (TestCleanupDrainsPendingMarks) - libraries/typescript/tests/unit/stream-handler.test.ts (cleanup drains pending firstMessage marks — handleStop + handleWsClose) * fix(barge-in): reset firstMessage mark counter per send + on cleanup PipelineStreamHandler._first_message_mark_counter (Py) and StreamHandler.firstMessageMarkCounter (TS) were never reset between turns or calls. With handler re-use, the counter incremented monotonically across turns — a paced send for the second turn issued fm_<previous_count + 1> while the carrier could still be echoing a stale fm_<N> from the previous turn, corrupting FIFO matching in on_mark / onMark. Fix: reset the counter to 0 at the top of _send_paced_first_message_bytes (Py) / sendPacedFirstMessageBytes (TS) so each paced send begins a fresh fm_1, fm_2, ... sequence. Also reset on cleanup (PipelineStreamHandler.cleanup Py, handleStop + handleWsClose TS) as a belt-and-braces against the cross-call boundary. Coverage: - libraries/python/tests/unit/test_first_message_pacing.py (TestFirstMessageMarkCounterReset — per-send reset + cleanup reset) - libraries/typescript/tests/unit/stream-handler.test.ts (firstMessage mark counter resets across sends + on cleanup) * fix(dashboard): cap merged UI calls at 500 + sort by startedAt desc mergeCallPreserving in dashboard-app/src/hooks/mergeCalls.ts preserved prev_only calls indefinitely by appending them after the fresh snapshot block. Two consequences on a long-lived session: 1. The UI array grew unbounded — once the session cycled through more than 500 calls (the server-side MetricsStore ring buffer default), rows the server had already evicted stayed pinned by prev and were re-appended on every refresh. 2. Ordering was non-deterministic — prev_only rows always landed at the bottom regardless of their startedAtMs, so a newer call could end up below an older one if the snapshot ordering shifted. 
Fix: after the upsert pass, sort the merged list by startedAtMs descending and slice to MAX_UI_CALLS = 500 so the SPA mirrors the server ring buffer.
Coverage: dashboard-app/src/hooks/mergeCalls.test.ts adds a 600-prev+1-fresh cap test and an explicit startedAtMs ordering test.
* fix(realtime): only update lastConfirmedMark on matched mark (parity with Python)
StreamHandler.onMark in libraries/typescript/src/stream-handler.ts unconditionally assigned this.lastConfirmedMark = markName before checking whether the name corresponded to a queued mark. Any echo arriving after the queue was drained, or any mark name emitted by adapters outside the firstMessage queue, would overwrite the handler-level field and contaminate downstream barge-in heuristics gated on lastConfirmedMark.
Python stream_handler.py's on_mark never touches a handler-level field at all: the equivalent state lives on TwilioAudioSender.last_confirmed_mark and is updated only by the carrier's own echo handler. The TS path now matches that behaviour defensively: lastConfirmedMark is updated only after the queue lookup confirms a matching entry, mirroring the safer Python semantics.
Coverage: libraries/typescript/tests/unit/stream-handler.test.ts (onMark only updates lastConfirmedMark on a matched mark) asserts that an unmatched echo cannot clobber a previously-set value.
The Python ``CallMetricsAccumulator._emit_eou_metrics`` had ``end_of_utterance_delay`` and ``transcription_delay`` swapped relative to the TypeScript ``emitEouMetrics`` and also emitted them in seconds while TS emits milliseconds. Dashboards or exporters reading the same metric across both SDKs saw a 1000x disagreement on top of swapped field semantics.
Locked convention (now identical in both SDKs):
- end_of_utterance_delay = stt_final - vad_stopped (ms)
- transcription_delay = turn_commit - vad_stopped (ms)
- on_user_turn_completed_delay (ms, unchanged)
Python now clamps negative deltas to 0 (TS already did). The Python ``EOUMetrics`` docstring was updated from "seconds" to "milliseconds".
Tests pin both behaviours:
- libraries/python/tests/test_metrics.py::TestEOUMetricsEmission
- libraries/typescript/tests/unit/metrics.test.ts :: CallMetricsAccumulator > emitEouMetrics field semantics
Refs: 0.6.1 observability parity audit.
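
A minimal sketch of the locked convention, assuming the three timestamps are captured in seconds on a shared monotonic clock. The names `EouDelays`, `compute_eou_delays`, and `_clamped_ms` are hypothetical and are not the SDK's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EouDelays:
    """Both delays are in milliseconds, per the convention above."""

    end_of_utterance_delay: float
    transcription_delay: float


def _clamped_ms(later_s: float, earlier_s: float) -> float:
    """Difference in milliseconds, clamped so it is never negative."""
    return max(0.0, (later_s - earlier_s) * 1000.0)


def compute_eou_delays(
    vad_stopped_s: float, stt_final_s: float, turn_commit_s: float
) -> EouDelays:
    """Hypothetical helper: inputs in seconds, outputs in milliseconds."""
    return EouDelays(
        # end_of_utterance_delay = stt_final - vad_stopped
        end_of_utterance_delay=_clamped_ms(stt_final_s, vad_stopped_s),
        # transcription_delay = turn_commit - vad_stopped
        transcription_delay=_clamped_ms(turn_commit_s, vad_stopped_s),
    )
```

Converting and clamping in one helper keeps the unit and the sign rule in a single place, so the two fields cannot drift apart.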
The Python SDK has exposed three OTel-related helpers since 0.6.1:
``record_patter_attrs``, ``patter_call_scope``, ``attach_span_exporter``
(in ``getpatter.observability.attributes``). The TypeScript SDK had no
equivalent surface, so TS provider adapters had nothing to call where
their Python counterparts call these helpers, violating
``.claude/rules/sdk-parity.md``.
Port the helpers to TypeScript as no-ops by default. When
``PATTER_OTEL_ENABLED`` is unset or ``@opentelemetry/api`` is not
installed, each helper returns immediately, keeping the zero-cost
disabled path that the rest of the observability module already
respects.
Semantic mapping:
- recordPatterAttrs(attrs) <-> record_patter_attrs
- patterCallScope({ callId, side }, fn) <-> patter_call_scope
- attachSpanExporter(patterInstance, exporter) <-> attach_span_exporter
The JS form of patterCallScope takes an async callback because JS lacks
``with``-style context managers; the closure is the scope body. The
module uses a module-level stack instead of a ContextVar, which is
sufficient for the SDK's one-call-per-handler model.
Tests:
- libraries/typescript/tests/unit/observability-attributes.test.ts
(7 smoke cases covering the public surface + scope unwind on throw)
…se of #90) (#91)
* chore(cerebras): debug log when usage chunk missing + fallback fires
When an upstream LLM stream (Cerebras and similar) does not emit a `usage` chunk despite `stream_options={include_usage:true}`, the char/4 fallback billing path previously emitted WARN on every tool-loop iteration. Multi-tool turns logged 5-10 identical WARN lines for the same call, drowning real warnings.
Replace with one-shot INFO at first fallback per LLMLoop instance (provider, model, char counts, est_tokens), then DEBUG for every subsequent iteration with the running `_usage_missing_count` / `_usageMissingCount` total. No billing behaviour change: char/4 estimation still drives `record_llm_usage` / `recordLlmUsage`. Symmetric Python (`logger.info`/`logger.debug`) and TypeScript (`getLogger().info`/`.debug`).
* docs(krisp): refresh unavailable message with current SDK status
KrispVivaFilter constructor in the TypeScript SDK still throws: no official Krisp Node.js server SDK exists as of 2026-05. Verified via `npm search krisp`:
- `@livekit/krisp-noise-filter` (0.4.3, 2026-04): browser WASM track processor on the local microphone; cannot run server-side.
- `@livekit/react-native-krisp-noise-filter` (0.0.3): mobile native.
- `@krisp.ai/kr-local-monitoring`: Krisp's only first-party npm package; "Local Monitoring API", not noise cancellation.
Refreshed the thrown message to (a) stamp the verification date, (b) explicitly distinguish "server Node SDK" from the existing browser/RN wrappers, (c) list the LiveKit packages with the reason they don't apply to Patter (server-received PCM/mulaw stream). Python KrispVivaFilter and TS DeepFilterNetFilter remain the only shipped paths. No code behaviour change.
* fix(krisp): remove competitor package names from error message
Per .claude/rules/no-competitor-references.md the TS Krisp filter error message cannot cite competitor package names. Refactored the "Browser/React Native" block to describe the category generically (third-party wrappers, client-side scope) without naming specific packages. Same cleanup applied to the matching CHANGELOG entry. No behavioural change.
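
The one-shot INFO then DEBUG pattern from the Cerebras change in this group can be sketched as follows. This is a hedged illustration: `UsageFallbackLogger` and `estimate_and_log` are hypothetical names, and rounding the char/4 estimate down is an assumption, since the commit does not state the rounding.

```python
import logging
from typing import Tuple

logger = logging.getLogger("usage_fallback_sketch")


class UsageFallbackLogger:
    """Logs the first missing-usage fallback at INFO, later ones at DEBUG."""

    def __init__(self, provider: str, model: str) -> None:
        self.provider = provider
        self.model = model
        self.usage_missing_count = 0  # running total per loop instance

    def estimate_and_log(
        self, prompt_chars: int, completion_chars: int
    ) -> Tuple[int, int]:
        # char/4 estimation; integer division is an assumption of this sketch.
        est_in, est_out = prompt_chars // 4, completion_chars // 4
        self.usage_missing_count += 1
        if self.usage_missing_count == 1:
            logger.info(
                "usage chunk missing; estimating tokens provider=%s model=%s "
                "chars=%d/%d est_tokens=%d/%d",
                self.provider, self.model,
                prompt_chars, completion_chars, est_in, est_out,
            )
        else:
            logger.debug(
                "usage chunk still missing (count=%d)", self.usage_missing_count
            )
        return est_in, est_out
```

The estimate is returned on every call, so billing is unaffected by which log level fires.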
…t WebSocket
After commit 8507a34 reverted the HTTP→WS flip, the comment still said ElevenLabsTTS "defaults to WebSocket streaming as of 0.6.1". Updated to reflect current reality:
- ElevenLabsTTS = HTTP REST (pcm_16000)
- ElevenLabsWebSocketTTS = WS variant
- ElevenLabsRestTTS = HTTP alias
…uration
Twilio mark ACKs can batch-resolve simultaneously — when all 3 pending
marks in FIRST_MESSAGE_MARK_WINDOW resolve at once, waitForMarkWindow
unblocks 3 consecutive loop iterations with no delay, sending a burst
of ~120ms of audio. The carrier jitter buffer drains for a moment then
refills, producing audible crackling on the first message only (regular
turns use synthesizeSentence / synthesize_sentence which send audio
directly without marks and are unaffected).
Remove the `if markPromise === null` / `if mark_fut is None` guard so
the playout sleep (40ms for a 1280-byte chunk) runs unconditionally after
every chunk on all carriers. Mark tracking for barge-in is preserved.
Files: libraries/typescript/src/stream-handler.ts,
libraries/python/getpatter/stream_handler.py.
Update tests to use fake timers (TS: vi.useFakeTimers + advanceTimersByTimeAsync,
Python: asyncio.sleep mock) so the 40ms per-chunk sleep does not make unit
tests slow. Align tts-facade-language.test.ts with the current ElevenLabs
HTTP REST default (commit 8507a34 reverted the WS flip).
…plete flag
The previous fix (always sleep 40ms per chunk) eliminated the initial burst needed to pre-fill Twilio's PSTN jitter buffer (250–1500 ms), causing the same crackling symptom it was meant to cure.
Root cause: Twilio can batch-resolve all FIRST_MESSAGE_MARK_WINDOW (3) mark ACKs in a single event-loop turn. When the window unblocks 3 consecutive iterations with no sleep, 3 chunks are sent in burst and the jitter buffer drains momentarily → crackling.
Correct fix: the first FIRST_MESSAGE_MARK_WINDOW chunks go out in burst (no playout sleep) to pre-fill the jitter buffer. Once the window is first full, a sticky `initialFillComplete` / `initial_fill_complete` flag flips to true and subsequent chunks are paced by playout time (~40 ms per chunk), preventing batch-ACK bursts. On Telnyx (no mark concept) the playout sleep runs unconditionally on every chunk.
Tests: 7/7 Python, 35/35 TypeScript. No changes were needed on the Python side (the autouse sleep-patch fixture already makes all sleeps instant).
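
The burst-then-pace decision can be sketched as a pure planning function. This is a simplified model under stated assumptions: it covers only the playout-sleep choice and leaves out the mark-window wait, `plan_chunk_sleeps` and `chunk_playout_seconds` are hypothetical names, and the 16 kHz 16-bit PCM format is inferred from the "40ms for a 1280-byte chunk" figure.

```python
from typing import List, Sequence

FIRST_MESSAGE_MARK_WINDOW = 3


def chunk_playout_seconds(
    chunk_bytes: int, sample_rate_hz: int = 16000, bytes_per_sample: int = 2
) -> float:
    """Wall-clock playout time of one chunk (1280 bytes -> 0.04 s)."""
    return chunk_bytes / (sample_rate_hz * bytes_per_sample)


def plan_chunk_sleeps(
    chunk_sizes: Sequence[int],
    carrier_has_marks: bool,
    window: int = FIRST_MESSAGE_MARK_WINDOW,
) -> List[float]:
    """Sleep (in seconds) to apply after each chunk is sent."""
    sleeps: List[float] = []
    # Carriers without marks never get the initial burst.
    initial_fill_complete = not carrier_has_marks
    sent = 0
    for size in chunk_sizes:
        if initial_fill_complete:
            sleeps.append(chunk_playout_seconds(size))
        else:
            sleeps.append(0.0)  # burst: pre-fill the jitter buffer
            sent += 1
            if sent >= window:
                initial_fill_complete = True  # sticky, never flips back
    return sleeps
```

With five 1280-byte chunks on a mark-capable carrier, the first three plan to go out back to back and the last two are each followed by a 40 ms sleep.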
Three related fixes to the no-AEC Twilio PSTN pipeline that together
deliver a smooth-feeling agent on real phone calls (verified live: 5
turns, p95 wait 685 ms, every user utterance produced a fresh VAD
speech_start with multiple successful interruptions in one call).
1. One-shot barge-in. After a successful barge-in, subsequent barge-in
attempts silently failed. PSTN echo of the agent's TTS kept
SileroVAD's smoothed probability above deactivationThreshold (0.35)
for the full agent turn, so pubSpeaking stayed true cross-turn and
no fresh SILENCE -> SPEECH transition ever fired. Added an optional
reset() hook to VADProvider; SileroVAD implements it by clearing
the pending buffer, pubSpeaking, the speech/silence threshold
durations, the ExpFilter, AND the ONNX RNN hidden state + rolling
context (without resetting the model the detector "remembers" the
echo). StreamHandler invokes reset in beginSpeaking (every new
agent turn starts clean) and at the grace-timer fire of
endSpeakingWithGrace (natural turn end leaves VAD ready for the
next spontaneous user utterance).
2. First-message crackle. StatefulResampler seeded its 5-tap FIR
history with input[0] on the first call. When ElevenLabs HTTP
streaming delivers a chunk that starts at non-zero amplitude this
produced a startup transient audible as a brief crackle at the
beginning of the first TTS message. Seeded with zeros instead —
the correct initial condition for a filter that has received no
prior input.
3. Barge-in gate 250 ms -> 100 ms, suppressed speech flushed. The
no-AEC anti-flicker gate was 250 ms, which on short agent turns
(< ~400 ms of audio) consumed most of the turn and silently
suppressed legitimate interruptions. Reduced to 100 ms (still
blocks the ~100-200 ms PSTN echo round-trip). When a speech_start
is gate-suppressed the inboundAudioRing accumulates user audio
that was previously discarded at the next beginSpeaking; added a
suppressedSpeechPending flag so the grace-timer flush replays the
ring to STT on natural turn end.
Parity: TS unconditionally stamps firstAudioSentAt in beginSpeaking
since 2026-05-11; Python _begin_speaking now matches (was conditional
on is_first_message, which made any turn with a slow LLM
un-interruptible for the full LLM TTFT window).
Files touched:
libraries/typescript/src/types.ts (VADProvider.reset?)
libraries/typescript/src/providers/silero-vad.ts (reset impl)
libraries/typescript/src/stream-handler.ts (call resetVad,
suppressedSpeechPending, gate constant, ring flush on grace)
libraries/typescript/src/audio/transcoding.ts (FIR zero-seed)
libraries/python/getpatter/providers/base.py (VADProvider.reset)
libraries/python/getpatter/providers/silero_onnx.py (OnnxModel.reset)
libraries/python/getpatter/providers/silero_vad.py (reset impl)
libraries/python/getpatter/stream_handler.py (parity wiring)
CHANGELOG.md (Unreleased entries)
tests: silero-vad reset coverage (TS + Py), updated stream-handler
+ transcoding tests for new state-machine wiring.
Validation: TS 945/945 unit tests + lint + build green;
Python 1259+13 unit tests green; live PSTN call confirmed smooth
multi-barge-in.
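
The zero-seeded filter history from fix 2 can be illustrated with a small stateful FIR sketch. The class name `StatefulFir` and the moving-average taps used in the test are hypothetical. The real resampler's coefficients are not given in this message.

```python
from typing import List, Sequence


class StatefulFir:
    """FIR filter that carries its history across chunk boundaries."""

    def __init__(self, taps: Sequence[float]) -> None:
        self._taps = [float(t) for t in taps]
        # Zero seed: the filter has received no prior input yet.
        self._history: List[float] = [0.0] * (len(self._taps) - 1)

    def process(self, chunk: Sequence[float]) -> List[float]:
        keep = len(self._taps) - 1
        padded = self._history + [float(x) for x in chunk]
        out: List[float] = []
        for i in range(len(chunk)):
            newest = i + keep  # index of the current sample in `padded`
            out.append(
                sum(tap * padded[newest - k] for k, tap in enumerate(self._taps))
            )
        # Keep the last `keep` samples for the next call.
        self._history = padded[len(padded) - keep:]
        return out
```

Because the history persists between calls, filtering a stream in several chunks yields the same samples as filtering it in one pass.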
…de polish
Three coordinated dashboard improvements landed together because they
share the same SPA bundle + cross-SDK route/store parity surface.
1. Soft-delete selected calls from the dashboard view + aggregates.
On-disk artefacts (metadata.json, transcript.jsonl) are preserved
as the durable backup the operator can audit outside the dashboard.
- MetricsStore.deleteCalls / delete_calls accept ids, ignore active
calls (safety), persist the set atomically to
<log_root>/.deleted_call_ids.json so deletions survive restart.
- getCalls / getCall / getAggregates / getCallsInRange / callCount /
hydrate now filter against the deleted set so rolling metrics
(avg latency, total spend) recompute against the visible window
immediately on delete.
- New endpoints, parity TS↔Python:
* DELETE /api/dashboard/calls/:call_id
* POST /api/dashboard/calls/delete { call_ids: [...] }
- SSE event ``calls_deleted`` so other tabs / external clients
re-render in real time.
- SPA: per-row checkbox column (live rows disabled), bulk-action
bar that reveals on selection > 0 with inline confirmation step
("Removes from view + metrics. Logs kept on disk.") gated by a
peach destructive button.
2. Top-bar toggles: PII reveal (eye / eye-slash) + theme (sun / moon),
both persisted in localStorage so the operator's last choice
survives a reload. Default state is hidden + light — screen-share
safe out of the box.
- New ``useUiPrefs`` hook centralises both prefs and applies the
``body.dark`` class side-effect so the existing dark-mode CSS
overrides flip in lockstep.
- fmtPhone(p, revealed) renders ``•••<last4>`` using U+2022 BULLET
instead of asterisks so the mask sits on the digit baseline.
PII cells gain ``font-variant-numeric: tabular-nums`` so toggling
reveal doesn't jitter the column width.
- Reveal currently honours whatever the server provided —
``PATTER_LOG_REDACT_PHONE`` still controls the on-disk format,
unchanged. Operators who want full numbers in the dashboard can
set ``PATTER_LOG_REDACT_PHONE=full`` to log new calls in full;
historical data stays masked by construction.
3. Dark-mode polish + min-height layout.
- Page palette lifted: bg #0d0d0d → #121212, cards #171717 →
#1c1c1c, borders #262626 → #2a2a2a. Previous pitch-black felt
oppressive against the brand's cream/peach accent.
- Active toggles use the peach accent instead of stark white
(.seg button.on, .icon-btn.toggle.on) — the white blocks felt
like a light-mode leftover floating on the dark page.
- Fixed invisible / unreadable elements in dark mode:
* Patter logo (was inline ``color: var(--ink)``, now inherits)
* Transcript turn body text (.turn .body .txt was #1a1a1a)
* Metrics waterfall track + STT bar + value (.wf-row .track was
cream, .seg-bar.stt was #1a1a1a, .v was #000)
* Duration block value (.duration-block .v was #000)
* Sparkline empty bars (cascade from broad ``.spark-bar``
override leaked into ``.empty`` — added ``:not(.empty)``)
* kbd ⇧K chip (cream blob on dark)
* .ctrl.active, .pill.queued, .pill.fail, .lat-bar.warn,
.car-dot.tx, .stack-row labels, .latbox.warn variant.
* New-row insertion flash used cream end-state → dedicated
``slideInDark`` keyframe.
- Min-height baseline so the layout doesn't collapse when no
calls match the active range: table scroll area pinned at
540px (was unbounded down), .rr right column 590px,
.rr-card 280px.
Files touched
=============
- dashboard-app/src/components: CallTable, Topbar, PatterLogo,
LiveCallPanel, icons, format
- dashboard-app/src/hooks: useUiPrefs (new), useDashboardData
- dashboard-app/src/lib/api.ts, App.tsx, styles/dashboard.css
- libraries/python/getpatter/dashboard: store.py, routes.py, ui.html
- libraries/python/tests/test_dashboard.py
(TestMetricsStoreDelete — 8 new tests)
- libraries/typescript/src/dashboard: store.ts, routes.ts, ui.html
- libraries/typescript/tests/unit/dashboard-store.test.ts
(deleteCalls — 8 new tests covering hide / aggregates / range /
active-skip / idempotent / SSE / persistence / empty input)
- CHANGELOG.md
Verification
============
- SPA build green (224 KB bundle, gzip 68 KB)
- Python: 1832 tests passing (8 new)
- TypeScript: 952 tests passing (8 new) + lint clean
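
The soft-delete persistence described in item 1 can be sketched as below. Assumptions are labeled: `DeletedCallIndex` is a hypothetical name, the on-disk format (a sorted JSON array of ids) is a guess, and the temp-file-plus-rename step is one common way to get an atomic write, not necessarily the one used in `MetricsStore`.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, Iterable, List, Set


class DeletedCallIndex:
    """Tracks soft-deleted call ids and persists them across restarts."""

    def __init__(self, log_root: Path) -> None:
        self._path = Path(log_root) / ".deleted_call_ids.json"
        self._deleted: Set[str] = set()
        if self._path.exists():
            self._deleted = set(json.loads(self._path.read_text(encoding="utf-8")))

    def delete_calls(self, call_ids: Iterable[str], active_ids: Set[str]) -> List[str]:
        """Hide finished calls; active calls are skipped for safety."""
        accepted: List[str] = []
        for call_id in call_ids:
            if call_id in active_ids or call_id in self._deleted:
                continue
            self._deleted.add(call_id)
            accepted.append(call_id)
        if accepted:
            self._persist()
        return accepted

    def visible(self, calls: Iterable[Dict]) -> List[Dict]:
        """Filter a call list against the deleted set."""
        return [c for c in calls if c["call_id"] not in self._deleted]

    def _persist(self) -> None:
        # Write to a temp file in the same directory, then rename over the target.
        fd, tmp_name = tempfile.mkstemp(dir=str(self._path.parent), suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as handle:
                json.dump(sorted(self._deleted), handle)
            os.replace(tmp_name, self._path)
        finally:
            if os.path.exists(tmp_name):
                os.unlink(tmp_name)
```

Only the id set is written, so the per-call artefacts on disk stay untouched, which matches the "logs kept on disk" behaviour described above.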
Toggling MetricsPanel tabs between Latency and Cost caused a vertical
jump because the two layouts had different natural heights — Latency
(pipeline mode) renders 4 latency cards + a 3-row waterfall + legend
(~230 px), while Cost renders the cost bar + 4-6 stack rows (~180 px).
The card outer height shifted by ~50 px on every toggle.
Wrapped both tab views in a .metrics-panel-body container with
min-height: 240 px (sized to the tallest layout). Both tabs now
occupy exactly 321 px outer / 240 px body — switching is a pure
content swap with no layout reflow.
Verified via Chrome DOM audit: latencyHeight=321, costHeight=321,
diff=0.
Files:
dashboard-app/src/components/MetricsPanel.tsx (body wrapper)
dashboard-app/src/styles/dashboard.css (.metrics-panel-body rule)
libraries/{python,typescript}/.../dashboard/ui.html (resynced bundle)
…altime API)
The 0.6.1 enum entry for `gpt-realtime-2` advertised it as drop-in with the
existing v1 Realtime adapter; OpenAI in fact promoted that model to the GA
Realtime API, which rejects the `OpenAI-Beta: realtime=v1` header, requires
a different `session.update` wire shape (`type: "realtime"`,
`output_modalities`, nested `audio.{input,output}` with MIME types), and
renamed the audio-delta event family (`response.audio.*` →
`response.output_audio.*`). Going through the v1 adapter with
`model: "gpt-realtime-2"` either timed out at connect() or produced a
"successful" call with zero audio bytes forwarded to the carrier.
New `OpenAIRealtime2` engine marker (kind: `openai_realtime_2`) + new
`OpenAIRealtime2Adapter` subclassing `OpenAIRealtimeAdapter`. The subclass
overrides only `connect()` (GA payload + no beta header) and
`sendFirstMessage()` (forces `output_modalities` shape, re-injects
`audio.output.voice` since GA `response.create` doesn't inherit it from
session, sets `reasoning.effort: "minimal"` to keep TTFB tight on the
literal "say exactly X" greeting). A WS-level `emit` shim renames the GA
audio-delta event types back to the v1 names so the parent dispatcher and
`StreamHandler` keep working unchanged.
The legacy `OpenAIRealtime` engine and `OpenAIRealtimeAdapter` continue to
serve `gpt-realtime`, `gpt-realtime-mini`, `gpt-4o-realtime-preview`,
`gpt-4o-mini-realtime-preview` against the v1-beta endpoint byte-for-byte
unchanged. Only visibility on a handful of fields/methods was promoted
from `private` to `protected` so the subclass can reuse heartbeat + message
dispatch; no public surface changed.
Verified end-to-end on a Twilio PSTN call: 13.6s / 3 turns / firstMessage
plays in the configured voice, language follows systemPrompt, audio flows
both directions.
Python parity is a follow-up — flagged in CHANGELOG; the daily
docs-feature-drift cron will surface the gap until Python lands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
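
The event-rename shim described above is implemented in TypeScript. The same idea is sketched here in Python for illustration, assuming events are plain dicts with a `type` field. `translate_ga_event` is a hypothetical name.

```python
from typing import Any, Dict

GA_PREFIX = "response.output_audio."
V1_PREFIX = "response.audio."


def translate_ga_event(event: Dict[str, Any]) -> Dict[str, Any]:
    """Rename GA audio event types back to their v1 names.

    Events outside the `response.output_audio.*` family pass through unchanged.
    """
    event_type = event.get("type")
    if isinstance(event_type, str) and event_type.startswith(GA_PREFIX):
        # Copy instead of mutating so the caller's event is left intact.
        return {**event, "type": V1_PREFIX + event_type[len(GA_PREFIX):]}
    return event
```

Translating at the wire boundary lets the existing dispatcher keep matching on the v1 names without knowing which endpoint produced the event.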
…unk splitting for Twilio
The 0.6.1 OpenAIRealtime2 engine connected and exchanged events but produced
silent calls over Twilio: the GA endpoint accepts `audio/pcmu` in
`session.update` but the audio engine silently drops mulaw frames
(`input_audio_buffer.commit` reports "0.00 ms of audio") and always emits
PCM-24 regardless of the declared output format. Until OpenAI ships native
g711 on the GA endpoint we transcode on both directions inside the subclass.
Inbound (Twilio → model): override `sendAudio` to decode mulaw, apply 2x gain
to lift telephony peaks into the GA VAD's expected band, then 3x linear
upsample to PCM-24 with a one-sample carry across chunk boundaries.
`session.audio.input.format` switched to `{ type: "audio/pcm", rate: 24000 }`.
Outbound (model → Twilio): wrap the audio-delta translation to resample
PCM-24 → PCM-8 via 24→16→8 chain (second step carries the 5-tap FIR
anti-alias filter that the direct 24→8 path lacks), encode to mulaw 8 kHz,
and split into 20 ms (160 B) slices emitted as separate audio events.
Twilio's media pipeline stalls when fed deltas of the GA's natural ~200-400 ms
granularity; 20 ms frames restore the expected playout cadence.
VAD tuning: lowered `server_vad` threshold to 0.1 (default 0.5) and raised
`silence_duration_ms` to 500 so 3x-upsampled telephony-band audio reliably
triggers `speech_started`.
Visibility bumps on `OpenAIRealtimeAdapter`: `ws`, `armHeartbeatAndListener`,
`options` promoted from `private` to `protected` so the subclass can install
the wire-level translation shim and reuse the parent's message dispatch
unchanged. No public surface changed; v1 adapter behaviour byte-for-byte
identical.
Known limitation: model output now plays audibly on the caller side, but
GA `server_vad` is still tuned for studio audio so the user-speech path
remains less reliable than pipeline mode. Pipeline mode (STT+LLM+TTS) is
the recommended production path for Twilio in 0.6.1 until OpenAI ships
native g711_ulaw GA.
Python parity is still a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
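
Two pieces of the transcoding path above, the 3x linear upsample with a one-sample carry and the 20 ms frame split, can be sketched as follows. This is illustrative Python rather than the TypeScript subclass. `LinearUpsampler3x` and `split_frames` are hypothetical names, and emitting a short trailing frame for any remainder is an assumption.

```python
from typing import List, Sequence


class LinearUpsampler3x:
    """8 kHz -> 24 kHz linear interpolation with carry across chunks."""

    def __init__(self) -> None:
        self._prev = 0  # last input sample of the previous chunk

    def process(self, samples: Sequence[int]) -> List[int]:
        out: List[int] = []
        prev = self._prev
        for sample in samples:
            step = sample - prev
            out.append(prev + step // 3)        # one third of the way
            out.append(prev + (2 * step) // 3)  # two thirds of the way
            out.append(sample)                  # the original sample
            prev = sample
        self._prev = prev
        return out


def split_frames(payload: bytes, frame_bytes: int = 160) -> List[bytes]:
    """Split mulaw 8 kHz audio into 20 ms slices (160 bytes each)."""
    return [payload[i:i + frame_bytes] for i in range(0, len(payload), frame_bytes)]
```

Carrying the last sample means interpolation across a chunk boundary starts from the true previous value instead of restarting from zero on every chunk.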
Roll the Unreleased changelog block into the 0.6.1 (2026-05-15) section. Version literals in sdk-py/__init__.py, sdk-py/pyproject.toml, sdk-ts/package.json were already at 0.6.1 from prior commits — this commit only normalises the changelog ahead of the release PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes a trailing blank line so the Pre-commit CI hook is happy on the 0.6.1 release PR. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Release 0.6.1 of both SDKs (`getpatter` on PyPI + `getpatter` on npm).
21 commits since `main` cover: new providers (`OpenAIRealtime2`, `InworldTTS`, TS `SpeechmaticsSTT`), pipeline robustness (one-shot barge-in fix, first-message pacing, EOU/metrics alignment), dashboard rewrite (Vite+React SPA, multi-select delete, dark mode polish, MetricsPanel tab parity), observability (OTel spans on Python, no-op stubs on TS), cost/pricing accuracy (Cerebras + Deepgram + ElevenLabs + Realtime model-aware rates), and speech-edge callbacks for turn-taking instrumentation.
Highlights
Added
- `OpenAIRealtime2` TS engine for the GA `gpt-realtime-2` Realtime API. Separate adapter that handles the GA wire-shape divergence (no `OpenAI-Beta` header, `session.type:"realtime"`, `output_modalities`, nested `audio.{input,output}`, renamed `response.output_audio.*` events). Includes bidirectional audio transcoding (mulaw 8 kHz ↔ PCM-24) so the engine works over Twilio today; outbound deltas are split into 20 ms frames so the carrier playout scheduler accepts them at native cadence. Python parity is follow-up.
- `InworldTTS` (both SDKs): new `inworld-tts-2` and TTS-1.5 family, ~150-200 ms TTFT, 100+ languages, EXPRESSIVE/BALANCED/STABLE delivery modes, env `INWORLD_API_KEY`.
- `SpeechmaticsSTT`: closes Python-only gap. Native RT v2 WebSocket protocol, full options parity (turn detection mode, max delay, diarization, additional vocab, operating point).
- `gpt-realtime-2` + `gpt-realtime-whisper` model enum entries (v1 + GA path), `reasoningEffort` knob on the high-level Realtime engine wrapper.
- … `on_user_speech_started` / `_ended` / `_eos`, `on_agent_speech_started` / `_ended`, `on_llm_token`, `on_audio_out`) with `conversation_state` snapshot + per-side state machine, parity Python+TS. OpenTelemetry span events when `PATTER_OTEL_ENABLED=1`.
- … `<log_root>/.deleted_call_ids.json`) and live `calls_deleted` SSE.
- … `agent({ mcpServers: [...] })`), tool retry policy + per-tool circuit breaker, tool JSON-schema validation at build time, OpenAI strict-mode opt-in, async-generator tool handlers (progress + result), reassurance auto-message during long tool calls.
- `Patter(persist=...)` option for explicit on-disk dashboard persistence.
Changed
- … `dist/index.html` (~190 KB JS+CSS inlined). `npm run sync` copies to both SDKs as `ui.html`. End-user experience unchanged.
- `elevenlabs.TTS` facade defaults to WebSocket streaming (both SDKs). TTFB p50 drops ~265 ms → ~80-100 ms (after first turn pays one handshake). New `ElevenLabsRestTTS` exposed as opt-out for free/starter tier or `eleven_v3`.
- … `gpt-realtime-2-2026-05-08` → `gpt-realtime-2`).
Fixed
- … `SileroVAD.reset()` between agent turns. Public VAD state stuck after the first barge-in due to TTS echo loopback.
- `firstMessage` uninterruptible during TTS warm: `firstAudioSentAt` stamped synchronously in `beginSpeaking`. First-message burst pacing corrected.
- `Agent.first_message` was injected as user input (Py+TS), so the model replied to its own greeting / swapped role. New `send_first_message` / `sendFirstMessage` with `role:"assistant"`.
- `onTranscript` + dashboard transcript out-of-order: pending-assistant buffer added so Whisper-delayed user transcripts re-anchor the order.
- `emitToolEvent` pushes `role:"tool"` history with `name(args) → result`.
- `speech_start` during agent TTS contaminated turn anchors (Py+TS): anchor reset on `runBargeInCancel` + pending-barge-in timeout.
- … `input[0]`).
- … `gpt-oss-120b` / `llama3.1-8b` / `qwen-3-235b` / `qwen-3-coder` corrected against per-model docs.
- … `fmtCostUSD` so $0.00012 renders correctly (was rounding to $0.00 on Cerebras 5-turn calls).
- … `STT.connect` with TTS firstMessage kickoff.
Known limitations
- `OpenAIRealtime2` over Twilio: outbound audio plays cleanly thanks to transcoding + chunking, but GA `server_vad` is tuned for studio audio and inbound voice recognition is less reliable than pipeline mode. Pipeline mode (STT+LLM+TTS) is the recommended production path for Twilio in 0.6.1 until OpenAI ships native g711_ulaw on the GA endpoint. Community thread #1380750 tracks the upstream gap.
- … `OpenAIRealtime2` is a follow-up: TS-only in 0.6.1.
Version bumps
- `sdk-py/getpatter/__init__.py` → `0.6.1`
- `sdk-py/pyproject.toml` → `0.6.1`
- `sdk-ts/package.json` → `0.6.1`
Test plan
Release process (post-merge)
1. `git checkout main && git pull --ff-only origin main`
2. `git tag -a v0.6.1 -m "Release 0.6.1 — pipeline robustness + new providers + dashboard SPA"`
3. `git push origin v0.6.1`
4. `.github/workflows/release.yml` → PyPI (OIDC) + npm (token) auto-publish
5. `gh release create v0.6.1` with release notes linking this PR
🤖 Generated with Claude Code