feat: upgrade to LiveKit Agents SDK v1.5.1, fix Sarvam TTS audio, and tune interruption pipeline #18

Open
elt7613 wants to merge 2 commits into main from feat/sdk-upgrade-sarvam-tts-fix-adaptive-interruption

Collaborator

elt7613 commented Mar 29, 2026

Summary

  • SDK v1.5.1 upgrade with TurnHandlingOptions API and
    MultilingualModel for ML-based turn detection, replacing
    the deprecated per-parameter session construction
  • Sarvam TTS audio fix: bypasses the broken WebSocket
    streaming path (raw PCM without WAV headers) by forcing the
    REST API, which returns proper WAV audio
  • Interruption pipeline overhaul — adaptive ML-based
    barge-in classification replaces pure VAD thresholding, with
    tuned defaults for reliable short-phrase interruption (e.g.
    "okay fine")
  • Configurable SIP noise cancellation — BVCTelephony
    disabled by default for SIP participants to prevent double-talk suppression that blocks user interruptions

Changes

SDK Migration (entrypoint.py)

  • turn_detection="stt" → TurnHandlingOptions with
    MultilingualModel() and "mode": "adaptive"
  • Individual params (allow_interruptions,
    min_endpointing_delay, etc.) → unified turn_handling dict
  • VAD defaults aligned to SDK recommendations (activation
    0.5, silence 0.5s, interruption min 0.3s)
  • Endpointing delays tightened (0.2s–0.6s) for faster
    response commits
  • preemptive_generation=True enabled by default for lower
    perceived latency
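
The migration above can be sketched as a small mapping from the old per-parameter kwargs to one nested dict. This is only an illustration under stated assumptions: `build_turn_handling` is a hypothetical helper, and the nested key names (`interruption`, `endpointing`, `min_delay`, and so on) are placeholders that may not match the SDK's actual TurnHandlingOptions schema. Only the default values (0.3s, 0.2s, 0.6s) and the "adaptive" mode come from this PR.

```python
from typing import Any


def build_turn_handling(legacy: dict[str, Any]) -> dict[str, Any]:
    """Fold legacy per-parameter kwargs into a single nested dict.

    Hypothetical helper: the nested key names are illustrative and are
    NOT taken from the SDK. Defaults mirror the values listed in this PR.
    """
    return {
        "interruption": {
            "mode": "adaptive",  # ML-based barge-in classification
            "enabled": legacy.get("allow_interruptions", True),
            "min_duration": legacy.get("min_interruption_duration", 0.3),
        },
        "endpointing": {
            "min_delay": legacy.get("min_endpointing_delay", 0.2),
            # "max_endpointing_delay" is an assumed legacy kwarg name
            "max_delay": legacy.get("max_endpointing_delay", 0.6),
        },
    }


# Old-style kwargs override the tuned defaults where present.
options = build_turn_handling({"min_endpointing_delay": 0.25})
```

Keeping the mapping in one function would make it easy to see which legacy kwargs are still honoured after the upgrade.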

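Sarvam TTS Fix (plugins/sarvam.py)

  • WebSocket streaming disabled: that path returns raw PCM
    without WAV headers
  • REST API path forced instead, which returns proper WAV with
    RIFF headers (workaround for livekit/agents#5267)
  • from_config defaults aligned with __init__ defaults
    (en-IN/bulbul:v3/shubh/True)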
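
For context on the header issue: raw PCM is just sample bytes, while a WAV file prefixes the same samples with a RIFF header describing sample rate, width, and channel count. The sketch below uses Python's standard `wave` module to show that difference. It is not what this PR does (the PR forces the REST path rather than wrapping PCM), and the helper names and the 22050 Hz mono 16-bit defaults are assumptions chosen for illustration.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 22050,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container (RIFF header + data)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample, 2 = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def looks_like_wav(data: bytes) -> bool:
    """Check for the RIFF/WAVE magic bytes at the start of the payload."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


raw = b"\x00\x00" * 100          # 100 silent 16-bit samples, no header
wrapped = pcm_to_wav(raw)
print(looks_like_wav(raw), looks_like_wav(wrapped))
```

A check like `looks_like_wav` could serve as a cheap guard in tests to confirm which path produced a given payload.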

SIP Noise Cancellation (entrypoint.py)

  • noise_cancellation_sip kwarg (default False) —
    BVCTelephony off for SIP by default
  • SIP providers (Twilio, Vonage) already provide echo cancellation; a second layer hurts more than it helps
  • Non-SIP participants still get BVC noise cancellation
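
The selection rule in the bullets above reduces to a small decision function. The sketch below returns plain string labels instead of real filter objects so that it runs without the SDK; `select_noise_cancellation` is a hypothetical name, and how the chosen filter is actually wired into the session is not shown in this PR text.

```python
from typing import Optional


def select_noise_cancellation(is_sip: bool,
                              noise_cancellation_sip: bool = False) -> Optional[str]:
    """Pick a noise-cancellation label for a participant.

    Hypothetical helper: returns a label, not an SDK filter object.
    """
    if is_sip:
        # Off by default: the SIP provider already cancels echo.
        return "BVCTelephony" if noise_cancellation_sip else None
    # Non-SIP participants always get BVC.
    return "BVC"


print(select_noise_cancellation(is_sip=True))   # SIP, default: no filter
print(select_noise_cancellation(is_sip=False))  # WebRTC participant
```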

Diagnostic Logging (entrypoint.py)

  • debug=True kwarg gates verbose event handlers
    (transcripts, state changes, false interruptions)
  • Logs include speech handle state alongside transcripts for
    interrupt debugging
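
One way to picture the debug gate: handlers are only attached when the flag is set, so a production session pays no logging cost. The sketch uses a stand-in `FakeSession` emitter rather than the real session class, and the event names in `DEBUG_EVENTS` are placeholders, not the SDK's event identifiers.

```python
import logging
from collections import defaultdict
from typing import Any, Callable

logger = logging.getLogger("agent.debug")

# Placeholder event names (assumed, not the SDK's identifiers).
DEBUG_EVENTS = ("transcript", "state_changed", "false_interruption")


class FakeSession:
    """Minimal stand-in event emitter used only for this sketch."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[Any], None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: Any) -> None:
        for handler in self._handlers[event]:
            handler(payload)


def register_debug_handlers(session: FakeSession, debug: bool = False) -> int:
    """Attach verbose handlers only when debug is on; return how many."""
    if not debug:
        return 0
    for event in DEBUG_EVENTS:
        # Bind the event name at definition time via a default argument.
        session.on(event, lambda payload, event=event:
                   logger.debug("%s: %r", event, payload))
    return len(DEBUG_EVENTS)
```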

Echo Detection Helpers (voice_agent.py)

  • transcription_node() override captures real-time LLM
    output in rolling buffer
  • get_recent_agent_text() / clear_agent_text_buffer() for
    echo comparison
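
A rolling buffer of this kind can be sketched with a bounded `deque`. The method names `get_recent_agent_text` and `clear_agent_text_buffer` come from the PR; the `AgentTextBuffer` class, the `append` method, and the 50-chunk limit are assumptions, and the real helpers live on the agent class and are fed by transcription_node() rather than by direct calls.

```python
from collections import deque


class AgentTextBuffer:
    """Bounded buffer holding the most recent agent text chunks."""

    def __init__(self, max_chunks: int = 50) -> None:
        # Oldest chunks are dropped automatically once the limit is hit.
        self._chunks: deque[str] = deque(maxlen=max_chunks)

    def append(self, text: str) -> None:
        if text:  # skip empty chunks
            self._chunks.append(text)

    def get_recent_agent_text(self) -> str:
        return "".join(self._chunks)

    def clear_agent_text_buffer(self) -> None:
        self._chunks.clear()


buf = AgentTextBuffer(max_chunks=2)
for chunk in ("Hello, ", "how can ", "I help?"):
    buf.append(chunk)
print(buf.get_recent_agent_text())  # only the last two chunks remain
```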

… tune interruption pipeline

  - Bump livekit-agents dependency to >=1.5.1 for TurnHandlingOptions and
    adaptive interruption support.
  - Migrate AgentSession construction from deprecated individual params
    to TurnHandlingOptions dict with MultilingualModel turn detection
    and adaptive ML-based barge-in classification.
  - Fix Sarvam TTS audio playback: disable WebSocket streaming (returns
    raw PCM without WAV headers) and force REST API path which returns
    proper WAV with RIFF headers. Workaround for livekit/agents#5267.
  - Align Sarvam TTS from_config defaults with __init__ defaults
    (en-IN/bulbul:v3/shubh/True).
  - Tune VAD and interruption defaults to SDK-recommended values:
    activation_threshold 0.25->0.5, min_silence_duration 0.25->0.5,
    min_interruption_duration 0.05->0.3, endpointing delays reduced
    for faster turn commits.
  - Make SIP noise cancellation configurable via noise_cancellation_sip
    kwarg (default: off) — avoids double-talk suppression from
    BVCTelephony layering on top of provider echo cancellation.
  - Add preemptive_generation support for reduced perceived latency.
  - Gate diagnostic event handlers behind debug=True kwarg.
  - Add echo detection helpers (transcription_node, agent text buffer)
    to AgentSetup for future echo filtering.
elt7613 self-assigned this Mar 29, 2026
elt7613 requested a review from blackdwarftech March 29, 2026 21:07