… tune interruption pipeline
- Bump livekit-agents dependency to >=1.5.1 for TurnHandlingOptions and
adaptive interruption support.
- Migrate AgentSession construction from deprecated individual params
to TurnHandlingOptions dict with MultilingualModel turn detection
and adaptive ML-based barge-in classification.
- Fix Sarvam TTS audio playback: disable WebSocket streaming (returns
raw PCM without WAV headers) and force REST API path which returns
proper WAV with RIFF headers. Workaround for livekit/agents#5267.
- Align Sarvam TTS from_config defaults with __init__ defaults
(en-IN/bulbul:v3/shubh/True).
- Tune VAD and interruption defaults to SDK-recommended values:
activation_threshold 0.25->0.5, min_silence_duration 0.25->0.5,
min_interruption_duration 0.05->0.3, endpointing delays reduced
for faster turn commits.
- Make SIP noise cancellation configurable via noise_cancellation_sip
kwarg (default: off) — avoids double-talk suppression from
BVCTelephony layering on top of provider echo cancellation.
- Add preemptive_generation support for reduced perceived latency.
- Gate diagnostic event handlers behind debug=True kwarg.
- Add echo detection helpers (transcription_node, agent text buffer)
to AgentSetup for future echo filtering.
Summary
- Adopt the TurnHandlingOptions API and MultilingualModel for ML-based turn detection, replacing the deprecated per-parameter session construction
- Fix Sarvam TTS audio playback: bypass the WebSocket streaming path (raw PCM without WAV headers) by forcing the REST API, which returns proper WAV audio
- Adaptive ML-based barge-in classification replaces pure VAD thresholding, with tuned defaults for reliable short-phrase interruption (e.g. "okay fine")
- SIP noise cancellation (BVCTelephony) is disabled by default for SIP participants to prevent double-talk suppression that blocks user interruptions
Changes
SDK Migration (entrypoint.py)
- turn_detection="stt" → TurnHandlingOptions with MultilingualModel() and "mode": "adaptive"
- Individual params (allow_interruptions, min_endpointing_delay, etc.) → unified turn_handling dict
- VAD and interruption defaults tuned (activation threshold 0.5, silence 0.5s, interruption min 0.3s)
- Endpointing delays reduced for faster response commits
- preemptive_generation=True enabled by default for lower perceived latency
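
As a rough illustration of the migration above, the sketch below folds legacy per-parameter settings into one nested dict. It is plain Python and does not import livekit-agents. The helper name build_turn_handling and the nested key names ("interruption", "enabled", "min_duration", "endpointing", "min_delay") are assumptions made for illustration, not the confirmed TurnHandlingOptions schema. Only "mode": "adaptive" and the 0.3s interruption minimum come from this PR.

```python
from __future__ import annotations

from typing import Any, Optional


def build_turn_handling(
    turn_detection: Any,
    *,
    allow_interruptions: bool = True,
    min_interruption_duration: float = 0.3,  # value stated in this PR
    min_endpointing_delay: Optional[float] = None,
) -> dict[str, Any]:
    """Fold legacy per-parameter settings into one nested dict.

    Key names are hypothetical; check the TurnHandlingOptions definition
    in the installed livekit-agents before relying on them.
    """
    options: dict[str, Any] = {
        "turn_detection": turn_detection,
        "interruption": {
            "enabled": allow_interruptions,
            "mode": "adaptive",  # ML-based barge-in classification
            "min_duration": min_interruption_duration,
        },
    }
    if min_endpointing_delay is not None:
        # The PR states no concrete delay value, so none is defaulted here.
        options["endpointing"] = {"min_delay": min_endpointing_delay}
    return options


# Placeholder string where the real code would pass MultilingualModel().
turn_handling = build_turn_handling("multilingual-model-placeholder")
```

Keeping the mapping in one helper makes it easy to diff the old keyword arguments against the new dict while both code paths still exist.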
Sarvam TTS Fix (plugins/sarvam.py)
- client._capabilities = TTSCapabilities(streaming=False) forces REST API path
- WebSocket streaming path returns raw PCM without WAV headers (workaround for livekit/agents#5267)
- from_config defaults aligned with __init__ (en-IN, bulbul:v3, shubh, preprocessing on)
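
To make the difference between the two payloads concrete, the sketch below checks for a RIFF/WAVE header and wraps headerless PCM in a WAV container using only the standard library. This is not the workaround used in this PR (the PR forces the REST path instead). The helper names, the 22050 Hz sample rate, and the mono 16-bit format are assumptions for illustration.

```python
import io
import wave


def has_riff_header(data: bytes) -> bool:
    """Return True if the payload starts with a RIFF/WAVE container header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def pcm_to_wav(pcm: bytes, sample_rate: int = 22050, channels: int = 1) -> bytes:
    """Wrap raw 16-bit little-endian PCM in a WAV container.

    sample_rate and channels are assumed values; use whatever the TTS
    request actually specified.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


raw_pcm = b"\x00\x00" * 160  # 160 silent samples with no header
wav_bytes = pcm_to_wav(raw_pcm)
```

A check like has_riff_header can also serve as a cheap assertion in tests to confirm which path a TTS response came from.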
SIP Noise Cancellation (entrypoint.py)
- noise_cancellation_sip kwarg (default False): BVCTelephony off for SIP by default
Diagnostic Logging (entrypoint.py)
- debug=True kwarg gates verbose event handlers (transcripts, state changes, false interruptions)
- … interrupt debugging
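
The gating described above could look roughly like the following. MiniEmitter is a hypothetical stand-in for the session's event emitter, and the event names are placeholders rather than confirmed SDK event names. The point is only that no handlers are registered unless debug is True.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class MiniEmitter:
    """Tiny stand-in for a session-like event emitter (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[object], None]]] = defaultdict(list)

    def on(self, event: str, handler: Callable[[object], None]) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, payload: object) -> None:
        for handler in self._handlers[event]:
            handler(payload)

    def handler_count(self) -> int:
        return sum(len(handlers) for handlers in self._handlers.values())


def attach_diagnostics(emitter: MiniEmitter, log: List[str], debug: bool = False) -> None:
    """Register verbose handlers only when debug is True."""
    if not debug:
        return
    # Placeholder event names, not confirmed SDK event names.
    for event in ("transcript", "state_changed", "false_interruption"):
        # Bind the event name as a default argument so each lambda keeps its own.
        emitter.on(event, lambda payload, name=event: log.append(f"{name}: {payload}"))


quiet, verbose = MiniEmitter(), MiniEmitter()
log: List[str] = []
attach_diagnostics(quiet, log)                # default: nothing registered
attach_diagnostics(verbose, log, debug=True)  # three handlers registered
verbose.emit("transcript", "okay fine")
```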
Echo Detection Helpers (voice_agent.py)
- transcription_node() override captures real-time LLM output in rolling buffer
- get_recent_agent_text() / clear_agent_text_buffer() for echo comparison
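
One way such a rolling buffer and comparison might be structured is sketched below. Only the method names get_recent_agent_text and clear_agent_text_buffer come from this PR. The AgentTextBuffer class, the append method, the time window, and the is_probable_echo heuristic (substring match plus a difflib similarity ratio) are assumptions, since the PR adds the helpers for future echo filtering without specifying the comparison.

```python
import time
from collections import deque
from difflib import SequenceMatcher
from typing import Deque, Optional, Tuple


class AgentTextBuffer:
    """Rolling buffer of recent agent output for echo comparison."""

    def __init__(self, window_seconds: float = 10.0, max_chunks: int = 200) -> None:
        # window_seconds and max_chunks are illustrative defaults.
        self._window = window_seconds
        self._chunks: Deque[Tuple[float, str]] = deque(maxlen=max_chunks)

    def append(self, text: str, now: Optional[float] = None) -> None:
        """Store one chunk of agent text with a timestamp."""
        self._chunks.append((time.monotonic() if now is None else now, text))

    def get_recent_agent_text(self, now: Optional[float] = None) -> str:
        """Join the chunks that fall inside the time window."""
        cutoff = (time.monotonic() if now is None else now) - self._window
        return " ".join(text for ts, text in self._chunks if ts >= cutoff)

    def clear_agent_text_buffer(self) -> None:
        self._chunks.clear()


def is_probable_echo(user_text: str, agent_text: str, threshold: float = 0.8) -> bool:
    """Hypothetical check: is the user transcript mostly a copy of agent speech?"""
    user, agent = user_text.lower().strip(), agent_text.lower().strip()
    if not user or not agent:
        return False
    if user in agent:
        return True
    return SequenceMatcher(None, user, agent).ratio() >= threshold


buf = AgentTextBuffer(window_seconds=10.0)
buf.append("Your order ships", now=100.0)
buf.append("on Monday.", now=101.0)
recent = buf.get_recent_agent_text(now=105.0)
```

Passing explicit timestamps, as in the usage lines, keeps the window logic deterministic in tests. In a live session the default monotonic clock would be used.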