Add Deepgram Voice Agent framework support #144

Open
weiz9 wants to merge 2 commits into main from pr/wz/deepgram-voice-agent-framework
@weiz9 weiz9 commented Jun 10, 2026

Collaborator

Summary

Adds a new deepgram assistant-server framework backed by Deepgram's Voice Agent API (unified STT→LLM→TTS over a single WebSocket), so the Deepgram agent can be benchmarked like the existing S2S frameworks. Modeled on the Gemini Live server and the docs/assistant_server_contract.md contract.

Changes

  • New server src/eva/assistant/deepgram_server.py (DeepgramAssistantServer): bridges the Twilio-framed user-simulator WebSocket ↔ client.agent.v1.connect(), with the standard audit-log / framework-log / metrics-log / audio-buffer handling.
  • Registration: deepgram added to worker._get_server_class() and the framework Literal in models/config.py.
  • Tests: tests/unit/assistant/test_deepgram_server.py (settings + tool-conversion) and a dispatch case in test_framework_dispatch.py.
  • Docs: Deepgram reference section in assistant_server_contract.md; deepgram added to the framework enum in .env.example.
  • Version: simulation_version 2.0.0 → 2.1.0 (affects benchmark outputs).
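To illustrate the bridging described in the first bullet, here is a minimal sketch of converting between Twilio-style media frames and raw audio bytes. It assumes the user simulator uses the standard Twilio Media Streams JSON layout (an `event` field and a base64 `media.payload`); the function names are hypothetical and are not taken from `deepgram_server.py`.

```python
import base64
import json
from typing import Optional


def twilio_media_to_audio(message: str) -> Optional[bytes]:
    """Extract raw audio bytes from a Twilio-style "media" frame.

    Returns None for non-media events (e.g. "start", "stop", "mark").
    Assumption: the payload is base64-encoded audio, as in Twilio Media Streams.
    """
    frame = json.loads(message)
    if frame.get("event") != "media":
        return None
    return base64.b64decode(frame["media"]["payload"])


def audio_to_twilio_media(audio: bytes, stream_sid: str) -> str:
    """Wrap raw audio bytes in a Twilio-style "media" frame for the simulator."""
    return json.dumps(
        {
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(audio).decode("ascii")},
        }
    )
```

In a bridge like this, the decoded bytes would be forwarded to the agent connection, and agent audio would be wrapped the other way before being sent back to the simulator.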

Implementation notes (found via live testing)

  • SDK workaround: deepgram-sdk 6.1.x mis-deserializes every agent event to the same model via its typed iterator, dropping transcripts and tool-call requests. The server iterates the raw WebSocket and dispatches on the JSON type instead (audio still arrives as binary frames).
  • KeepAlive: a periodic KeepAlive task prevents Deepgram's ~10s input-audio timeout from closing the session while the half-duplex agent is speaking.
  • Latency: model_response latency is measured from the server-side receipt time of the simulator's user_speech_stop event.
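The three notes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the helper names are hypothetical, the `{"type": "KeepAlive"}` message shape follows my reading of Deepgram's Voice Agent docs, and the latency end point (first agent audio frame) is assumed since the description only states the start point.

```python
import asyncio
import json
import time
from typing import Any, Awaitable, Callable, Optional, Tuple, Union


def classify_frame(frame: Union[bytes, str]) -> Tuple[str, Any]:
    """Dispatch on the raw frame instead of the SDK's typed iterator.

    Binary frames are agent audio; text frames are JSON events keyed by "type".
    """
    if isinstance(frame, (bytes, bytearray)):
        return "audio", bytes(frame)
    payload = json.loads(frame)
    return payload.get("type", "Unknown"), payload


async def keepalive_loop(
    send: Callable[[str], Awaitable[None]],
    interval_s: float,
    stop: asyncio.Event,
) -> None:
    """Send a KeepAlive message every interval_s seconds until stop is set."""
    while not stop.is_set():
        await send(json.dumps({"type": "KeepAlive"}))
        try:
            # Wake early if the session ends; otherwise wait one interval.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass


class ResponseLatencyTracker:
    """Measure latency from server-side receipt of user_speech_stop."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._stop_received_at: Optional[float] = None

    def on_user_speech_stop(self) -> None:
        # Stamp with the server's own clock, not the simulator's timestamp.
        self._stop_received_at = self._clock()

    def on_first_agent_audio(self) -> Optional[float]:
        # Assumed end point: the first audio frame of the agent's reply.
        if self._stop_received_at is None:
            return None
        latency = self._clock() - self._stop_received_at
        self._stop_received_at = None
        return latency
```

The KeepAlive interval would need to be comfortably below the ~10s timeout mentioned above; the exact value used by the server is not stated here.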

Testing

  • ruff, mypy (new file clean), and pre-commit run --all-files all pass (version-bump + metric-signature hooks included).
  • 351 unit tests pass across assistant/orchestrator/models suites.
  • Live end-to-end runs across airline, itsm, and medical_hr (5 records each): ~104/105 conversations reached a valid end (conversation_valid_end = 1.0), 500+ tool calls executed with scenario-DB mutations, and zero framework-level errors (no disconnects, audio timeouts, or event-processing failures).

Config example

EVA_FRAMEWORK=deepgram
EVA_MODEL__S2S=deepgram
EVA_MODEL__S2S_PARAMS='{"api_key":"<deepgram-key>","model":"gpt-4o-mini"}'

Optional s2s_params (defaults): think_provider (open_ai), listen_model (nova-3), speak_model (aura-2-thalia-en), language (en).
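As a rough sketch of how these defaults might be applied, the snippet below parses the `EVA_MODEL__S2S_PARAMS` JSON and fills in the optional keys. The function names are hypothetical, treating `api_key` and `model` as required is an assumption based on the example above, and the nested Settings layout reflects my understanding of Deepgram's Voice Agent Settings message rather than anything confirmed by this PR.

```python
import json
from typing import Any, Dict

# Defaults as listed in the PR description.
S2S_DEFAULTS: Dict[str, str] = {
    "think_provider": "open_ai",
    "listen_model": "nova-3",
    "speak_model": "aura-2-thalia-en",
    "language": "en",
}


def resolve_s2s_params(raw: str) -> Dict[str, Any]:
    """Parse the JSON params string and overlay it on the defaults."""
    params = json.loads(raw)
    # Assumption: api_key and model must be supplied by the user.
    missing = [key for key in ("api_key", "model") if not params.get(key)]
    if missing:
        raise ValueError(f"missing required s2s_params: {', '.join(missing)}")
    return {**S2S_DEFAULTS, **params}


def to_agent_settings(params: Dict[str, Any]) -> Dict[str, Any]:
    """Map resolved params onto an assumed Settings message layout."""
    return {
        "type": "Settings",
        "agent": {
            "language": params["language"],
            "listen": {"provider": {"type": "deepgram", "model": params["listen_model"]}},
            "think": {"provider": {"type": params["think_provider"], "model": params["model"]}},
            "speak": {"provider": {"type": "deepgram", "model": params["speak_model"]}},
        },
    }
```

With the config example above, `resolve_s2s_params` would yield `gpt-4o-mini` as the think model alongside the four listed defaults; the API key is deliberately kept out of the Settings payload in this sketch.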

weiz9 added 2 commits June 9, 2026 19:34
Adds a `deepgram` assistant-server framework backed by Deepgram's Voice
Agent API (unified STT->LLM->TTS over a single WebSocket), so it can be
benchmarked like the existing S2S frameworks.

- New DeepgramAssistantServer (src/eva/assistant/deepgram_server.py),
  modeled on the Gemini Live server and the assistant_server_contract.
- Register `deepgram` in worker._get_server_class and the framework Literal.
- Parse raw WebSocket JSON by event `type` rather than the SDK's typed
  iterator, which in deepgram-sdk 6.1.x mis-deserializes every agent event
  as the same model (dropping transcripts and tool-call requests).
- KeepAlive task to prevent Deepgram's ~10s input-audio timeout from
  closing the session while the (half-duplex) agent is speaking.
- Compute model_response latency from the server-side receipt time of
  user_speech_stop (the simulator emits it on a monotonic clock).
- Unit tests for settings/tool conversion + framework dispatch test.
- Docs section in assistant_server_contract.md; .env.example framework enum.
- Bump simulation_version 2.0.0 -> 2.1.0 (affects benchmark outputs).
…agent-framework

# Conflicts:
#	src/eva/__init__.py
