ServiceNow · weiz9 · Jun 10, 2026 · Jun 10, 2026
diff --git a/.env.example b/.env.example
@@ -115,7 +115,7 @@ EVA_MODEL__TTS_PARAMS='{"api_key": "your_cartesia_api_key", "model": "sonic"}'
 # --- Framework (S2S / AudioLLM) ---
 #i Base framework for S2S or AudioLLM pipelines.
 #d enum
-#e pipecat,openai_realtime,gemini_live,elevenlabs,grok_voice
+#e pipecat,openai_realtime,gemini_live,elevenlabs,grok_voice,deepgram
 #v EVA_FRAMEWORK=openai_realtime
 
 # ==============================================

diff --git a/docs/assistant_server_contract.md b/docs/assistant_server_contract.md
@@ -535,3 +535,31 @@ the run to fail or produce `None` latency fields in the result.
 | `audio_assistant.wav` | Yes | TTS quality metrics |
 | `framework_logs.jsonl` | Yes | Turn boundary metrics |
 | `pipecat_metrics.jsonl` | Yes | `model_response_latency` in `ConversationResult` |
+
+---
+
+## 13. Reference implementation: Deepgram Voice Agent
+
+`src/eva/assistant/deepgram_server.py` (`framework: deepgram`) bridges to Deepgram's
+**Voice Agent API** (unified STT→LLM→TTS over one WebSocket) via the `deepgram-sdk`
+`client.agent.v1.connect()` interface. It is the closest analogue to the Gemini Live
+server and a good template for a new S2S framework.
+
+Notable points specific to Deepgram:
+
+- **Config.** `framework: deepgram`, `model: {s2s: deepgram, s2s_params: {...}}`. Recognised
+  `s2s_params`: `api_key` (required), `think_provider` (default `open_ai`),
+  `think_model` / `model` (LLM + metrics label, default `gpt-4o-mini`),
+  `listen_model` (STT, default `nova-3`), `speak_model` (TTS, default `aura-2-thalia-en`),
+  `language` (default `en`).
+- **Settings.** Sent once on connect via `send_settings(AgentV1Settings)`. The settings are built
+  as a plain dict and validated with `AgentV1Settings.model_validate(...)`, which resolves the
+  discriminated provider unions. Audio is `linear16` at 24 kHz in both directions, with output
+  `container: "none"` (raw PCM); `agent.greeting` carries `INITIAL_MESSAGE`.
+- **Tools.** Configured under `agent.think.functions` (no `endpoint` ⇒ *client-side*), so the
+  agent emits `FunctionCallRequest` events; reply with `send_function_call_response`.
+- **Events.** `async for message in connection` yields raw `bytes` (TTS audio) or typed events
+  (`ConversationText`, `UserStartedSpeaking`, `AgentStartedSpeaking`, `AgentAudioDone`,
+  `FunctionCallRequest`, `Error`, `Warning`).
+- **Limitation.** The Voice Agent event stream exposes no token-usage event, so token usage is
+  not reported for this framework. Latency is still emitted on the first audio chunk per turn.
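
Reviewer sketch (not part of the patch): the Config and Settings bullets above can be illustrated by a small helper that turns `s2s_params` into the plain dict that would then be passed to `AgentV1Settings.model_validate(...)`. The helper name `build_settings_dict`, the nested dict layout (`type`, `audio.input`, `audio.output`, `agent.listen/think/speak.provider`), the `deepgram` provider type for listen and speak, and the choice to let `think_model` take precedence over `model` are all assumptions made for illustration; only the defaults, the `linear16` 24 kHz audio format, `container: "none"`, `agent.greeting`, and `agent.think.functions` come from the text above. The SDK itself is deliberately not imported here.

```python
from typing import Any, Optional

# Defaults as listed in the "Config" bullet of the new doc section.
DEFAULTS = {
    "think_provider": "open_ai",
    "think_model": "gpt-4o-mini",
    "listen_model": "nova-3",
    "speak_model": "aura-2-thalia-en",
    "language": "en",
}
SAMPLE_RATE_HZ = 24_000  # linear16 at 24 kHz in both directions


def build_settings_dict(
    s2s_params: dict[str, Any],
    greeting: str,
    functions: Optional[list[dict[str, Any]]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build the plain settings dict from s2s_params."""
    if not s2s_params.get("api_key"):
        raise ValueError("s2s_params['api_key'] is required")

    def param(name: str) -> Any:
        # Fall back to the documented default when the key is absent or empty.
        return s2s_params.get(name) or DEFAULTS[name]

    # Assumption: `think_model` wins over the generic `model` key.
    think_model = (
        s2s_params.get("think_model")
        or s2s_params.get("model")
        or DEFAULTS["think_model"]
    )
    audio_format = {"encoding": "linear16", "sample_rate": SAMPLE_RATE_HZ}

    # Assumed layout of the settings message; validate it with the SDK model.
    return {
        "type": "Settings",
        "audio": {
            "input": dict(audio_format),
            "output": {**audio_format, "container": "none"},  # raw PCM
        },
        "agent": {
            "language": param("language"),
            "listen": {
                "provider": {"type": "deepgram", "model": param("listen_model")}
            },
            "think": {
                "provider": {"type": param("think_provider"), "model": think_model},
                # No `endpoint` on the functions, so they stay client-side.
                "functions": list(functions or []),
            },
            "speak": {
                "provider": {"type": "deepgram", "model": param("speak_model")}
            },
            "greeting": greeting,
        },
    }
```

With only `api_key` supplied, every other field falls back to the documented default, which makes the helper easy to check in a unit test without a network connection.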
diff --git a/src/eva/__init__.py b/src/eva/__init__.py
@@ -7,7 +7,7 @@
 
 # Bump simulation_version when changes affect benchmark outputs (agent code,
 # user simulator, orchestrator, simulation prompts, agent configs, tool mocks).
-simulation_version = "2.0.1"
+simulation_version = "2.0.2"
 
 # Bump metrics_version when changes affect metric computation (metrics code,
 # judge prompts, pricing tables, postprocessor).