2 changes: 1 addition & 1 deletion .env.example
@@ -115,7 +115,7 @@ EVA_MODEL__TTS_PARAMS='{"api_key": "your_cartesia_api_key", "model": "sonic"}'
# --- Framework (S2S / AudioLLM) ---
#i Base framework for S2S or AudioLLM pipelines.
#d enum
-#e pipecat,openai_realtime,gemini_live,elevenlabs,grok_voice
+#e pipecat,openai_realtime,gemini_live,elevenlabs,grok_voice,deepgram
#v EVA_FRAMEWORK=openai_realtime

# ==============================================
28 changes: 28 additions & 0 deletions docs/assistant_server_contract.md
@@ -535,3 +535,31 @@ the run to fail or produce `None` latency fields in the result.
| `audio_assistant.wav` | Yes | TTS quality metrics |
| `framework_logs.jsonl` | Yes | Turn boundary metrics |
| `pipecat_metrics.jsonl` | Yes | `model_response_latency` in `ConversationResult` |

---

## 13. Reference implementation: Deepgram Voice Agent

`src/eva/assistant/deepgram_server.py` (`framework: deepgram`) bridges to Deepgram's
**Voice Agent API** (unified STT→LLM→TTS over one WebSocket) via the `deepgram-sdk`
`client.agent.v1.connect()` interface. It is the closest analogue to the Gemini Live
server and a good template for a new S2S framework.

Notable points specific to Deepgram:

- **Config.** `framework: deepgram`, `model: {s2s: deepgram, s2s_params: {...}}`. Recognised
`s2s_params`: `api_key` (required), `think_provider` (default `open_ai`),
`think_model` / `model` (LLM + metrics label, default `gpt-4o-mini`),
`listen_model` (STT, default `nova-3`), `speak_model` (TTS, default `aura-2-thalia-en`),
`language` (default `en`).
- **Settings.** Sent once on connect via `send_settings(AgentV1Settings)`. Built from a plain
dict and validated with `AgentV1Settings.model_validate(...)`, which resolves the
discriminated provider unions. Audio is `linear16` @ 24 kHz both directions with output
`container: "none"` (raw PCM); `agent.greeting` carries `INITIAL_MESSAGE`.
- **Tools.** Configured under `agent.think.functions` (no `endpoint` ⇒ *client-side*), so the
agent emits `FunctionCallRequest` events; reply with `send_function_call_response`.
- **Events.** `async for message in connection` yields raw `bytes` (TTS audio) or typed events
(`ConversationText`, `UserStartedSpeaking`, `AgentStartedSpeaking`, `AgentAudioDone`,
`FunctionCallRequest`, `Error`, `Warning`).
- **Limitation.** The Voice Agent event stream exposes no token-usage event, so token usage is
not reported for this framework. Latency is still emitted on the first audio chunk per turn.
2 changes: 1 addition & 1 deletion src/eva/__init__.py
@@ -7,7 +7,7 @@

# Bump simulation_version when changes affect benchmark outputs (agent code,
# user simulator, orchestrator, simulation prompts, agent configs, tool mocks).
-simulation_version = "2.0.1"
+simulation_version = "2.0.2"

# Bump metrics_version when changes affect metric computation (metrics code,
# judge prompts, pricing tables, postprocessor).