Architecture Documentation

Streaming Architecture Overview

This backend implements a bidirectional streaming architecture for real-time voice interaction with ElevenLabs agents.

Architecture Diagram

┌─────────────┐
│  Frontend   │
│  (Browser)  │
└──────┬──────┘
       │ WebSocket (bidirectional)
       │ Audio chunks (streaming)
       ▼
┌─────────────────────────────────┐
│      Backend (FastAPI)          │
│  ┌───────────────────────────┐  │
│  │  WebSocket Handler        │  │
│  │  - Receives audio chunks  │  │
│  │  - Manages connections    │  │
│  └───────────┬───────────────┘  │
│              │                   │
│  ┌───────────▼───────────────┐  │
│  │  ElevenLabs Service       │  │
│  │  - Streams audio chunks   │  │
│  │  - Processes responses    │  │
│  └───────────┬───────────────┘  │
└──────────────┼───────────────────┘
               │ HTTP Streaming
               │ (audio chunks)
               ▼
┌─────────────────────────────────┐
│    ElevenLabs Agent API         │
│  ┌───────────────────────────┐  │
│  │  1. Speech-to-Text (STT)  │  │
│  │  2. LLM Processing        │  │
│  │  3. Text-to-Speech (TTS)  │  │
│  └───────────────────────────┘  │
└─────────────────────────────────┘

Data Flow

1. Connection Establishment

Frontend → Backend: WebSocket connection request
Backend → Frontend: {"type": "connection_established"}
Backend → ElevenLabs: Initialize conversation
Backend → Frontend: {"type": "conversation_started", "conversation_id": "..."}

2. Audio Streaming (Frontend → ElevenLabs)

Frontend → Backend: {"type": "audio_chunk", "data": "<base64>", "format": "audio/webm"}
Backend → ElevenLabs: Stream audio chunk (HTTP streaming)
ElevenLabs: Processes audio (STT → LLM → TTS)

3. Response Streaming (ElevenLabs → Frontend)

ElevenLabs → Backend: Stream response chunks (audio/text)
Backend → Frontend: {"type": "audio_response", "data": "<base64>", "text": "...", "is_final": false}

4. Connection Cleanup

Frontend → Backend: {"type": "end_conversation"}
Backend → ElevenLabs: End conversation
Backend → Frontend: {"type": "conversation_ended"}
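
The four phases above can be sketched as one coroutine. This is a minimal sketch rather than the project's actual handler: `FakeAgent`, `run_session`, and the `send` callback are hypothetical names, and the ElevenLabs side is stubbed out so that only the order of messages is shown.

```python
import asyncio
from typing import Awaitable, Callable

Send = Callable[[dict], Awaitable[None]]


class FakeAgent:
    """Hypothetical stand-in for the ElevenLabs service (not the real API)."""

    async def start(self) -> str:
        return "conv_abc123"

    async def send_audio(self, data: str) -> list[dict]:
        # Pretend the agent answers every chunk with one final audio reply.
        return [{"data": data, "text": "...", "is_final": True}]

    async def end(self) -> None:
        return None


async def run_session(incoming: list[dict], send: Send, agent: FakeAgent) -> None:
    # 1. Connection establishment
    await send({"type": "connection_established"})
    conversation_id = await agent.start()
    await send({"type": "conversation_started", "conversation_id": conversation_id})

    for message in incoming:
        if message["type"] == "audio_chunk":
            # 2. Forward the audio, then 3. relay each response chunk
            for reply in await agent.send_audio(message["data"]):
                await send({"type": "audio_response", **reply})
        elif message["type"] == "end_conversation":
            # 4. Connection cleanup
            await agent.end()
            await send({"type": "conversation_ended"})
            break


async def demo_session() -> list[str]:
    sent: list[dict] = []

    async def send(message: dict) -> None:
        sent.append(message)

    incoming = [
        {"type": "audio_chunk", "data": "AAAA", "format": "audio/webm"},
        {"type": "end_conversation"},
    ]
    await run_session(incoming, send, FakeAgent())
    return [message["type"] for message in sent]


if __name__ == "__main__":
    print(asyncio.run(demo_session()))
```

In a real handler the incoming messages would arrive one at a time from the WebSocket instead of from a list.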

Key Components

1. WebSocket Handler (`src/routes/voice_router.py`)

Manages WebSocket connections
Handles bidirectional message passing
Processes audio chunks and forwards to ElevenLabs
Streams responses back to frontend

2. ElevenLabs Service (`src/services/elevenlabs_service.py`)

Manages conversation lifecycle
Streams audio chunks to ElevenLabs API
Processes streaming responses
Handles errors and reconnection

3. WebSocket Manager (`src/services/websocket_manager.py`)

Tracks active connections
Handles connection lifecycle
Provides broadcast capabilities
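
As an illustration of these responsibilities, a manager of this kind might look like the sketch below. The class and method names are assumptions, not the contents of `websocket_manager.py`. The only socket method it relies on is `send_text`, which Starlette's `WebSocket` (the class FastAPI uses) provides.

```python
import asyncio
from typing import Protocol


class TextSocket(Protocol):
    """The one method this sketch needs from a WebSocket object."""

    async def send_text(self, data: str) -> None: ...


class ConnectionManager:
    """Hypothetical manager: tracks sockets by ID and broadcasts text."""

    def __init__(self) -> None:
        self._active: dict[str, TextSocket] = {}

    def connect(self, connection_id: str, socket: TextSocket) -> None:
        self._active[connection_id] = socket

    def disconnect(self, connection_id: str) -> None:
        # pop() with a default makes a repeated disconnect harmless.
        self._active.pop(connection_id, None)

    @property
    def count(self) -> int:
        return len(self._active)

    async def broadcast(self, text: str) -> None:
        # Iterate over a copy so a failed send can drop its socket safely.
        for connection_id, socket in list(self._active.items()):
            try:
                await socket.send_text(text)
            except Exception:
                self.disconnect(connection_id)


class RecordingSocket:
    """Test double that records what was sent to it."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    async def send_text(self, data: str) -> None:
        self.sent.append(data)
```

Dropping a socket when a send fails is one possible policy. Logging the failure and leaving cleanup to the handler is another.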

Message Protocol

Client → Server Messages

Audio Chunk

{
  "type": "audio_chunk",
  "data": "<base64_encoded_audio_data>",
  "format": "audio/webm"
}

End Conversation

{
  "type": "end_conversation"
}

Heartbeat

{
  "type": "heartbeat"
}
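
The three client messages can be parsed and validated in one place on the server. The sketch below uses only the standard library. The dataclass names and `parse_client_message` are hypothetical, and defaulting a missing `format` to `audio/webm` is an assumption, not documented behavior.

```python
import base64
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class AudioChunk:
    audio: bytes
    format: str


@dataclass
class EndConversation:
    pass


@dataclass
class Heartbeat:
    pass


ClientMessage = Union[AudioChunk, EndConversation, Heartbeat]


def parse_client_message(raw: str) -> ClientMessage:
    """Turn one JSON text frame into a typed message, or raise ValueError."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("message is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise ValueError("message must be a JSON object")

    kind = payload.get("type")
    if kind == "audio_chunk":
        try:
            # validate=True rejects characters outside the base64 alphabet.
            audio = base64.b64decode(payload["data"], validate=True)
        except (KeyError, TypeError, ValueError) as exc:
            raise ValueError("audio_chunk needs base64-encoded 'data'") from exc
        # Assumption: fall back to audio/webm when 'format' is omitted.
        return AudioChunk(audio=audio, format=payload.get("format", "audio/webm"))
    if kind == "end_conversation":
        return EndConversation()
    if kind == "heartbeat":
        return Heartbeat()
    raise ValueError(f"unknown message type: {kind!r}")
```

Because every failure is raised as `ValueError`, the caller can turn any malformed frame into a single `error` message for the client.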

Server → Client Messages

Connection Established

{
  "type": "connection_established",
  "message": "Connected to voice streaming service"
}

Conversation Started

{
  "type": "conversation_started",
  "conversation_id": "conv_abc123"
}

Audio Response

{
  "type": "audio_response",
  "data": "<base64_encoded_audio>",
  "text": "Transcribed text from STT",
  "is_final": false
}

Text Response (if text-only)

{
  "type": "text_response",
  "text": "LLM response text",
  "is_final": true
}

Error

{
  "type": "error",
  "message": "Error description"
}
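
The server messages above can be built with small helpers so that field names stay consistent. The function names here are hypothetical. The fields are the ones listed in this section.

```python
import base64
import json


def audio_response(audio: bytes, text: str, is_final: bool) -> str:
    """Serialize an audio_response frame with base64-encoded audio."""
    return json.dumps(
        {
            "type": "audio_response",
            "data": base64.b64encode(audio).decode("ascii"),
            "text": text,
            "is_final": is_final,
        }
    )


def text_response(text: str, is_final: bool = True) -> str:
    """Serialize a text-only response frame."""
    return json.dumps({"type": "text_response", "text": text, "is_final": is_final})


def error_message(message: str) -> str:
    """Serialize an error frame."""
    return json.dumps({"type": "error", "message": message})
```

Each helper returns a JSON string that can be sent as a WebSocket text frame.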

Streaming Strategy

Audio Chunking

The frontend sends audio in small chunks (e.g., 100 ms each)
The backend buffers chunks only if needed
Otherwise, each chunk is streamed to ElevenLabs as it is received
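
One way to combine "buffer if needed" with "stream as received" is a bounded `asyncio.Queue` between the WebSocket reader and the upstream sender. The sketch below assumes this design. `pump_chunks`, the `forward` callback, the `None` end-of-stream sentinel, and the queue size of 8 are all illustrative choices.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Forward = Callable[[bytes], Awaitable[None]]


async def pump_chunks(queue: "asyncio.Queue[Optional[bytes]]", forward: Forward) -> int:
    """Forward chunks in arrival order until a None sentinel; return the count."""
    forwarded = 0
    while True:
        chunk = await queue.get()
        if chunk is None:
            return forwarded
        await forward(chunk)
        forwarded += 1


async def demo_pump() -> list[bytes]:
    # maxsize bounds the buffer: put() waits when the upstream falls behind.
    queue: asyncio.Queue[Optional[bytes]] = asyncio.Queue(maxsize=8)
    received: list[bytes] = []

    async def forward(chunk: bytes) -> None:
        # Stand-in for the call that streams a chunk to ElevenLabs.
        received.append(chunk)

    pump = asyncio.create_task(pump_chunks(queue, forward))
    for chunk in (b"chunk-1", b"chunk-2", b"chunk-3"):
        await queue.put(chunk)
    await queue.put(None)  # signal end of stream
    await pump
    return received


if __name__ == "__main__":
    print(asyncio.run(demo_pump()))
```

The bounded queue applies backpressure. If the upstream stalls, the reader waits instead of accumulating audio in memory without limit.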

Response Handling

ElevenLabs streams responses incrementally
The backend forwards each chunk to the frontend immediately
The `is_final` flag indicates when a response is complete
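
A forwarding loop of this shape could look like the sketch below. It relays each chunk as soon as it arrives with `is_final` set to false, then sends a closing message with empty text and `is_final` set to true. That closing convention is an assumption made for illustration, and `forward_responses` and `fake_agent_stream` are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

Send = Callable[[dict], Awaitable[None]]


async def forward_responses(chunks: AsyncIterator[str], send: Send) -> None:
    """Relay each chunk immediately, then mark the response as complete."""
    async for chunk in chunks:
        await send({"type": "text_response", "text": chunk, "is_final": False})
    # Assumed convention: an empty final message closes the response.
    await send({"type": "text_response", "text": "", "is_final": True})


async def fake_agent_stream() -> AsyncIterator[str]:
    # Stand-in for the incremental response stream from ElevenLabs.
    for part in ("Hello", ", ", "world"):
        yield part


async def demo_forward() -> list[dict]:
    sent: list[dict] = []

    async def send(message: dict) -> None:
        sent.append(message)

    await forward_responses(fake_agent_stream(), send)
    return sent


if __name__ == "__main__":
    for message in asyncio.run(demo_forward()):
        print(message)
```

A client can concatenate the `text` fields until it sees `is_final` set to true.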

Error Handling

Connection errors: Attempt reconnection
API errors: Forward error to frontend
Timeout handling: Close connection gracefully
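
Reconnection and timeouts can be combined in one retry helper. The sketch below is an assumed approach using exponential backoff. `with_reconnect`, the retry count, and the delays are illustrative and are not taken from the service code.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_reconnect(
    connect: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
    timeout: float = 5.0,
) -> T:
    """Call connect() until it succeeds, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            # wait_for cancels the attempt if it exceeds the timeout.
            return await asyncio.wait_for(connect(), timeout=timeout)
        except (ConnectionError, asyncio.TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: let the caller report the error
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


async def demo_reconnect() -> tuple[str, int]:
    calls = 0

    async def flaky() -> str:
        # Fails twice, then succeeds, to exercise the retry path.
        nonlocal calls
        calls += 1
        if calls < 3:
            raise ConnectionError("upstream unavailable")
        return "connected"

    result = await with_reconnect(flaky, base_delay=0.01)
    return result, calls


if __name__ == "__main__":
    print(asyncio.run(demo_reconnect()))
```

When the final attempt fails, the exception propagates, and the handler can forward it to the frontend as an `error` message before closing the connection.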

Performance Considerations

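Low Latency: Streaming lets each stage start on partial input instead of waiting for a complete recording, which reduces end-to-end latency
Memory Efficiency: Chunks are processed as they arrive instead of buffering the entire audio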
Scalability: Each WebSocket connection is independent
Error Recovery: Graceful degradation on errors

Security Considerations

API Key Protection: Store in environment variables
CORS Configuration: Restrict origins
Rate Limiting: Implement per-connection limits
Input Validation: Validate audio format and size
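
Three of these points can be sketched with the standard library alone. Everything specific below is an assumption for illustration: the environment variable name `ELEVENLABS_API_KEY`, the 64 KiB chunk limit, the single allowed format, and the token-bucket limiter.

```python
import os
import time
from typing import Callable

ALLOWED_FORMATS = {"audio/webm"}  # assumption: the only format the protocol shows
MAX_CHUNK_BYTES = 64 * 1024       # assumption: illustrative size limit


def load_api_key(env_var: str = "ELEVENLABS_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key


def validate_chunk(audio: bytes, audio_format: str) -> None:
    """Raise ValueError if a chunk has an unexpected format or size."""
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported audio format: {audio_format!r}")
    if not audio:
        raise ValueError("audio chunk is empty")
    if len(audio) > MAX_CHUNK_BYTES:
        raise ValueError("audio chunk exceeds the size limit")


class TokenBucket:
    """Per-connection limiter: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 now: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._now = now
        self._tokens = float(capacity)
        self._last = now()

    def allow(self) -> bool:
        """Spend one token if available; return whether the message may pass."""
        current = self._now()
        elapsed = current - self._last
        self._last = current
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False
```

Creating one `TokenBucket` per WebSocket connection keeps the limits independent, so one noisy client does not throttle the others.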

Future Enhancements

Connection Pooling: Reuse ElevenLabs connections
Audio Compression: Compress audio before sending
Caching: Cache common LLM responses
Monitoring: Add metrics and logging
Load Balancing: Distribute connections across instances
