Config-Driven Voice Agent Platform

Grand Prize Winner - SpinSci AI Hackathon 2025

Production-grade, multi-tenant AI voice agent platform for enterprise call automation. Built with Pipecat orchestration, Twilio telephony, streaming STT/TTS services, and dual LLM strategies for ultra-low latency voice interactions.

Originally prototyped in 6.5 hours during the SpinSci Hackathon 2025 at T-Hub, Hyderabad (January 14-15, 2025), this platform has been significantly scaled and enhanced into a production-ready system with comprehensive testing, security features, and enterprise-grade capabilities.

Overview

The Config-Driven Voice Agent Platform enables appointment scheduling, rescheduling, cancellations, and general call automation across different industries through a configuration-driven architecture. Deploy the same codebase for healthcare clinics, automotive dealerships, or custom use cases without writing new code.

The platform addresses the enterprise shift from rigid IVR systems to autonomous AI agents that can execute complex, multi-turn workflows with human-like prosody and semantic comprehension. Because core conversational intelligence is decoupled from domain-specific business logic, a single deployment can switch between industries by loading a different configuration payload.

Hackathon

This platform won the grand prize at the SpinSci AI Hackathon 2025, competing against 20 shortlisted teams from across India. The event took place on January 14-15, 2025, at T-Hub, Hyderabad.

From MVP to Production

The original hackathon submission was a functional MVP developed in just 6.5 hours. This repository represents a significantly enhanced and production-ready version with:

Comprehensive test coverage including property-based testing
Enterprise-grade security and HIPAA compliance features
Production deployment configurations (Docker, Kubernetes, KEDA)
Enhanced observability and monitoring capabilities
Multiple LLM provider support (OpenAI, Ollama, vLLM)
Extensive documentation and examples
Scalability improvements for 1000+ concurrent calls

The Challenge

The team initially focused on an AI-powered platform for early detection of eye diseases but quickly pivoted to align with SpinSci Technologies' mandate for AI-powered patient engagement solutions. The result was a complete AI-powered call-based appointment management system developed as an MVP in just 6.5 hours.

Original MVP Features

The winning hackathon solution demonstrated:

Rapid adaptation and pivot under intense pressure
AI-driven voice calls for appointment reminders, rescheduling, and cancellations
Intelligent rescheduling with instant booking capabilities
EHR integration foundation for scalability
Context-aware NLP for natural, human-like conversations
Automation of manual processes previously handled through SMS

The MVP leveraged SpinSci's Patient Notify framework, focusing on automating patient-provider workflows through voice-enabled real-time calls using Twilio, GPT-3.5, and Ollama.

Production Enhancements

This repository extends the original MVP with:

Multi-tenant architecture for zero-code deployment across industries
Dual LLM strategy with local deployment options (Ollama/vLLM)
SMART-on-FHIR integration for Epic and Cerner EHR systems
Automotive DMS adapter for cross-industry adaptability
Property-based testing with Hypothesis (11 correctness properties)
HIPAA compliance with zero-trust security architecture
Comprehensive observability (LangSmith, Helicone, Twilio CI)
Production deployment with auto-scaling (KEDA)
Voice synthesis optimization with ElevenLabs Flash v2.5
Latency optimization achieving <1000ms median response time

Key Features

Multi-tenant architecture with zero-code deployment for new use cases
Ultra-low latency voice pipeline achieving median response times under 1000ms (target: 500ms)
HIPAA-compliant security with zero-trust architecture, TLS 1.2+ encryption, and AES-256 data protection
Dual LLM strategy supporting cloud providers (OpenAI) or local deployment (Ollama/vLLM) for data sovereignty
Industry-specific adapters for healthcare (SMART-on-FHIR) and automotive (DMS) systems
Production-ready deployment with Docker, Kubernetes, and KEDA-based GPU auto-scaling
Comprehensive observability with LangSmith/Helicone integration and per-stage latency tracking
Property-based testing for correctness validation across all components
Empathetic voice synthesis with SSML/audio tags for emotional context
Voice-optimized prompt engineering for natural conversational flow
Full-duplex communication with barge-in capability for natural turn-taking
Streaming STT/TTS processing for concurrent pipeline execution
Handles 1000+ concurrent calls with a 99.9% uptime target

Use Cases

Healthcare

Appointment scheduling, rescheduling, and cancellations
Patient reminders and follow-up calls
Prescription refill automation
Post-discharge check-ins
Clinical trial recruitment
Insurance verification calls

Automotive

Service appointment scheduling
Maintenance reminders
Emergency roadside assistance triage
Parts availability inquiries
Service quote generation
Loaner vehicle coordination

General Enterprise

Customer service automation
Lead qualification calls
Survey and feedback collection
Payment reminders
Event registration and confirmations
General information hotlines

The platform's configuration-driven architecture means you can deploy any of these use cases by simply creating a new tenant configuration file - no code changes required.

Architecture

The platform uses a streaming architecture with concurrent processing stages to achieve ultra-low latency voice interactions.

graph LR

%% ================= LAYERS =================

subgraph Telephony
A[Twilio<br/>WebSocket / WebRTC]
end

subgraph Orchestration
B[Pipecat<br/>Orchestrator]
C[Tenant Registry]
D[Session Manager]
end

subgraph "Voice AI Pipeline"
E[VAD · Silero]
F[STT · Deepgram]
G[LLM · OpenAI / Ollama / vLLM]
H[TTS · ElevenLabs / OpenAI]
end

subgraph Tools
I[Tool Executor]
J[FHIR Adapter]
K[DMS Adapter]
L[Custom APIs]
end

subgraph "Security & Observability"
M[OAuth2 Auth]
N[AES-256 / TLS]
O[Audit Logs]
P[Latency Monitor]
Q[LangSmith / Helicone]
end


%% ================= FLOW =================

A --> B
B --> E
E --> F
F --> G
G --> H
H --> A

B --> C
B --> D

G --> I
I --> J
I --> K
I --> L
J --> G
K --> G
L --> G

B -.-> M
B -.-> N
B -.-> O
B -.-> P
G -.-> Q


%% ================= DARK COLORS =================

%% Telephony — Blue
style A fill:#0d47a1,color:#ffffff,stroke:#42a5f5,stroke-width:2px

%% Orchestration — Purple
style B fill:#4a148c,color:#ffffff,stroke:#ba68c8,stroke-width:2px
style C fill:#4a148c,color:#ffffff,stroke:#ba68c8
style D fill:#4a148c,color:#ffffff,stroke:#ba68c8

%% Voice AI — Pink core highlight
style E fill:#880e4f,color:#ffffff,stroke:#ff80ab
style F fill:#880e4f,color:#ffffff,stroke:#ff80ab
style G fill:#ad1457,color:#ffffff,stroke:#ff4081,stroke-width:3px
style H fill:#880e4f,color:#ffffff,stroke:#ff80ab

%% Tools — Green
style I fill:#1b5e20,color:#ffffff,stroke:#69f0ae,stroke-width:2px
style J fill:#1b5e20,color:#ffffff,stroke:#69f0ae
style K fill:#1b5e20,color:#ffffff,stroke:#69f0ae
style L fill:#1b5e20,color:#ffffff,stroke:#69f0ae

%% Security — Red
style M fill:#7f0000,color:#ffffff,stroke:#ff5252
style N fill:#7f0000,color:#ffffff,stroke:#ff5252
style O fill:#7f0000,color:#ffffff,stroke:#ff5252
style P fill:#7f0000,color:#ffffff,stroke:#ff5252
style Q fill:#7f0000,color:#ffffff,stroke:#ff5252

Data Flow

Incoming call arrives via Twilio WebSocket/WebRTC
Pipecat Orchestrator routes the call to the appropriate tenant configuration based on caller ID
Session Manager creates a new conversation session with tenant-specific settings
Voice Pipeline processes audio in real time with concurrent stages:
- VAD (Silero) detects speech boundaries for natural turn-taking
- STT (Deepgram Nova) transcribes audio to text with streaming interim results
- LLM (OpenAI/Ollama/vLLM) processes transcript, classifies intent, extracts parameters
- TTS (ElevenLabs Flash v2.5/OpenAI) converts response to audio with emotional context
Tool Executor handles external API calls when the LLM determines that a tool call is needed
Security services encrypt data (AES-256) and authenticate API requests (OAuth 2.0)
Monitoring services track per-stage latency and log all operations immutably
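The barge-in behaviour behind full-duplex turn-taking can be illustrated with plain asyncio task cancellation: agent playback runs as a task, and a VAD "speech started" event cancels it mid-utterance. This is a minimal sketch, not Pipecat's API; `play_tts`, `speak_with_barge_in`, and the simulated VAD timer are hypothetical stand-ins for the real transport and Silero VAD.

```python
import asyncio


async def play_tts(chunks: list[str], sent: list[str], delay: float = 0.05) -> None:
    """Hypothetical stand-in for streaming TTS audio to the caller, one chunk at a time."""
    for chunk in chunks:
        sent.append(chunk)           # pretend this chunk was written to the media stream
        await asyncio.sleep(delay)   # simulated playback time per chunk


async def speak_with_barge_in(chunks: list[str], user_started_speaking: asyncio.Event) -> tuple[bool, list[str]]:
    """Play chunks until done, or stop early if VAD signals caller speech.

    Returns (interrupted, chunks_actually_sent).
    """
    sent: list[str] = []
    playback = asyncio.create_task(play_tts(chunks, sent))
    barge_in = asyncio.create_task(user_started_speaking.wait())
    done, _pending = await asyncio.wait({playback, barge_in}, return_when=asyncio.FIRST_COMPLETED)
    if barge_in in done and not playback.done():
        playback.cancel()            # stop talking as soon as the caller talks
        try:
            await playback
        except asyncio.CancelledError:
            pass
        return True, sent
    barge_in.cancel()                # playback finished first; stop waiting on VAD
    return False, sent


async def demo() -> tuple[bool, list[str]]:
    event = asyncio.Event()
    # Simulate the VAD firing roughly 120 ms into playback.
    asyncio.get_running_loop().call_later(0.12, event.set)
    words = ["Your", "appointment", "is", "on", "Tuesday", "at", "nine."]
    return await speak_with_barge_in(words, event)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```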

Latency Optimization

The platform achieves ultra-low latency through aggressive optimization at each pipeline stage:

Pipeline Stage	Target Latency	Upper Limit	Optimization Strategy
Media Edge/Network	70ms	100ms	Geographic edge server deployment
Speech-to-Text	350ms	500ms	Deepgram Nova streaming STT
LLM TTFT	375ms	750ms	Speculative decoding, quantization for local models
TTS TTFB	100ms	250ms	ElevenLabs Flash v2.5 ultra-low latency
Total End-to-End	~895ms	~1,600ms	Concurrent processing, streaming audio generation
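The totals row is the serial sum of the stage budgets (70 + 350 + 375 + 100 = 895 ms; 100 + 500 + 750 + 250 = 1,600 ms). Because the stages stream into one another, measured end-to-end latency can come in below that serial sum. The sketch below reproduces the arithmetic and flags stages that exceed their upper limit; the stage keys and the sample measurement are illustrative assumptions, not platform output.

```python
# Per-stage budgets from the table above: (target_ms, upper_limit_ms).
BUDGET_MS: dict[str, tuple[int, int]] = {
    "media_edge": (70, 100),
    "stt": (350, 500),
    "llm_ttft": (375, 750),
    "tts_ttfb": (100, 250),
}


def serial_totals(budget: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum targets and upper limits as if the stages ran strictly one after another."""
    return sum(t for t, _ in budget.values()), sum(u for _, u in budget.values())


def over_limit(measured_ms: dict[str, float], budget: dict[str, tuple[int, int]] = BUDGET_MS) -> list[str]:
    """Return the stages whose measured latency exceeds their upper limit."""
    return [stage for stage, ms in measured_ms.items() if ms > budget[stage][1]]


print(serial_totals(BUDGET_MS))  # (895, 1600)
print(over_limit({"media_edge": 80, "stt": 420, "llm_ttft": 910, "tts_ttfb": 120}))  # ['llm_ttft']
```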

Key optimizations include:

Concurrent processing: LLM begins processing as soon as STT provides interim results
Streaming TTS: Audio generation starts before LLM completes full response
Token streaming: TTS receives tokens as they're generated, not after completion
Full-duplex WebSocket/WebRTC for simultaneous send/receive
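Token streaming into TTS is commonly done by buffering LLM tokens only until a sentence boundary and then handing that fragment to the TTS service, so audio for the first sentence can start while later sentences are still being generated. A minimal asyncio sketch, assuming a hypothetical `fake_llm_tokens` stream in place of a real LLM client; the punctuation-based boundary rule is a simplification that would mis-split abbreviations such as "Dr.".

```python
import asyncio
import re
from collections.abc import AsyncIterator

# Ends with ., ! or ?, optionally followed by a closing quote.
_SENTENCE_END = re.compile(r"[.!?][\"']?$")


async def fake_llm_tokens() -> AsyncIterator[str]:
    """Hypothetical stand-in for a streaming LLM response."""
    for tok in ["Sure", ",", " I", " can", " help", ".", " What", " day", " works", "?"]:
        await asyncio.sleep(0.01)   # simulated inter-token delay
        yield tok


async def sentences_for_tts(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    """Group streamed tokens into sentence-sized chunks for the TTS service."""
    buffer = ""
    async for tok in tokens:
        buffer += tok
        if _SENTENCE_END.search(buffer.rstrip()):
            yield buffer.strip()    # flush as soon as a sentence is complete
            buffer = ""
    if buffer.strip():
        yield buffer.strip()        # flush a trailing fragment with no end punctuation


async def main() -> list[str]:
    chunks = []
    async for chunk in sentences_for_tts(fake_llm_tokens()):
        chunks.append(chunk)        # a real pipeline would send each chunk to TTS here
    return chunks


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['Sure, I can help.', 'What day works?']
```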

Core components include:

Tenant Registry for multi-tenant configuration management
Session Manager for conversation state tracking
Voice Pipeline for real-time audio processing
Authentication and Encryption Services for security
Audit Logger for compliance and debugging
Latency Monitor for performance tracking

Quick Start

From Hackathon to Production

This platform evolved from a 6.5-hour hackathon MVP to a production-ready system. The quick start guide below demonstrates the enhanced capabilities built on top of the original concept.

Prerequisites

Python 3.10 or higher
Virtual environment tool (venv, virtualenv, or conda)
API keys for external services (Deepgram, OpenAI/Ollama, ElevenLabs, Twilio)

Installation

Clone the repository:

git clone <repository-url>
cd config-driven-voice-agent

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -e ".[dev]"

Configure environment variables:

cp .env.example .env
# Edit .env with your API keys and configuration

Run the test suite:

pytest

Quick Demo

Run the example appointment scheduling workflow:

python examples/appointment_scheduling_workflow_usage.py

This demonstrates the core functionality that won the hackathon: AI-powered voice calls for appointment management with natural language understanding.

Watch the Demos

See the platform in action with demo videos in the demo/ directory. Check out docs/DEMOS.md for details on the available demonstrations.

Configuration

Tenant configurations are defined in YAML or TOML files located in config/tenants/. Each tenant configuration specifies:

Persona and system prompt with voice-optimized instructions
LLM provider and model settings (OpenAI/Ollama/vLLM)
TTS provider and voice settings (ElevenLabs/OpenAI)
Available tools and API endpoints (FHIR/DMS/Custom)
Knowledge bases for domain-specific information
Operational constraints (max call duration, response brevity)

Configuration-Driven Multi-Tenancy

The platform's core innovation is its ability to serve multiple industries without code changes. The Tenant Registry loads configurations at runtime and routes calls based on caller identification. Hot reload allows configuration updates without a system restart.

Example tenant configuration:

tenant_id: healthcare_clinic_001
persona:
  name: "Sarah"
  voice_id: "elevenlabs_voice_id"
  system_prompt: "You are Sarah, a friendly medical receptionist..."
llm:
  provider: "openai"  # or "ollama", "vllm"
  model: "gpt-4"
  temperature: 0.7
  max_tokens: 150
tools:
  - name: "search_appointments"
    type: "fhir"
    endpoint: "https://fhir.epic.com/api/FHIR/R4"
    auth_type: "oauth2"
    scopes: ["patient/Appointment.read"]
constraints:
  max_call_duration: 600
  response_brevity: 3  # max sentences per utterance
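Validating each tenant file at load time catches mistakes before a live call does. This sketch checks the dict a YAML parser would produce for the example above; the dataclass names and the 0 to 2 temperature range are assumptions rather than the project's actual schema.

```python
from dataclasses import dataclass

SUPPORTED_LLM_PROVIDERS = {"openai", "ollama", "vllm"}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    model: str
    temperature: float = 0.7
    max_tokens: int = 150

    def __post_init__(self) -> None:
        if self.provider not in SUPPORTED_LLM_PROVIDERS:
            raise ValueError(f"unsupported LLM provider: {self.provider!r}")
        if not 0.0 <= self.temperature <= 2.0:   # assumed range (OpenAI-style)
            raise ValueError("temperature must be between 0 and 2")
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")


@dataclass(frozen=True)
class Constraints:
    max_call_duration: int
    response_brevity: int   # max sentences per utterance

    def __post_init__(self) -> None:
        if self.max_call_duration <= 0 or self.response_brevity <= 0:
            raise ValueError("constraints must be positive")


def validate_tenant(raw: dict) -> tuple[str, LLMSettings, Constraints]:
    """Turn a parsed tenant file into typed settings, raising ValueError on bad input."""
    if not raw.get("tenant_id"):
        raise ValueError("tenant_id is required")
    return raw["tenant_id"], LLMSettings(**raw["llm"]), Constraints(**raw["constraints"])


# The example configuration above, as a YAML parser would return it (persona/tools omitted).
example = {
    "tenant_id": "healthcare_clinic_001",
    "llm": {"provider": "openai", "model": "gpt-4", "temperature": 0.7, "max_tokens": 150},
    "constraints": {"max_call_duration": 600, "response_brevity": 3},
}
tenant_id, llm, constraints = validate_tenant(example)
```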

Healthcare Integration (SMART-on-FHIR)

For healthcare deployments, the platform integrates with Epic and Cerner EHR systems using the HL7 FHIR standard:

OAuth 2.0 authentication with scoped access tokens
Appointment/$find operation for searching available slots
Appointment/$book operation for creating appointments
Constraint checking (is-cancelable, is-reschedulable flags)
Calendar optimization heuristics to minimize gaps and maximize provider utilization
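One possible gap-minimizing heuristic, shown only as an illustration (the platform's actual heuristics are not documented here): rank candidate slots by how many minutes they would strand in gaps too short to book, then by distance to the nearest existing booking. Times are minutes from midnight, and the candidate slots are assumed to be conflict-free already (for example, the output of a $find search).

```python
Interval = tuple[int, int]   # (start, end) in minutes from midnight, half-open


def slot_key(slot: Interval, booked: list[Interval], day: Interval = (480, 1020),
             min_len: int = 30) -> tuple[int, int, int]:
    """Sort key for a candidate slot: (wasted minutes, distance to nearest booking, start time).

    A leftover gap shorter than `min_len` cannot hold another appointment, so it counts as waste.
    Assumes the slot does not overlap any booking.
    """
    start, end = slot
    gap_before = start - max([e for _, e in booked if e <= start] + [day[0]])
    gap_after = min([s for s, _ in booked if s >= end] + [day[1]]) - end
    wasted = sum(g for g in (gap_before, gap_after) if 0 < g < min_len)
    return wasted, min(gap_before, gap_after), start


def rank_slots(free: list[Interval], booked: list[Interval], day: Interval = (480, 1020),
               min_len: int = 30) -> list[Interval]:
    """Offer slots that waste no time and sit next to existing bookings first."""
    return sorted(free, key=lambda s: slot_key(s, booked, day, min_len))


# Clinic day 8:00-17:00; provider already booked 9:00-9:30 and 11:00-11:30.
booked = [(540, 570), (660, 690)]
free = [(585, 615), (600, 630), (720, 750), (570, 600), (990, 1020)]
print(rank_slots(free, booked))
# [(570, 600), (990, 1020), (600, 630), (720, 750), (585, 615)]
```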

Automotive Integration (DMS)

For automotive deployments, the platform connects to Dealership Management Systems:

VIN-based vehicle identification
Service slot search and booking
Maintenance quote retrieval
Emergency triage for breakdown scenarios
Parts availability checking
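VIN-based identification usually starts with a cheap local sanity check before any DMS lookup. For North American VINs the 9th character is a check digit computed from weighted character values modulo 11; the sketch below implements that check. VINs issued outside North America may not use a check digit, and the DMS lookup itself is not shown.

```python
# Letter values and position weights for the VIN check digit (North American VINs, 49 CFR Part 565).
# I, O and Q never appear in a VIN.
_LETTER_VALUES = {
    **{c: i + 1 for i, c in enumerate("ABCDEFGH")},   # A=1 ... H=8
    **{c: i + 1 for i, c in enumerate("JKLMN")},      # J=1 ... N=5
    "P": 7,
    "R": 9,
    **{c: i + 2 for i, c in enumerate("STUVWXYZ")},   # S=2 ... Z=9
}
_WEIGHTS = (8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2)
_DIGITS = "0123456789"


def _value(ch: str) -> int:
    return int(ch) if ch in _DIGITS else _LETTER_VALUES[ch]


def is_valid_vin(vin: str) -> bool:
    """Return True if `vin` has 17 allowed characters and its 9th character matches the check digit."""
    vin = vin.strip().upper()
    if len(vin) != 17 or any(c not in _DIGITS and c not in _LETTER_VALUES for c in vin):
        return False
    remainder = sum(_value(c) * w for c, w in zip(vin, _WEIGHTS)) % 11
    return vin[8] == ("X" if remainder == 10 else str(remainder))
```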

See config/tenants/example_healthcare.yaml and config/tenants/example_automotive.yaml for complete examples.

Project Structure

config-driven-voice-agent/
├── src/voice_agent/
│   ├── core/              # Core orchestration and pipeline components
│   ├── services/          # External service integrations (STT, LLM, TTS, VAD)
│   ├── adapters/          # Industry-specific adapters (FHIR, DMS)
│   ├── security/          # Authentication and encryption services
│   ├── config/            # Configuration management and validation
│   ├── monitoring/        # Observability, logging, and latency tracking
│   ├── telephony/         # Twilio integration
│   ├── workflows/         # Conversation workflow implementations
│   └── deployment/        # Deployment and scaling controllers
├── tests/                 # Comprehensive test suite with property-based tests
├── config/tenants/        # Tenant configuration files
├── docs/                  # Documentation and guides
├── k8s/                   # Kubernetes deployment manifests
├── examples/              # Usage examples for each component
└── pyproject.toml         # Project configuration and dependencies

Development

Running Tests

Run the full test suite:

pytest

Run tests with coverage report:

pytest --cov=src --cov-report=html

Run property-based tests only:

pytest -m property

Code Quality

Format code with Black:

black src tests

Lint code with Ruff:

ruff check src tests

Type checking with mypy:

mypy src

Testing Strategy

The platform uses a comprehensive testing approach:

Unit tests for individual components and functions
Property-based tests using Hypothesis for universal correctness properties
Integration tests for component interactions
End-to-end tests for complete call flows
Load tests for performance validation

All tests are located in the tests/ directory with a structure mirroring the source code.

Voice Synthesis and Empathy

The platform prioritizes natural, empathetic voice interactions through advanced TTS and prompt engineering:

Text-to-Speech Providers

ElevenLabs Flash v2.5 (primary): 75ms latency, 5000+ voices, voice cloning support
OpenAI TTS (fallback): 200ms latency, reliable alternative
SSML and audio tag support for emotional context (for example, [sigh] and [laughs])
Prosody modulation based on conversation context
Pronunciation accuracy optimized for medical and technical terminology
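Primary/fallback TTS selection can be sketched as a time-boxed call to the primary provider that falls back on timeout or connection errors. The provider callables here are hypothetical stubs, not the ElevenLabs or OpenAI SDKs. The 0.25 s default mirrors the TTS upper limit in the latency table, but it wraps the whole synthesis call for simplicity; a streaming implementation would time only the first audio byte.

```python
import asyncio
from collections.abc import Awaitable, Callable

Synthesizer = Callable[[str], Awaitable[bytes]]   # text -> audio bytes (hypothetical interface)


async def synthesize_with_fallback(text: str, primary: Synthesizer, fallback: Synthesizer,
                                   timeout_s: float = 0.25) -> tuple[str, bytes]:
    """Try the primary TTS within `timeout_s`; on timeout or connection failure, use the fallback."""
    try:
        audio = await asyncio.wait_for(primary(text), timeout=timeout_s)
        return "primary", audio
    except (asyncio.TimeoutError, ConnectionError):
        return "fallback", await fallback(text)


# Hypothetical stubs standing in for real provider clients.
async def slow_primary(text: str) -> bytes:
    await asyncio.sleep(1.0)           # simulates a stalled primary provider
    return b"primary-audio"


async def quick_fallback(text: str) -> bytes:
    await asyncio.sleep(0.01)
    return b"fallback-audio"


if __name__ == "__main__":
    print(asyncio.run(synthesize_with_fallback("Your appointment is confirmed.", slow_primary, quick_fallback)))
    # ('fallback', b'fallback-audio')
```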

Voice-Optimized Prompt Engineering

The platform uses specialized prompt architecture to ensure responses are optimized for voice-only consumption:

No markdown formatting, tables, or bullet points in responses
Responses limited to 3 sentences per utterance for natural pacing
Empathetic phrasing with natural inflections
Explicit knowledge-boundary statements, so the agent says when it does not know instead of guessing
Conversational language appropriate for spoken dialogue
Emotional context tags for appropriate tone modulation
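Prompt instructions do most of this work, but a small post-processing pass can act as a safety net before text reaches TTS. This sketch strips common markdown (list markers, headings, emphasis, table pipes) and enforces the three-sentence cap; `to_speakable` is a hypothetical helper, and regex-based sentence splitting is a simplifying assumption.

```python
import re

# Markdown that sounds wrong when read aloud.
_LIST_MARKER = re.compile(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", re.MULTILINE)
_HEADING = re.compile(r"^[ \t]*#+[ \t]*", re.MULTILINE)
_EMPHASIS = re.compile(r"[*`|]+")
_SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def to_speakable(text: str, max_sentences: int = 3) -> str:
    """Strip basic markdown, collapse whitespace, and keep at most `max_sentences` sentences."""
    for pattern in (_LIST_MARKER, _HEADING, _EMPHASIS):
        text = pattern.sub("", text)
    text = re.sub(r"\s+", " ", text).strip()
    sentences = _SENTENCE_SPLIT.split(text)
    return " ".join(sentences[:max_sentences])


sample = "Sure! I have **Tuesday** at 9am or *Wednesday* at 2pm.\n- Both are with Dr Rao.\nWhich works? I can also check Friday."
print(to_speakable(sample))
# Sure! I have Tuesday at 9am or Wednesday at 2pm. Both are with Dr Rao.
```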

Conversational Workflows

Pre-built workflows for common use cases:

Appointment scheduling with multi-turn parameter extraction
Appointment rescheduling with constraint checking
Appointment cancellation with confirmation
Human handoff when workflows fail or user requests escalation
Error handling with graceful degradation and helpful feedback
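Multi-turn parameter extraction with human handoff can be modelled as slot filling: merge whatever the LLM extracted this turn, ask for the first missing slot, and escalate after repeated turns that yield nothing. The slot names, prompts, and the two-turn threshold below are hypothetical choices for illustration.

```python
from dataclasses import dataclass, field

REQUIRED_SLOTS = ("patient_name", "date_of_birth", "preferred_date")   # hypothetical slot names
PROMPTS = {
    "patient_name": "May I have your full name?",
    "date_of_birth": "And your date of birth?",
    "preferred_date": "What day would you like to come in?",
}
MAX_FAILED_TURNS = 2   # assumed escalation threshold


@dataclass
class SchedulingState:
    slots: dict[str, str] = field(default_factory=dict)
    failed_turns: int = 0


def next_action(state: SchedulingState, extracted: dict[str, str], wants_human: bool = False) -> tuple[str, str]:
    """Merge this turn's extracted parameters and decide the next step.

    Returns (action, utterance) where action is "ask", "confirm", or "handoff".
    """
    if wants_human:
        return "handoff", "Of course, let me connect you with a member of our staff."
    new = {k: v for k, v in extracted.items() if k in REQUIRED_SLOTS and v}
    state.failed_turns = 0 if new else state.failed_turns + 1
    state.slots.update(new)
    if state.failed_turns > MAX_FAILED_TURNS:
        return "handoff", "I'm having trouble with this, so let me transfer you to someone who can help."
    missing = [s for s in REQUIRED_SLOTS if s not in state.slots]
    if missing:
        return "ask", PROMPTS[missing[0]]
    return "confirm", f"Just to confirm, {state.slots['patient_name']} on {state.slots['preferred_date']}. Is that right?"
```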

Deployment

Docker

Build the Docker image:

docker build -t voice-agent-platform .

Run with Docker Compose:

docker-compose up

Kubernetes

Deploy to Kubernetes:

kubectl apply -f k8s/

The platform includes:

Deployment manifests with resource limits
Service definitions for load balancing
ConfigMaps for environment configuration
Secrets for sensitive credentials
KEDA ScaledObject for GPU auto-scaling
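KEDA feeds event-source metrics into the Kubernetes Horizontal Pod Autoscaler, whose core rule is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to the configured minimum and maximum. The sketch reproduces that arithmetic for a hypothetical "active calls per pod" metric; the real HPA also applies a tolerance band (10% by default) and stabilization windows, which are omitted here.

```python
import math


def desired_replicas(current_replicas: int, current_avg_metric: float, target_per_pod: float,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """HPA-style replica calculation (tolerance and stabilization windows omitted)."""
    raw = math.ceil(current_replicas * current_avg_metric / target_per_pod)
    return max(min_replicas, min(max_replicas, raw))


# 4 pods averaging 45 active calls each, with a target of 30 calls per pod.
print(desired_replicas(4, 45, 30))  # 6
```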

See docs/DEPLOYMENT.md for detailed deployment instructions.

Security and Compliance

The platform implements enterprise-grade security:

Zero-trust architecture with OAuth 2.0 authentication for all external APIs
TLS 1.2+ for all network communication (WebSocket/WebRTC)
AES-256-GCM encryption for data at rest
PII/PHI tokenization for privacy protection before logging
Immutable audit logs for compliance tracking and forensic analysis
Business Associate Agreements (BAA) support for HIPAA compliance
Data minimization principles: process PHI only for immediate workflow execution
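PII/PHI tokenization before logging can be sketched with keyed hashing: each detected value is replaced by a deterministic HMAC-SHA256 token, so the same phone number maps to the same token across log lines, while the token-to-value vault is stored separately. The three regex patterns and the `tokenize_pii`/`detokenize` helpers are illustrative assumptions and cover only a fraction of the identifiers HIPAA cares about.

```python
import hashlib
import hmac
import re

# Illustrative patterns only; real PHI detection needs much broader coverage (names, addresses, MRNs, ...).
_PII = re.compile(
    r"(?P<EMAIL>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)"
    r"|(?P<DOB>\b\d{4}-\d{2}-\d{2}\b)"
    r"|(?P<PHONE>\+?\d[\d\s().-]{8,}\d)"
)


def tokenize_pii(text: str, key: bytes) -> tuple[str, dict[str, str]]:
    """Replace detected PII with keyed, deterministic tokens; return (redacted_text, token -> value vault)."""
    vault: dict[str, str] = {}

    def _replace(m: re.Match) -> str:
        value = m.group(0)
        digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]
        token = f"<{m.lastgroup}:{digest}>"
        vault[token] = value
        return token

    # A single pass means tokens already inserted are never re-scanned.
    return _PII.sub(_replace, text), vault


def detokenize(text: str, vault: dict[str, str]) -> str:
    """Restore original values (only for authorized access to the separately stored vault)."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text
```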

HIPAA Compliance

For healthcare deployments, the platform meets HIPAA requirements through:

Execution of Business Associate Agreements with all third-party vendors (Twilio, OpenAI, Deepgram, ElevenLabs)
Local LLM deployment option (Ollama/vLLM) for absolute data sovereignty
Automated PII/PHI redaction before external API calls
Immutable audit logs mapping every API call, database query, and state change
Encrypted storage and transmission of all protected health information

For data sovereignty requirements or environments where BAAs are insufficient, deploy with local LLM providers (Ollama or vLLM) to ensure all data remains on-premises and never traverses the public internet.

Performance

The platform is optimized for ultra-low latency and high concurrency:

Latency Targets

Median end-to-end latency: under 1000ms
Target latency: under 500ms for optimal user experience
Concurrent processing stages (STT, LLM, TTS) to minimize total response time
Streaming audio generation to overlap LLM inference and TTS synthesis
Token caching and automatic refresh to eliminate authentication delays
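Token caching with automatic refresh means the OAuth 2.0 access token is reused until shortly before it expires, so no call waits on a token request. A minimal thread-safe sketch; `fetch` stands in for a hypothetical token request (for example, a client-credentials grant) returning the token and its lifetime in seconds, and the 60-second refresh margin is an assumption.

```python
import threading
import time
from collections.abc import Callable


class CachedToken:
    """Cache an OAuth 2.0 access token and refresh it shortly before it expires."""

    def __init__(self, fetch: Callable[[], tuple[str, float]], refresh_margin_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch              # hypothetical: returns (access_token, expires_in_seconds)
        self._margin = refresh_margin_s
        self._clock = clock              # injectable for testing
        self._lock = threading.Lock()
        self._token: str | None = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a cached token, fetching a new one if none exists or expiry is within the margin."""
        with self._lock:
            if self._token is None or self._clock() >= self._expires_at - self._margin:
                token, expires_in = self._fetch()
                self._token, self._expires_at = token, self._clock() + expires_in
            return self._token
```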

Scalability

Handles 1000+ concurrent calls
99.9% uptime target for production deployments
KEDA-based auto-scaling for GPU resources (local LLM deployments)
Dynamic resource allocation based on call volume patterns
Cost optimization through balanced cloud API and self-hosted inference

Latency Tracking

Per-stage metrics captured for every interaction:

STT latency: Audio chunk to transcript
LLM Time To First Token (TTFT): Transcript to first LLM token
LLM Total: Complete response generation
TTS Time To First Byte (TTFB): Text to first audio byte
TTS Total: Complete audio generation
End-to-end: Audio input to audio output
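A minimal sketch of per-stage collection, assuming stage names that mirror the list above. Whole-stage durations can be timed with a context manager, while TTFT and TTFB would be recorded with `record()` when the first token or byte arrives; `LatencyTracker` is hypothetical, and its medians feed the kind of threshold alerts described under System Monitoring.

```python
import statistics
import time
from collections import defaultdict
from collections.abc import Iterator
from contextlib import contextmanager


class LatencyTracker:
    """Record per-stage durations in milliseconds and summarize them."""

    def __init__(self) -> None:
        self.samples: dict[str, list[float]] = defaultdict(list)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        """Time the enclosed block and store the duration under `name`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.samples[name].append((time.perf_counter() - start) * 1000.0)

    def record(self, name: str, ms: float) -> None:
        """Store a duration measured elsewhere (e.g. time to first token)."""
        self.samples[name].append(ms)

    def median_ms(self, name: str) -> float:
        return statistics.median(self.samples[name])

    def breaches(self, thresholds_ms: dict[str, float]) -> dict[str, float]:
        """Stages whose median exceeds its threshold (input for alerting)."""
        return {n: self.median_ms(n) for n, limit in thresholds_ms.items()
                if self.samples.get(n) and self.median_ms(n) > limit}
```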

Local LLM Optimization

For self-hosted deployments using Ollama or vLLM:

Quantization (4-bit) to reduce GPU memory footprint and accelerate inference
Speculative decoding to speed up token generation (a smaller draft model proposes tokens that the main model verifies)
PagedAttention and continuous batching (vLLM) for high-concurrency scenarios
GPU auto-scaling to balance cost and performance

Observability

Integrated monitoring and tracing for production deployments:

LLM Observability

LangSmith or Helicone for end-to-end trace captures of every user interaction
Token consumption metrics and cost analysis
Hallucination detection and dialogue quality monitoring
Precise system prompt and response logging for debugging

Telephony Intelligence

Twilio Conversational Intelligence for sentiment analysis
Post-call Language Operators to verify whether each call achieved its objective
Conversation quality metrics and compliance protocol validation

System Monitoring

Prometheus metrics for system health and resource utilization
Grafana dashboards for real-time visualization
Sentry for error tracking and alerting
Per-session latency metrics with stage-by-stage breakdown
Alert generation for threshold violations (latency, error rates, hallucinations)

Audit Logging

Immutable audit logs for all system operations
Call initiation and termination events
Transcript segments with PII tokenization
LLM requests and responses
Tool executions and results
Authentication events and API calls
Error conditions and latency anomalies
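Immutability is ultimately a storage property (for example, write-once buckets), but entries can also be made tamper-evident by hash-chaining them: each record stores the SHA-256 of its predecessor, so editing any past record breaks verification. A sketch under that assumption; `AuditLog` and the event names and fields are hypothetical.

```python
import hashlib
import json
import time
from typing import Any


class AuditLog:
    """Append-only log where each entry stores the hash of the previous entry (tamper-evident)."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict[str, Any]] = []

    def append(self, event: str, **details: Any) -> dict[str, Any]:
        """Add an entry chained to the previous one; `details` must be JSON-serializable."""
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"ts": time.time(), "event": event, "details": details, "prev": prev}
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    @staticmethod
    def _digest(entry: dict[str, Any]) -> str:
        payload = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode("utf-8")).hexdigest()

    def verify(self) -> bool:
        """Recompute every hash and check each link back to the genesis value."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True
```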

Contributing

We welcome contributions from the community! This project started as a hackathon winner and has evolved into a production-ready platform. Whether you're fixing bugs, adding features, improving documentation, or sharing ideas, your contributions are valued.

How to Contribute

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Make your changes and add tests
Ensure all tests pass (pytest)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Development Guidelines

Follow PEP 8 style guidelines (use Black for formatting)
Write tests for new features
Update documentation as needed
Keep commits atomic and well-described
Ensure backward compatibility when possible

Areas for Contribution

Additional industry adapters (finance, retail, education)
Enhanced voice synthesis providers
Improved latency optimization techniques
Additional language support for international deployments
Enhanced observability and monitoring features
Documentation improvements and tutorials
Bug fixes and performance optimizations

Community

Report bugs and request features through GitHub Issues
Join discussions in GitHub Discussions
Share your use cases and implementations
Help others in the community

Acknowledgments

This project would not have been possible without:

SpinSci Technologies for hosting the hackathon and providing the problem statement
T-Hub, Hyderabad for providing the venue and infrastructure
The open-source community for the amazing tools and libraries we built upon

License

Licensed under the MIT License. See LICENSE file for details.

This software is open source and free to use, modify, and distribute under the terms of the MIT License.
