Grand Prize Winner - SpinSci AI Hackathon 2025
Production-grade, multi-tenant AI voice agent platform for enterprise call automation. Built with Pipecat orchestration, Twilio telephony, streaming STT/TTS services, and dual LLM strategies for ultra-low latency voice interactions.
Originally prototyped in 6.5 hours during the SpinSci Hackathon 2025 at T-Hub, Hyderabad (January 14-15, 2025), this platform has been significantly scaled and enhanced into a production-ready system with comprehensive testing, security features, and enterprise-grade capabilities.
The Config-Driven Voice Agent Platform enables appointment scheduling, rescheduling, cancellations, and general call automation across different industries through configuration-driven architecture. Deploy the same codebase for healthcare clinics, automotive dealerships, or custom use cases without writing new code.
The platform addresses the enterprise telecommunications transformation from rigid IVR systems to autonomous AI agents capable of executing complex, multi-turn workflows with human-like prosody and semantic comprehension. By decoupling core conversational intelligence from domain-specific business logic, a single platform seamlessly transitions between industries simply by loading different configuration payloads.
This platform won the grand prize at the SpinSci AI Hackathon 2025, competing against 20 shortlisted teams from across India. The event took place on January 14-15, 2025, at T-Hub, Hyderabad.
The original hackathon submission was a functional MVP developed in just 6.5 hours. This repository represents a significantly enhanced and production-ready version with:
- Comprehensive test coverage including property-based testing
- Enterprise-grade security and HIPAA compliance features
- Production deployment configurations (Docker, Kubernetes, KEDA)
- Enhanced observability and monitoring capabilities
- Multiple LLM provider support (OpenAI, Ollama, vLLM)
- Extensive documentation and examples
- Scalability improvements for 1000+ concurrent calls
The team initially focused on an AI-powered platform for early detection of eye diseases but quickly pivoted to align with SpinSci Technologies' mandate for AI-powered patient engagement solutions. The result was a complete AI-powered call-based appointment management system developed as an MVP in just 6.5 hours.
The winning hackathon solution demonstrated:
- Rapid adaptation and pivot under intense pressure
- AI-driven voice calls for appointment reminders, rescheduling, and cancellations
- Intelligent rescheduling with instant booking capabilities
- EHR integration foundation for scalability
- Context-aware NLP for natural, human-like conversations
- Automation of manual processes previously handled through SMS
The MVP leveraged SpinSci's Patient Notify framework, focusing on automating patient-provider workflows through voice-enabled real-time calls using Twilio, GPT-3.5, and Ollama.
This repository extends the original MVP with:
- Multi-tenant architecture for zero-code deployment across industries
- Dual LLM strategy with local deployment options (Ollama/vLLM)
- SMART-on-FHIR integration for Epic and Cerner EHR systems
- Automotive DMS adapter for cross-industry adaptability
- Property-based testing with Hypothesis (11 correctness properties)
- HIPAA compliance with zero-trust security architecture
- Comprehensive observability (LangSmith, Helicone, Twilio CI)
- Production deployment with auto-scaling (KEDA)
- Voice synthesis optimization with ElevenLabs Flash v2.5
- Latency optimization achieving <1000ms median response time
- Multi-tenant architecture with zero-code deployment for new use cases
- Ultra-low latency voice pipeline achieving median response times under 1000ms (target: 500ms)
- HIPAA-compliant security with zero-trust architecture, TLS 1.2+ encryption, and AES-256 data protection
- Dual LLM strategy supporting cloud providers (OpenAI) or local deployment (Ollama/vLLM) for data sovereignty
- Industry-specific adapters for healthcare (SMART-on-FHIR) and automotive (DMS) systems
- Production-ready deployment with Docker, Kubernetes, and KEDA-based GPU auto-scaling
- Comprehensive observability with LangSmith/Helicone integration and per-stage latency tracking
- Property-based testing for correctness validation across all components
- Empathetic voice synthesis with SSML/audio tags for emotional context
- Voice-optimized prompt engineering for natural conversational flow
- Full-duplex communication with barge-in capability for natural turn-taking
- Streaming STT/TTS processing for concurrent pipeline execution
- Handles 1000+ concurrent calls with 99.9% uptime target
- Appointment scheduling, rescheduling, and cancellations
- Patient reminders and follow-up calls
- Prescription refill automation
- Post-discharge check-ins
- Clinical trial recruitment
- Insurance verification calls
- Service appointment scheduling
- Maintenance reminders
- Emergency roadside assistance triage
- Parts availability inquiries
- Service quote generation
- Loaner vehicle coordination
- Customer service automation
- Lead qualification calls
- Survey and feedback collection
- Payment reminders
- Event registration and confirmations
- General information hotlines
The platform's configuration-driven architecture means you can deploy any of these use cases by simply creating a new tenant configuration file - no code changes required.
The platform uses a streaming architecture with concurrent processing stages to achieve ultra-low latency voice interactions.
graph LR
%% ================= LAYERS =================
subgraph Telephony
A[Twilio<br/>WebSocket / WebRTC]
end
subgraph Orchestration
B[Pipecat<br/>Orchestrator]
C[Tenant Registry]
D[Session Manager]
end
subgraph "Voice AI Pipeline"
E[VAD · Silero]
F[STT · Deepgram]
G[LLM · OpenAI / Ollama / vLLM]
H[TTS · ElevenLabs / OpenAI]
end
subgraph Tools
I[Tool Executor]
J[FHIR Adapter]
K[DMS Adapter]
L[Custom APIs]
end
subgraph "Security & Observability"
M[OAuth2 Auth]
N[AES-256 / TLS]
O[Audit Logs]
P[Latency Monitor]
Q[LangSmith / Helicone]
end
%% ================= FLOW =================
A --> B
B --> E
E --> F
F --> G
G --> H
H --> A
B --> C
B --> D
G --> I
I --> J
I --> K
I --> L
J --> G
K --> G
L --> G
B -.-> M
B -.-> N
B -.-> O
B -.-> P
G -.-> Q
%% ================= DARK COLORS =================
%% Telephony — Blue
style A fill:#0d47a1,color:#ffffff,stroke:#42a5f5,stroke-width:2px
%% Orchestration — Purple
style B fill:#4a148c,color:#ffffff,stroke:#ba68c8,stroke-width:2px
style C fill:#4a148c,color:#ffffff,stroke:#ba68c8
style D fill:#4a148c,color:#ffffff,stroke:#ba68c8
%% Voice AI — Pink core highlight
style E fill:#880e4f,color:#ffffff,stroke:#ff80ab
style F fill:#880e4f,color:#ffffff,stroke:#ff80ab
style G fill:#ad1457,color:#ffffff,stroke:#ff4081,stroke-width:3px
style H fill:#880e4f,color:#ffffff,stroke:#ff80ab
%% Tools — Green
style I fill:#1b5e20,color:#ffffff,stroke:#69f0ae,stroke-width:2px
style J fill:#1b5e20,color:#ffffff,stroke:#69f0ae
style K fill:#1b5e20,color:#ffffff,stroke:#69f0ae
style L fill:#1b5e20,color:#ffffff,stroke:#69f0ae
%% Security — Red
style M fill:#7f0000,color:#ffffff,stroke:#ff5252
style N fill:#7f0000,color:#ffffff,stroke:#ff5252
style O fill:#7f0000,color:#ffffff,stroke:#ff5252
style P fill:#7f0000,color:#ffffff,stroke:#ff5252
style Q fill:#7f0000,color:#ffffff,stroke:#ff5252
- Incoming call arrives via Twilio WebSocket/WebRTC
- Pipecat Orchestrator routes call to appropriate tenant configuration based on caller ID
- Session Manager creates new conversation session with tenant-specific settings
- Voice Pipeline processes audio in real-time with concurrent stages:
  - VAD (Silero) detects speech boundaries for natural turn-taking
  - STT (Deepgram Nova) transcribes audio to text with streaming interim results
  - LLM (OpenAI/Ollama/vLLM) processes transcript, classifies intent, extracts parameters
  - TTS (ElevenLabs Flash v2.5/OpenAI) converts response to audio with emotional context
- Tool Executor handles external API calls when LLM determines tool usage needed
- Security services encrypt data (AES-256) and authenticate API requests (OAuth 2.0)
- Monitoring services track per-stage latency and log all operations immutably
The platform achieves ultra-low latency through aggressive optimization at each pipeline stage:
| Pipeline Stage | Target Latency | Upper Limit | Optimization Strategy |
|---|---|---|---|
| Media Edge/Network | 70ms | 100ms | Geographic edge server deployment |
| Speech-to-Text | 350ms | 500ms | Deepgram Nova streaming STT |
| LLM TTFT | 375ms | 750ms | Speculative decoding, quantization for local models |
| TTS TTFB | 100ms | 250ms | ElevenLabs Flash v2.5 ultra-low latency |
| Total End-to-End | ~895ms | ~1,600ms | Concurrent processing, streaming audio generation |
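As a quick check on the table, the end-to-end figures are the plain sums of the per-stage budgets. The sketch below reproduces that arithmetic; the dictionary keys are illustrative names, not identifiers from the codebase.

```python
# Per-stage latency budgets in milliseconds, copied from the table above
# as (target, upper_limit). Key names are illustrative only.
BUDGET_MS = {
    "media_edge": (70, 100),
    "stt": (350, 500),
    "llm_ttft": (375, 750),
    "tts_ttfb": (100, 250),
}


def total_budget(budget):
    """Return (target_total, upper_total) by summing each stage's budget."""
    target = sum(t for t, _ in budget.values())
    upper = sum(u for _, u in budget.values())
    return target, upper


if __name__ == "__main__":
    print(total_budget(BUDGET_MS))  # (895, 1600)
```

These sums treat the stages as strictly sequential, so they are the budget before any overlap between stages is taken into account.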
Key optimizations include:
- Concurrent processing: LLM begins processing as soon as STT provides interim results
- Streaming TTS: Audio generation starts before LLM completes full response
- Token streaming: TTS receives tokens as they're generated, not after completion
- Full-duplex WebSocket/WebRTC for simultaneous send/receive
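The overlap between LLM generation and TTS can be illustrated with a small `asyncio` producer/consumer sketch. It is not the Pipecat pipeline itself: `fake_llm_tokens` and the list-appending TTS stand-in are hypothetical placeholders, and grouping tokens into sentences before handing them to TTS is an assumption made here for readability.

```python
import asyncio

SENTENCE_END = (".", "?", "!")


async def fake_llm_tokens():
    """Stand-in for a streaming LLM: yields tokens one at a time."""
    for token in ["Sure", ".", " I", " can", " book", " that", ".", " One", " moment", "."]:
        await asyncio.sleep(0)  # yield control, as a network stream would
        yield token


async def llm_to_queue(queue):
    """Group tokens into sentences and enqueue each one as soon as it is complete."""
    buffer = ""
    async for token in fake_llm_tokens():
        buffer += token
        if buffer.endswith(SENTENCE_END):
            await queue.put(buffer.strip())
            buffer = ""
    if buffer.strip():
        await queue.put(buffer.strip())  # flush a trailing partial sentence
    await queue.put(None)  # sentinel: no more text


async def tts_from_queue(queue, spoken):
    """Stand-in for streaming TTS: consumes sentences while the LLM is still generating."""
    while (sentence := await queue.get()) is not None:
        spoken.append(sentence)


async def run_pipeline():
    queue = asyncio.Queue()
    spoken = []
    # Producer and consumer run concurrently, so synthesis can start
    # before the full response exists.
    await asyncio.gather(llm_to_queue(queue), tts_from_queue(queue, spoken))
    return spoken


if __name__ == "__main__":
    print(asyncio.run(run_pipeline()))
```

The queue decouples the two stages: the first sentence reaches the TTS stand-in while later tokens are still being produced.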
Core components include:
- Tenant Registry for multi-tenant configuration management
- Session Manager for conversation state tracking
- Voice Pipeline for real-time audio processing
- Authentication and Encryption Services for security
- Audit Logger for compliance and debugging
- Latency Monitor for performance tracking
This platform evolved from a 6.5-hour hackathon MVP to a production-ready system. The quick start guide below demonstrates the enhanced capabilities built on top of the original concept.
- Python 3.10 or higher
- Virtual environment tool (venv, virtualenv, or conda)
- API keys for external services (Deepgram, OpenAI/Ollama, ElevenLabs, Twilio)
- Clone the repository:
git clone <repository-url>
cd config-driven-voice-agent
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -e ".[dev]"
- Configure environment variables:
cp .env.example .env
# Edit .env with your API keys and configuration
- Run the test suite:
pytest
Run the example appointment scheduling workflow:
python examples/appointment_scheduling_workflow_usage.py
This demonstrates the core functionality that won the hackathon: AI-powered voice calls for appointment management with natural language understanding.
See the platform in action with demo videos in the demo/ directory. Check out docs/DEMOS.md for details on the available demonstrations.
Tenant configurations are defined in YAML or TOML files located in config/tenants/. Each tenant configuration specifies:
- Persona and system prompt with voice-optimized instructions
- LLM provider and model settings (OpenAI/Ollama/vLLM)
- TTS provider and voice settings (ElevenLabs/OpenAI)
- Available tools and API endpoints (FHIR/DMS/Custom)
- Knowledge bases for domain-specific information
- Operational constraints (max call duration, response brevity)
The platform's core innovation is its ability to serve multiple industries without code changes. The Tenant Registry loads configurations at runtime and routes calls based on caller identification. Hot-reload capability allows configuration updates without system restart.
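A minimal sketch of how runtime loading and caller-based routing might fit together is shown below. It is an illustration under assumptions, not the repository's Tenant Registry: the class and method names are hypothetical, the `phone_numbers` routing key does not appear in the example configuration, and the input is an already-parsed dictionary rather than a YAML or TOML file.

```python
from dataclasses import dataclass

ALLOWED_LLM_PROVIDERS = {"openai", "ollama", "vllm"}


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    persona_name: str
    llm_provider: str
    phone_numbers: tuple  # hypothetical routing key, not in the example config


class TenantRegistry:
    """In-memory registry; loading the same tenant_id again replaces the old entry."""

    def __init__(self):
        self._by_id = {}
        self._by_number = {}

    def load(self, raw):
        """Validate one parsed tenant config and register it."""
        provider = raw["llm"]["provider"]
        if provider not in ALLOWED_LLM_PROVIDERS:
            raise ValueError(f"unsupported LLM provider: {provider!r}")
        config = TenantConfig(
            tenant_id=raw["tenant_id"],
            persona_name=raw["persona"]["name"],
            llm_provider=provider,
            phone_numbers=tuple(raw.get("phone_numbers", ())),
        )
        # Drop stale number mappings so a reload takes effect immediately.
        self._by_number = {
            n: t for n, t in self._by_number.items() if t != config.tenant_id
        }
        for number in config.phone_numbers:
            self._by_number[number] = config.tenant_id
        self._by_id[config.tenant_id] = config
        return config

    def route(self, caller_id):
        """Return the tenant config registered for this caller ID."""
        tenant_id = self._by_number.get(caller_id)
        if tenant_id is None:
            raise LookupError(f"no tenant configured for {caller_id}")
        return self._by_id[tenant_id]
```

Calling `load` again with an updated dictionary for the same `tenant_id` swaps the configuration in place, which is one simple way to get hot-reload behavior without a restart.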
Example tenant configuration:
tenant_id: healthcare_clinic_001
persona:
  name: "Sarah"
  voice_id: "elevenlabs_voice_id"
  system_prompt: "You are Sarah, a friendly medical receptionist..."
llm:
  provider: "openai" # or "ollama", "vllm"
  model: "gpt-4"
  temperature: 0.7
  max_tokens: 150
tools:
  - name: "search_appointments"
    type: "fhir"
    endpoint: "https://fhir.epic.com/api/FHIR/R4"
    auth_type: "oauth2"
    scopes: ["patient/Appointment.read"]
constraints:
  max_call_duration: 600
  response_brevity: 3 # max sentences per utterance
For healthcare deployments, the platform integrates with Epic and Cerner EHR systems using the HL7 FHIR standard:
- OAuth 2.0 authentication with scoped access tokens
- Appointment/$find operation for searching available slots
- Appointment/$book operation for creating appointments
- Constraint checking (is-cancelable, is-reschedulable flags)
- Calendar optimization heuristics to minimize gaps and maximize provider utilization
For automotive deployments, the platform connects to Dealership Management Systems:
- VIN-based vehicle identification
- Service slot search and booking
- Maintenance quote retrieval
- Emergency triage for breakdown scenarios
- Parts availability checking
See config/tenants/example_healthcare.yaml and config/tenants/example_automotive.yaml for complete examples.
config-driven-voice-agent/
├── src/voice_agent/
│ ├── core/ # Core orchestration and pipeline components
│ ├── services/ # External service integrations (STT, LLM, TTS, VAD)
│ ├── adapters/ # Industry-specific adapters (FHIR, DMS)
│ ├── security/ # Authentication and encryption services
│ ├── config/ # Configuration management and validation
│ ├── monitoring/ # Observability, logging, and latency tracking
│ ├── telephony/ # Twilio integration
│ ├── workflows/ # Conversation workflow implementations
│ └── deployment/ # Deployment and scaling controllers
├── tests/ # Comprehensive test suite with property-based tests
├── config/tenants/ # Tenant configuration files
├── docs/ # Documentation and guides
├── k8s/ # Kubernetes deployment manifests
├── examples/ # Usage examples for each component
└── pyproject.toml # Project configuration and dependencies
Run the full test suite:
pytest
Run tests with coverage report:
pytest --cov=src --cov-report=html
Run property-based tests only:
pytest -m property
Format code with Black:
black src tests
Lint code with Ruff:
ruff check src tests
Type checking with mypy:
mypy src
The platform uses a comprehensive testing approach:
- Unit tests for individual components and functions
- Property-based tests using Hypothesis for universal correctness properties
- Integration tests for component interactions
- End-to-end tests for complete call flows
- Load tests for performance validation
All tests are located in the tests/ directory with a structure mirroring the source code.
The platform prioritizes natural, empathetic voice interactions through advanced TTS and prompt engineering:
- ElevenLabs Flash v2.5 (primary): 75ms latency, 5000+ voices, voice cloning support
- OpenAI TTS (fallback): 200ms latency, reliable alternative
- SSML and audio tag support for emotional context, for example [sigh] and [laughs]
- Prosody modulation based on conversation context
- Pronunciation accuracy optimized for medical and technical terminology
The platform uses specialized prompt architecture to ensure responses are optimized for voice-only consumption:
- No markdown formatting, tables, or bullet points in responses
- Responses limited to 3 sentences per utterance for natural pacing
- Empathetic phrasing with natural inflections
- Explicit knowledge boundary statements (no hallucinations)
- Conversational language appropriate for spoken dialogue
- Emotional context tags for appropriate tone modulation
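The formatting and brevity rules above are enforced through prompting, but a defensive post-filter can catch responses that slip through. The sketch below is one such filter under assumptions: the function name is hypothetical, the markdown stripping is deliberately crude, and sentence splitting on `.`, `!`, and `?` will mis-split abbreviations.

```python
import re


def to_voice_text(text, max_sentences=3):
    """Strip common markdown markers and keep at most max_sentences sentences."""
    cleaned = []
    for line in text.splitlines():
        if line.lstrip().startswith("|"):
            continue  # drop markdown table rows entirely
        line = re.sub(r"^\s*(?:[-*+]|\d+\.)\s+", "", line)  # list markers
        line = re.sub(r"^\s*#{1,6}\s+", "", line)           # heading markers
        line = re.sub(r"[*_`]+", "", line)                  # emphasis and code markers
        if line.strip():
            cleaned.append(line.strip())
    flat = " ".join(cleaned)
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", flat)
    return " ".join(sentences[:max_sentences])


if __name__ == "__main__":
    reply = "**Sure!** Here are the slots:\n- Monday at 9.\n- Tuesday at 10.\nAnything else?"
    print(to_voice_text(reply))
```

With the default limit, the example reply is cut after "Tuesday at 10." and the closing question is dropped, which shows why the limit is best treated as a safety net rather than the primary control.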
Pre-built workflows for common use cases:
- Appointment scheduling with multi-turn parameter extraction
- Appointment rescheduling with constraint checking
- Appointment cancellation with confirmation
- Human handoff when workflows fail or user requests escalation
- Error handling with graceful degradation and helpful feedback
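Multi-turn parameter extraction with escalation can be pictured as a small slot-filling loop. The sketch below is illustrative only: the slot names, the `max_failures` threshold, and the action tuples are assumptions, and in the real system the `extracted` dictionary would come from the LLM.

```python
from dataclasses import dataclass, field

# Hypothetical slots required before an appointment can be booked.
REQUIRED_SLOTS = ("patient_name", "date", "time")


@dataclass
class SchedulingState:
    slots: dict = field(default_factory=dict)
    failed_turns: int = 0


def next_action(state, extracted, max_failures=2):
    """Merge newly extracted parameters and decide the next step.

    Returns ("ask", slot_name), ("book", slots), or ("handoff", None).
    """
    new = {k: v for k, v in extracted.items() if k in REQUIRED_SLOTS and v}
    if new:
        state.slots.update(new)
        state.failed_turns = 0
    else:
        state.failed_turns += 1  # the turn produced nothing usable
    if state.failed_turns >= max_failures:
        return ("handoff", None)  # escalate to a human agent
    missing = [s for s in REQUIRED_SLOTS if s not in state.slots]
    if missing:
        return ("ask", missing[0])
    return ("book", dict(state.slots))
```

For example, a caller who gives only a name is asked for the date next, while two consecutive turns without any usable parameter trigger the handoff.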
Build the Docker image:
docker build -t voice-agent-platform .
Run with Docker Compose:
docker-compose up
Deploy to Kubernetes:
kubectl apply -f k8s/
The platform includes:
- Deployment manifests with resource limits
- Service definitions for load balancing
- ConfigMaps for environment configuration
- Secrets for sensitive credentials
- KEDA ScaledObject for GPU auto-scaling
See docs/DEPLOYMENT.md for detailed deployment instructions.
The platform implements enterprise-grade security with zero-trust architecture:
- OAuth 2.0 authentication for all external APIs
- TLS 1.2+ for all network communication (WebSocket/WebRTC)
- AES-256-GCM encryption for data at rest
- PII/PHI tokenization for privacy protection before logging
- Immutable audit logs for compliance tracking and forensic analysis
- Business Associate Agreements (BAA) support for HIPAA compliance
- Data minimization principles: process PHI only for immediate workflow execution
For healthcare deployments, the platform meets HIPAA requirements through:
- Execution of Business Associate Agreements with all third-party vendors (Twilio, OpenAI, Deepgram, ElevenLabs)
- Local LLM deployment option (Ollama/vLLM) for absolute data sovereignty
- Automated PII/PHI redaction before external API calls
- Immutable audit logs mapping every API call, database query, and state change
- Encrypted storage and transmission of all protected health information
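The PII/PHI redaction step could be approximated as pattern-based tokenization, sketched below. This is a simplified illustration, not the platform's redaction logic: the three regular expressions cover only US-style phone numbers, email addresses, and SSN-shaped strings, the demo key is a placeholder, and real PHI detection needs far broader coverage (names, dates, record numbers).

```python
import hashlib
import hmac
import re

# Illustrative patterns only; real deployments need much wider coverage.
PII_PATTERNS = {
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def tokenize_pii(text, key=b"demo-key"):
    """Replace matched values with keyed, deterministic tokens such as <PHONE:1a2b3c4d>."""

    def make_sub(kind):
        def _sub(match):
            digest = hmac.new(key, match.group().encode(), hashlib.sha256).hexdigest()[:8]
            return f"<{kind}:{digest}>"
        return _sub

    for kind, pattern in PII_PATTERNS.items():
        text = pattern.sub(make_sub(kind), text)
    return text


if __name__ == "__main__":
    print(tokenize_pii("Call 555-123-4567 or mail ana@example.com"))
```

Because the token is derived with a keyed HMAC, the same value always maps to the same token, so log entries can still be correlated without exposing the underlying data.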
For data sovereignty requirements or environments where BAAs are insufficient, deploy with local LLM providers (Ollama or vLLM) to ensure all data remains on-premises and never traverses the public internet.
The platform is optimized for ultra-low latency and high concurrency:
- Median end-to-end latency: under 1000ms
- Target latency: under 500ms for optimal user experience
- Concurrent processing stages (STT, LLM, TTS) to minimize total response time
- Streaming audio generation to overlap LLM inference and TTS synthesis
- Token caching and automatic refresh to eliminate authentication delays
- Handles 1000+ concurrent calls
- 99.9% uptime target for production deployments
- KEDA-based auto-scaling for GPU resources (local LLM deployments)
- Dynamic resource allocation based on call volume patterns
- Cost optimization through balanced cloud API and self-hosted inference
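Token caching with automatic refresh, mentioned in the list above, can be sketched as a small wrapper that refetches shortly before expiry. The names here are hypothetical, and `fetch_token` stands in for whatever OAuth 2.0 client call the deployment uses; it is assumed to return the access token and its lifetime in seconds.

```python
import time


class TokenCache:
    """Cache an access token and refresh it shortly before it expires."""

    def __init__(self, fetch_token, refresh_margin_s=60.0, clock=time.monotonic):
        self._fetch = fetch_token      # callable returning (token, expires_in_seconds)
        self._margin = refresh_margin_s
        self._clock = clock            # injectable so tests need not sleep
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a valid token, fetching a new one only when needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, expires_in = self._fetch()
            self._token = token
            self._expires_at = now + expires_in
        return self._token
```

Refreshing ahead of expiry keeps the token exchange out of the call's critical path in most cases; the refresh margin is a tuning parameter, not a value taken from the source.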
Per-stage metrics captured for every interaction:
- STT latency: Audio chunk to transcript
- LLM Time To First Token (TTFT): Transcript to first LLM token
- LLM Total: Complete response generation
- TTS Time To First Byte (TTFB): Text to first audio byte
- TTS Total: Complete audio generation
- End-to-end: Audio input to audio output
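One way to capture these per-stage timings is a context manager wrapped around each stage, with a check against configured limits. The sketch below is an assumption about structure rather than the platform's Latency Monitor; the class name and the limits dictionary are illustrative, with the example limits taken from the upper-limit column of the latency table.

```python
import time
from contextlib import contextmanager


class LatencyMonitor:
    """Record per-stage durations in milliseconds and flag limit violations."""

    def __init__(self, limits_ms, clock=time.perf_counter):
        self.limits_ms = limits_ms
        self.samples_ms = {}
        self._clock = clock  # injectable so tests can use a fake clock

    @contextmanager
    def stage(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the stage raised.
            self.samples_ms[name] = (self._clock() - start) * 1000.0

    def violations(self):
        """Return the names of stages that exceeded their configured limit."""
        return sorted(
            name
            for name, ms in self.samples_ms.items()
            if ms > self.limits_ms.get(name, float("inf"))
        )


if __name__ == "__main__":
    monitor = LatencyMonitor({"stt": 500, "llm_ttft": 750})
    with monitor.stage("stt"):
        time.sleep(0.01)  # placeholder for real transcription work
    print(monitor.samples_ms, monitor.violations())
```

Stages without a configured limit are recorded but never flagged.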
For self-hosted deployments using Ollama or vLLM:
- Quantization (4-bit) to reduce GPU memory footprint and accelerate inference
- Speculative decoding to speed up token generation
- PagedAttention and continuous batching (vLLM) for high-concurrency scenarios
- GPU auto-scaling to balance cost and performance
Integrated monitoring and tracing for production deployments:
- LangSmith or Helicone for end-to-end trace captures of every user interaction
- Token consumption metrics and cost analysis
- Hallucination detection and dialogue quality monitoring
- Precise system prompt and response logging for debugging
- Twilio Conversational Intelligence for sentiment analysis
- Post-call language operators for objective achievement verification
- Conversation quality metrics and compliance protocol validation
- Prometheus metrics for system health and resource utilization
- Grafana dashboards for real-time visualization
- Sentry for error tracking and alerting
- Per-session latency metrics with stage-by-stage breakdown
- Alert generation for threshold violations (latency, error rates, hallucinations)
- Immutable audit logs for all system operations
- Call initiation and termination events
- Transcript segments with PII tokenization
- LLM requests and responses
- Tool executions and results
- Authentication events and API calls
- Error conditions and latency anomalies
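The source does not say how audit-log immutability is achieved. One common technique for making an append-only log tamper-evident is hash chaining, sketched below with hypothetical names. Note that this detects modification after the fact; it does not prevent it, and a production system would also need write-once storage or an external anchor for the latest hash.

```python
import hashlib
import json


class AuditLog:
    """Append-only log in which each entry commits to the hash of the previous one."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(body):
        # Canonical JSON so the same content always hashes identically.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def append(self, event_type, payload):
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        body = {"event_type": event_type, "payload": payload, "prev_hash": prev_hash}
        entry = dict(body, hash=self._digest(body))
        self._entries.append(entry)
        return entry

    def verify(self):
        """Recompute the chain; return False if any entry was altered or reordered."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            body = {k: entry[k] for k in ("event_type", "payload", "prev_hash")}
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(body):
                return False
            prev_hash = entry["hash"]
        return True
```

Payloads should already be PII-tokenized before they are appended, since a hash chain makes later redaction of an entry detectable as tampering.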
We welcome contributions from the community! This project started as a hackathon winner and has evolved into a production-ready platform. Whether you're fixing bugs, adding features, improving documentation, or sharing ideas, your contributions are valued.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Make your changes and add tests
- Ensure all tests pass (pytest)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- Follow PEP 8 style guidelines (use black for formatting)
- Write tests for new features
- Update documentation as needed
- Keep commits atomic and well-described
- Ensure backward compatibility when possible
- Additional industry adapters (finance, retail, education)
- Enhanced voice synthesis providers
- Improved latency optimization techniques
- Additional language support for international deployments
- Enhanced observability and monitoring features
- Documentation improvements and tutorials
- Bug fixes and performance optimizations
- Report bugs and request features through GitHub Issues
- Join discussions in GitHub Discussions
- Share your use cases and implementations
- Help others in the community
This project would not have been possible without:
- SpinSci Technologies for hosting the hackathon and providing the problem statement
- T-Hub, Hyderabad for providing the venue and infrastructure
- The open-source community for the amazing tools and libraries we built upon
Copyright (c) 2025 SpinSci AI Hackathon 2025 VisioneoAI Innovators Team
Licensed under the MIT License. See LICENSE file for details.
This software is open source and free to use, modify, and distribute under the terms of the MIT License.