Motivation
ElatoAI runs realtime voice AI on ESP32. SenseVoice (8K+ GitHub stars) is available through Sherpa-ONNX, which already supports embedded/IoT platforms, including ESP32-compatible inference.
Why SenseVoice for ElatoAI
- Non-autoregressive: Single forward pass, minimal compute per chunk
- SenseVoice-Small (234M parameters): 50+ languages with automatic language detection
- ONNX format: Runs via Sherpa-ONNX on embedded devices
- Built-in VAD: No separate voice activity detection needed
- Emotion detection: Detects user emotion from speech, which is useful for companion AI
Integration via Sherpa-ONNX
Sherpa-ONNX provides a C/C++ API suitable for embedded targets:
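// C API for embedded devices (declared in sherpa-onnx/c-api/c-api.h)
const SherpaOnnxOfflineRecognizer *recognizer =
    SherpaOnnxCreateOfflineRecognizer(&config);
const SherpaOnnxOfflineStream *stream = SherpaOnnxCreateOfflineStream(recognizer);
// Process audio frames → get text (samples: float PCM in [-1, 1], 16 kHz assumed)
SherpaOnnxAcceptWaveformOffline(stream, 16000, samples, num_samples);
SherpaOnnxDecodeOfflineStream(recognizer, stream);
const SherpaOnnxOfflineRecognizerResult *result = SherpaOnnxGetOfflineStreamResult(stream);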
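Sherpa-ONNX's C API takes audio as float samples normalized to [-1, 1], while an ESP32 I2S microphone typically delivers signed 16-bit PCM. A minimal sketch of the conversion step follows. The helper name pcm16_to_float is hypothetical and not part of Sherpa-ONNX, and it assumes the capture buffer is already mono.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of Sherpa-ONNX): convert signed 16-bit PCM,
// e.g. from an I2S microphone, to float samples in [-1.0, 1.0).
static void pcm16_to_float(const int16_t *pcm, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        // Dividing by 32768 maps INT16_MIN to exactly -1.0 and keeps
        // INT16_MAX just below 1.0.
        out[i] = (float)pcm[i] / 32768.0f;
    }
}
```

The resulting buffer could then be handed to the recognizer's waveform input together with the capture sample rate.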
For server-side processing (when the ESP32 sends audio to a server):
pip install funasr vllm
funasr-server --device cuda # OpenAI-compatible at :8000
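On the device side, the captured audio has to be packaged before upload. The sketch below assumes the server accepts WAV files, as OpenAI-style transcription endpoints commonly do, and that the audio is 16-bit mono PCM. The function name write_wav_header is hypothetical. It only fills the standard 44-byte RIFF/WAVE header that precedes the raw samples.

```c
#include <stdint.h>
#include <string.h>

#define WAV_HEADER_SIZE 44

// Little-endian writers, so the result does not depend on host byte order.
static void put_le16(uint8_t *p, uint16_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)((v >> 24) & 0xFF);
}

// Hypothetical helper: fill a canonical 44-byte WAV header for
// 16-bit mono PCM with the given sample rate and sample count.
static void write_wav_header(uint8_t hdr[WAV_HEADER_SIZE],
                             uint32_t sample_rate, uint32_t num_samples) {
    const uint16_t channels = 1;
    const uint16_t bits = 16;
    const uint16_t block_align = (uint16_t)(channels * (bits / 8));
    const uint32_t data_bytes = num_samples * block_align;

    memcpy(hdr, "RIFF", 4);
    put_le32(hdr + 4, 36 + data_bytes);           // size of everything after this field
    memcpy(hdr + 8, "WAVE", 4);
    memcpy(hdr + 12, "fmt ", 4);
    put_le32(hdr + 16, 16);                       // fmt chunk size for PCM
    put_le16(hdr + 20, 1);                        // audio format 1 = PCM
    put_le16(hdr + 22, channels);
    put_le32(hdr + 24, sample_rate);
    put_le32(hdr + 28, sample_rate * block_align); // byte rate
    put_le16(hdr + 32, block_align);
    put_le16(hdr + 34, bits);
    memcpy(hdr + 36, "data", 4);
    put_le32(hdr + 40, data_bytes);
}
```

The header would be sent first, followed by the raw PCM bytes, as the file part of the upload request. A 16 kHz sample rate is assumed to be a sensible default for speech models, but the server's expected format should be checked.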