
Feature Request: Add SenseVoice for faster ESP32 voice recognition #31

@LauraGPT


Feature Request

ElatoAI does impressive work with real-time voice AI on the ESP32. I'd like to suggest SenseVoice as an ASR backend option: it is particularly well suited to low-latency voice applications.

Why SenseVoice for edge voice AI?

  • Non-autoregressive: no sequential token-by-token decoding, so latency is low and predictable
  • About 5x faster than Whisper (per the SenseVoice authors' benchmarks): critical when every millisecond matters for natural conversation
  • 234M parameters (SenseVoice-Small): small enough for efficient server-side processing of audio streamed from the ESP32
  • 50+ languages: a single model handles multilingual input
  • Emotion detection: could enable emotion-aware responses from the AI
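To make the emotion-detection point concrete: SenseVoice's raw rich-transcription output is typically prefixed with tags for language, emotion, audio event, and ITN mode. The sketch below assumes a raw string laid out like `<|en|><|HAPPY|><|Speech|><|woitn|>that sounds great`, with the language tag first. The `emotion_hint` mapping to an LLM instruction is purely hypothetical and not part of SenseVoice or ElatoAI.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed tag syntax in SenseVoice raw output: <|TAG|>
TAG_RE = re.compile(r"<\|([^|]+)\|>")

# Assumed set of emotion tags SenseVoice may emit
EMOTIONS = {"HAPPY", "SAD", "ANGRY", "NEUTRAL", "FEARFUL",
            "DISGUSTED", "SURPRISED", "EMO_UNKNOWN"}


@dataclass
class RichTranscript:
    text: str
    language: Optional[str]
    emotion: Optional[str]
    tags: List[str]


def parse_rich_transcript(raw: str) -> RichTranscript:
    """Split tagged output into plain text plus language/emotion metadata."""
    tags = TAG_RE.findall(raw)
    text = TAG_RE.sub("", raw).strip()
    emotion = next((t for t in tags if t in EMOTIONS), None)
    # Assumption: the first tag is the language code (e.g. "en", "zh")
    language = tags[0] if tags and tags[0] not in EMOTIONS else None
    return RichTranscript(text=text, language=language, emotion=emotion, tags=tags)


def emotion_hint(emotion: Optional[str]) -> str:
    """Hypothetical mapping from a detected emotion to an LLM system-prompt hint."""
    hints = {
        "HAPPY": "The user sounds happy; match their upbeat tone.",
        "SAD": "The user sounds sad; respond gently and supportively.",
        "ANGRY": "The user sounds frustrated; stay calm and concise.",
    }
    return hints.get(emotion or "", "")
```

The parsed `text` would go to the LLM as the user turn, while `emotion_hint(...)` could be appended to the system prompt, keeping the emotion signal out of the transcript itself.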

Server-side integration

For an ESP32 → server architecture, SenseVoice can run server-side behind an OpenAI-compatible API:

pip install funasr
funasr-server --device cuda  # /v1/audio/transcriptions endpoint

SenseVoice's low, predictable latency (a consequence of non-autoregressive decoding), combined with efficient WebSocket streaming from the device, makes it a strong fit for the real-time voice interaction ElatoAI provides.
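On the server, ElatoAI would need to hand the ESP32's audio to that endpoint. Below is a minimal standard-library sketch, assuming the ESP32 sends raw 16 kHz, 16-bit mono PCM and that the server accepts an OpenAI-style multipart upload (`file` and `model` fields) and replies with JSON containing a `text` field. The URL, port, and the model name `sensevoice` are hypothetical placeholders, not confirmed FunASR defaults.

```python
import io
import json
import urllib.request
import uuid
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM (assumed 16 kHz, 16-bit, mono from the ESP32) in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def build_transcription_request(wav_bytes: bytes, url: str,
                                model: str = "sensevoice") -> urllib.request.Request:
    """Build an OpenAI-style multipart POST with 'model' and 'file' fields."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode() + b"\r\n",
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n',
        b"Content-Type: audio/wav\r\n\r\n",
        wav_bytes + b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def transcribe(pcm: bytes,
               url: str = "http://localhost:8000/v1/audio/transcriptions") -> str:
    """Send one complete utterance and return the transcript text."""
    req = build_transcription_request(pcm_to_wav(pcm), url)
    with urllib.request.urlopen(req, timeout=10) as resp:
        # OpenAI-style responses carry the transcript in a "text" field
        return json.loads(resp.read())["text"]
```

Since this endpoint transcribes a complete file, one natural design is to buffer PCM chunks arriving over the WebSocket until the end of the user's turn, then make a single `transcribe()` call.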

Edge deployment option

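For future on-device use, SenseVoice is available via Sherpa-ONNX, which targets embedded and mobile platforms. The ESP32-S3 is heavily memory-constrained, though: even quantized to int8, the 234M weights alone take roughly 234 MB, far beyond the few megabytes of PSRAM on typical ESP32-S3 modules.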
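The memory constraint can be roughly quantified. This sketch counts weight storage only (activations and runtime overhead are ignored, so real usage is higher) and assumes an ESP32-S3 module with 8 MB of PSRAM; check the actual module's datasheet.

```python
# Rough weight-storage estimate for a 234M-parameter model (weights only).
PARAMS = 234_000_000
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}
PSRAM_MB = 8  # assumption: a typical ESP32-S3 module with 8 MB PSRAM


def weight_size_mb(params: int, bytes_per_param: int) -> float:
    """Size of the weights in megabytes (1 MB = 1,000,000 bytes)."""
    return params * bytes_per_param / 1_000_000


for dtype, nbytes in BYTES_PER_PARAM.items():
    size = weight_size_mb(PARAMS, nbytes)
    print(f"{dtype}: {size:.0f} MB of weights, ~{size / PSRAM_MB:.0f}x the assumed PSRAM")
```

Even the int8 case is more than an order of magnitude over budget, so on-device ASR on an ESP32-S3 would likely need a much smaller model than SenseVoice-Small.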
