Production-ready FastAPI and WebSocket service for VoxCPM2, with:
- REST synthesis via /v1/speech
- streaming synthesis via /v1/stream
- optional transcription via /v1/transcribe
- Linux CUDA auto-selection for nano-vllm-voxcpm
- official voxcpm backend support for prompt and reference audio
- macOS Apple Silicon compatibility patches for VoxCPM2 CPU inference
- a matching Tauri desktop client released alongside the API
Every tagged release publishes three separate artifact families:
- voxcpm2_api-<version>-py3-none-any.whl: The API package with FastAPI, WebSocket endpoints, Docker support, and the built-in compatibility layer.
- voxcpm2_compat-<version>-py3-none-any.whl: The standalone compatibility wrapper for custom Python integrations that need the macOS and CPU safety patches without the API service.
- VoxCPM2-ui-macos-<tag>-<arch>.zip: The Tauri desktop application bundle for macOS.
Install the API directly from a release asset:
pip install "voxcpm2-api[voxcpm] @ https://github.com/fabianzimber/voxcpm2-api/releases/download/v0.2.0/voxcpm2_api-0.2.0-py3-none-any.whl"
If you only want the compatibility wrapper for your own VoxCPM2 code:
pip install "voxcpm2-compat @ https://github.com/fabianzimber/voxcpm2-api/releases/download/v0.2.0/voxcpm2_compat-0.2.0-py3-none-any.whl"

To run the service from a source checkout:
python3 -m venv .venv
. .venv/bin/activate
pip install -e ".[dev,voxcpm]"
cp .env.example .env
voxcpm2-api
The helper scripts do the same:
./scripts/bootstrap.sh
./scripts/run-dev.sh

The default Docker image now installs the official voxcpm backend automatically.
cp .env.example .env
docker compose up --build

The service keeps one API surface and swaps runtimes underneath it:
- Linux + NVIDIA CUDA: prefers nano-vllm-voxcpm for plain text synthesis
- macOS, Windows, generic Linux, or conditioned requests: uses the official voxcpm Python package
- prompt or reference audio requests: always route to the official voxcpm backend
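
The routing rules above can be sketched as a small pure function. This is only an illustration of the stated rules, not the service's actual code: `SynthesisRequest`, `select_backend`, and the `cuda_available` flag are hypothetical names, and the sketch ignores any override such as VOXCPM2_PREFER_BACKEND, whose exact semantics are not described here.

```python
from dataclasses import dataclass
from typing import Optional

NANO_VLLM = "nano-vllm-voxcpm"
OFFICIAL = "voxcpm"


@dataclass(frozen=True)
class SynthesisRequest:
    """Hypothetical request shape; field names mirror the JSON examples in this README."""

    text: str
    prompt_audio_base64: Optional[str] = None
    reference_audio_base64: Optional[str] = None


def select_backend(request: SynthesisRequest, system: str, cuda_available: bool) -> str:
    """Pick a backend name following the three routing rules listed above.

    `system` is expected to look like platform.system() output ("Linux", "Darwin", ...).
    """
    conditioned = bool(request.prompt_audio_base64 or request.reference_audio_base64)
    if conditioned:
        # Prompt or reference audio always goes to the official backend.
        return OFFICIAL
    if system == "Linux" and cuda_available:
        # Plain text synthesis on Linux with NVIDIA CUDA prefers nano-vllm-voxcpm.
        return NANO_VLLM
    # macOS, Windows, and Linux without CUDA fall back to the official package.
    return OFFICIAL
```

Checking for conditioning first keeps the "always route to the official backend" rule independent of the platform check.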
VoxCPM2 is currently safest on macOS when it runs on CPU:
- MPS is disabled because VoxCPM2 uses bfloat16 and the public runtime is not stable on Apple MPS
- the API patches PyTorch scaled_dot_product_attention for the exact CPU decoding path used by VoxCPM2
- OpenMP duplicate-library crashes are suppressed for mixed native dependency stacks
These patches are built into voxcpm2-api and published separately as voxcpm2-compat.
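
For readers wiring similar safeguards into their own code, the sketch below shows one common way to force CPU and silence the OpenMP duplicate-library abort. It is an assumption about how such a guard could look, not the code shipped in voxcpm2-compat: `apply_macos_cpu_safeguards` is a hypothetical helper, and using the OpenMP runtime's `KMP_DUPLICATE_LIB_OK` variable is one well-known workaround that the package may or may not rely on. The scaled_dot_product_attention patch is not covered.

```python
import os
import platform
from typing import MutableMapping, Optional


def apply_macos_cpu_safeguards(environ: MutableMapping[str, str], system: str) -> Optional[str]:
    """Return the device to force ("cpu") on macOS, or None to leave the choice open.

    Hypothetical helper. It must run before PyTorch or other native libraries
    are imported, because the OpenMP runtime reads the variable at load time.
    """
    if system != "Darwin":
        return None
    # Assumed workaround: let duplicate OpenMP runtimes coexist instead of aborting.
    # setdefault keeps any value the user has already exported.
    environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")
    # Avoid MPS entirely; the README states bfloat16 inference is not stable there.
    return "cpu"


if __name__ == "__main__":
    device = apply_macos_cpu_safeguards(os.environ, platform.system())
    print("forced device:", device)
```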
All runtime configuration is environment-driven.
Important variables:
- VOXCPM2_MODEL_ID
- VOXCPM2_MODEL_PATH
- VOXCPM2_MODEL_CACHE_DIR
- VOXCPM2_PREFER_BACKEND
- VOXCPM2_LOAD_DENOISER
- VOXCPM2_OPTIMIZE_MODEL
- VOXCPM2_LOCAL_FILES_ONLY
- VOXCPM2_STARTUP_LOAD_MODEL
- VOXCPM2_HF_ENDPOINT
- VOXCPM2_CORS_ORIGINS
- VOXCPM2_NANOVLLM_DEVICES
See ./.env.example for the full template.
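
When debugging a deployment, it can help to see which of these variables are actually set. The following is a minimal sketch: `collect_voxcpm2_settings` is a hypothetical helper, and it deliberately does not parse values, because the accepted types and defaults are defined by .env.example rather than by this sketch.

```python
import os
from typing import Dict, List, Mapping, Tuple

# Variable names copied from the list above.
KNOWN_VARIABLES = (
    "VOXCPM2_MODEL_ID",
    "VOXCPM2_MODEL_PATH",
    "VOXCPM2_MODEL_CACHE_DIR",
    "VOXCPM2_PREFER_BACKEND",
    "VOXCPM2_LOAD_DENOISER",
    "VOXCPM2_OPTIMIZE_MODEL",
    "VOXCPM2_LOCAL_FILES_ONLY",
    "VOXCPM2_STARTUP_LOAD_MODEL",
    "VOXCPM2_HF_ENDPOINT",
    "VOXCPM2_CORS_ORIGINS",
    "VOXCPM2_NANOVLLM_DEVICES",
)


def collect_voxcpm2_settings(environ: Mapping[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (configured, unlisted).

    configured: listed variables that are set to a non-empty value.
    unlisted: other VOXCPM2_* names, which may be typos or variables
    documented only in .env.example.
    """
    configured = {name: environ[name] for name in KNOWN_VARIABLES if environ.get(name)}
    unlisted = sorted(
        name for name in environ
        if name.startswith("VOXCPM2_") and name not in KNOWN_VARIABLES
    )
    return configured, unlisted


if __name__ == "__main__":
    configured, unlisted = collect_voxcpm2_settings(os.environ)
    for name, value in configured.items():
        print(f"{name}={value}")
    for name in unlisted:
        print(f"not in the README list: {name}")
```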
curl http://localhost:8000/health
curl http://localhost:8000/api/status
curl http://localhost:8000/v1/runtime

Return WAV:
curl -X POST http://localhost:8000/v1/speech \
-H 'Content-Type: application/json' \
-d '{"text":"Hello from VoxCPM2","response_format":"wav"}' \
--output out.wav

Return base64:
curl -X POST http://localhost:8000/v1/speech \
-H 'Content-Type: application/json' \
-d '{"text":"Hello from VoxCPM2","response_format":"base64"}'

Prompt continuation:
{
"text": "Continue this in the same voice.",
"prompt_text": "Earlier context",
"prompt_audio_base64": "<wav-as-base64>",
"reference_audio_base64": "<optional-reference-wav>",
"response_format": "base64"
}

Connect to /v1/stream, send one JSON request, then read:
- session.started
- repeated audio.chunk
- audio.completed
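
The event order above can be handled with a small fold over the received frames. This is a sketch under stated assumptions, since the wire format is not specified here: it assumes every message is a JSON text frame whose event name sits in a `type` field, and that each audio.chunk carries base64 audio in a hypothetical `audio_base64` field. `collect_stream_audio` is an illustrative name, and the frames would come from whichever WebSocket client library you use.

```python
import base64
import json
from typing import Iterable


def collect_stream_audio(messages: Iterable[str]) -> bytes:
    """Concatenate audio from session.started -> audio.chunk* -> audio.completed.

    Raises ValueError if audio arrives before the session starts or if the
    stream ends without audio.completed. Unknown event types are ignored.
    """
    started = False
    audio = bytearray()
    for raw in messages:
        event = json.loads(raw)
        kind = event.get("type")  # assumed field name
        if kind == "session.started":
            started = True
        elif kind == "audio.chunk":
            if not started:
                raise ValueError("audio.chunk received before session.started")
            audio.extend(base64.b64decode(event["audio_base64"]))  # assumed field name
        elif kind == "audio.completed":
            if not started:
                raise ValueError("audio.completed received before session.started")
            return bytes(audio)
    raise ValueError("stream ended before audio.completed")
```

Keeping the fold independent of the socket makes it easy to test with recorded frames before connecting to a live server.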
Example payload:
{
"text": "Stream this sentence.",
"chunk_format": "pcm16"
}

Optional transcription:
curl -X POST http://localhost:8000/v1/transcribe \
-H 'Content-Type: application/json' \
-d '{"audio_base64":"<wav-as-base64>"}'

The repository also contains a Tauri desktop app in ./voxcpm2-ui. Tagged releases attach a macOS bundle ZIP so the UI can be downloaded without building Rust locally.
Local development:
cd voxcpm2-ui/src-tauri
cargo tauri dev

Run the tests, the linter, and the release build from the repository root:
. .venv/bin/activate
pytest
ruff check .
./scripts/build-release.sh

CI runs on GitHub Actions for Linux and macOS. Tagged releases automatically publish:
- the API wheel and sdist
- the standalone compatibility wheel and sdist
- the macOS Tauri desktop bundle
AGPL-3.0-only. See ./LICENSE.