
VoxCPM2 API

Production-ready FastAPI and WebSocket service for VoxCPM2, with:

  • REST synthesis via /v1/speech
  • streaming synthesis via /v1/stream
  • optional transcription via /v1/transcribe
  • Linux CUDA auto-selection for nano-vllm-voxcpm
  • official voxcpm backend support for prompt and reference audio
  • macOS Apple Silicon compatibility patches for VoxCPM2 CPU inference
  • a matching Tauri desktop client released alongside the API

Release Artifacts

Every tagged release publishes three separate artifact families:

  • voxcpm2_api-<version>-py3-none-any.whl: the API package with the FastAPI service, WebSocket endpoints, Docker support, and the built-in compatibility layer.
  • voxcpm2_compat-<version>-py3-none-any.whl: the standalone compatibility wrapper for custom Python integrations that need the macOS and CPU safety patches without the API service.
  • VoxCPM2-ui-macos-<tag>-<arch>.zip: the Tauri desktop application bundle for macOS.

Installation

From a GitHub release wheel

Install the API directly from a release asset:

pip install "voxcpm2-api[voxcpm] @ https://github.com/fabianzimber/voxcpm2-api/releases/download/v0.2.0/voxcpm2_api-0.2.0-py3-none-any.whl"

If you only want the compatibility wrapper for your own VoxCPM2 code:

pip install "voxcpm2-compat @ https://github.com/fabianzimber/voxcpm2-api/releases/download/v0.2.0/voxcpm2_compat-0.2.0-py3-none-any.whl"

From source

python3 -m venv .venv
. .venv/bin/activate
pip install -e ".[dev,voxcpm]"
cp .env.example .env
voxcpm2-api

The helper scripts do the same:

./scripts/bootstrap.sh
./scripts/run-dev.sh

Docker

The default Docker image now installs the official voxcpm backend automatically.

cp .env.example .env
docker compose up --build

Backend Strategy

The service keeps one API surface and swaps runtimes underneath it:

  • Linux + NVIDIA CUDA: prefers nano-vllm-voxcpm for plain text synthesis
  • macOS, Windows, generic Linux, or conditioned requests: uses the official voxcpm Python package
  • prompt or reference audio requests: always route to the official voxcpm backend
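The routing rules above can be sketched as a small selection function. This is an illustrative reduction of the strategy, not the service's actual code; the function and parameter names are invented for the example.

```python
def choose_backend(platform: str, has_cuda: bool, conditioned: bool) -> str:
    """Pick the runtime for a request, per the strategy described above.

    conditioned: True when the request carries prompt or reference audio.
    """
    if conditioned:
        # Prompt/reference audio always routes to the official backend.
        return "voxcpm"
    if platform == "linux" and has_cuda:
        # Fast path for plain text synthesis on NVIDIA CUDA.
        return "nano-vllm-voxcpm"
    # macOS, Windows, and generic Linux fall back to the official package.
    return "voxcpm"
```

The key property is that conditioning wins over hardware: even on a CUDA Linux box, a request with prompt audio lands on the official voxcpm backend.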

macOS / Apple Silicon

VoxCPM2 is currently safest on macOS when it runs on CPU:

  • MPS is disabled because VoxCPM2 uses bfloat16 and the public runtime is not stable on Apple MPS
  • the API patches PyTorch scaled_dot_product_attention for the exact CPU decoding path used by VoxCPM2
  • OpenMP duplicate-library crashes are suppressed for mixed native dependency stacks

These patches are built into voxcpm2-api and published separately as voxcpm2-compat.
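A minimal sketch of the environment side of these patches, for readers wiring VoxCPM2 into their own scripts without voxcpm2-compat. The function name is illustrative; `KMP_DUPLICATE_LIB_OK` is the standard OpenMP knob for tolerating duplicate runtimes, and it must be set before the native stacks load.

```python
import os

def apply_macos_cpu_safety() -> str:
    """Suppress the duplicate-OpenMP crash and pin inference to CPU.

    Returns the device string to pass to the model loader.
    """
    # Tolerate multiple OpenMP runtimes in mixed native dependency stacks.
    os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
    # MPS is skipped entirely: VoxCPM2 uses bfloat16, which is not stable
    # on Apple MPS in the public runtime.
    return "cpu"
```

The scaled_dot_product_attention patch itself is model-specific and lives in voxcpm2-compat; this snippet only covers the environment setup.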

Configuration

All runtime configuration is environment-driven.

Important variables:

  • VOXCPM2_MODEL_ID
  • VOXCPM2_MODEL_PATH
  • VOXCPM2_MODEL_CACHE_DIR
  • VOXCPM2_PREFER_BACKEND
  • VOXCPM2_LOAD_DENOISER
  • VOXCPM2_OPTIMIZE_MODEL
  • VOXCPM2_LOCAL_FILES_ONLY
  • VOXCPM2_STARTUP_LOAD_MODEL
  • VOXCPM2_HF_ENDPOINT
  • VOXCPM2_CORS_ORIGINS
  • VOXCPM2_NANOVLLM_DEVICES

See ./.env.example for the full template.
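A few of these variables in context, with illustrative values only (they are guesses from the names, not documented defaults; ./.env.example is authoritative):

```shell
# Example overrides; values shown are assumptions, not defaults.
export VOXCPM2_MODEL_CACHE_DIR="$HOME/.cache/voxcpm2"
export VOXCPM2_LOCAL_FILES_ONLY=1        # never fetch weights from the network
export VOXCPM2_STARTUP_LOAD_MODEL=1      # load the model at startup, not first request
export VOXCPM2_CORS_ORIGINS="http://localhost:5173"
```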

API

Health

curl http://localhost:8000/health
curl http://localhost:8000/api/status
curl http://localhost:8000/v1/runtime

Synthesis

Return WAV:

curl -X POST http://localhost:8000/v1/speech \
  -H 'Content-Type: application/json' \
  -d '{"text":"Hello from VoxCPM2","response_format":"wav"}' \
  --output out.wav

Return base64:

curl -X POST http://localhost:8000/v1/speech \
  -H 'Content-Type: application/json' \
  -d '{"text":"Hello from VoxCPM2","response_format":"base64"}'

Prompt continuation:

{
  "text": "Continue this in the same voice.",
  "prompt_text": "Earlier context",
  "prompt_audio_base64": "<wav-as-base64>",
  "reference_audio_base64": "<optional-reference-wav>",
  "response_format": "base64"
}

Streaming

Connect to /v1/stream, send one JSON request, then read:

  • session.started
  • repeated audio.chunk
  • audio.completed

Example payload:

{
  "text": "Stream this sentence.",
  "chunk_format": "pcm16"
}
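Client-side, the event sequence above reduces to a small accumulator loop. This sketch assumes each WebSocket message decodes to a dict with an "event" name and, for chunks, a "data" field of audio bytes; the actual wire format may differ.

```python
def collect_stream(events) -> tuple[bool, bytes]:
    """Consume decoded /v1/stream events and join the audio chunks.

    Returns (session_started, concatenated_audio).
    """
    started = False
    chunks = []
    for msg in events:
        if msg["event"] == "session.started":
            started = True
        elif msg["event"] == "audio.chunk":
            chunks.append(msg["data"])
        elif msg["event"] == "audio.completed":
            # Terminal event: stop reading.
            break
    return started, b"".join(chunks)
```

In a real client the `events` iterable would wrap a WebSocket connection to /v1/stream, decoding each frame before dispatch.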

Transcription

curl -X POST http://localhost:8000/v1/transcribe \
  -H 'Content-Type: application/json' \
  -d '{"audio_base64":"<wav-as-base64>"}'

Desktop Client

The repository also contains a Tauri desktop app in ./voxcpm2-ui. Tagged releases attach a macOS bundle ZIP so the UI can be downloaded without building Rust locally.

Local development:

cd voxcpm2-ui/src-tauri
cargo tauri dev

Testing and Release

. .venv/bin/activate
pytest
ruff check .
./scripts/build-release.sh

CI runs on GitHub Actions for Linux and macOS. Tagged releases automatically publish:

  • the API wheel and sdist
  • the standalone compatibility wheel and sdist
  • the macOS Tauri desktop bundle

License

AGPL-3.0-only. See ./LICENSE.

About

RESTful API and streaming API for integrating VoxCPM2, with one-step setup and cross-platform optimization.
