A high-performance, lightweight Text-to-Speech (TTS) server built with FastAPI, wrapping the ONNX KittenTTS v0.8 model, which runs efficiently on CPU with optional GPU acceleration.
This project provides a robust, production-ready interface for the ultra-lightweight KittenTTS model and engine. It features a modern Web UI, true GPU acceleration via ONNX Runtime, and full OpenAI API compatibility for easy integration into existing workflows.
Production-ready FastAPI wrapper around KittenTTS, focused on fast local/self-hosted deployment.
- Modern Web UI: Text input, voice controls, playback, and download.
- OpenAI-Compatible API: Includes `/v1/models`, `/v1/audio/speech`, and `/v1/audio/voices`.
- GPU Acceleration: Uses ONNX Runtime GPU providers when available.
- CPU Friendly: Lightweight model (~15M params, under 25MB).
- Long-Text Support: Optional chunking and merged output.
- Robust Text Preprocessing: Cleans noisy artifacts and normalizes tricky text forms for more stable synthesis.
- Env-Based Config: Configure runtime with `.env` and `KITTEN_*` vars.
- Browser UI State: UI preferences are stored in local browser storage.
- Built-in Voices: 8 voices included (Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, Leo).
- Piper Alternative: Compact self-hosted TTS with low overhead.
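To illustrate the long-text support listed above, here is a minimal sketch of sentence-aware chunking into pieces of roughly `chunk_size` characters. This is an illustrative sketch only, not the server's actual splitting logic:

```python
import re

def chunk_text(text: str, chunk_size: int = 120) -> list[str]:
    """Split text into sentence-aware chunks of at most ~chunk_size chars.

    A sentence longer than chunk_size becomes its own chunk.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # Start a new chunk when appending would exceed the budget.
        if current and len(current) + 1 + len(sentence) > chunk_size:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk would then be synthesized independently and the audio merged into one output.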
Fastest way to run on CPU with the published image:

```bash
docker run -it -d \
  --name kittentts-fastapi \
  --restart unless-stopped \
  -e KITTEN_MODEL_REPO_ID="KittenML/kitten-tts-nano-0.8-fp32" \
  -p 8005:8005 \
  ghcr.io/richardr1126/kittentts-fastapi-cpu
```

Works well on Raspberry Pi (64-bit OS) as well.
Supported environment variables:

- `KITTEN_SERVER_HOST` (default: `0.0.0.0`)
- `KITTEN_SERVER_PORT` (default: `8005`)
- `KITTEN_SERVER_ENABLE_PERFORMANCE_MONITOR` (default: `false`)
- `KITTEN_MODEL_REPO_ID` (default: `KittenML/kitten-tts-nano-0.8-fp32`)
- `KITTEN_TTS_DEVICE` (default: `auto`; options: `auto`, `cpu`, `cuda`)
- `KITTEN_MODEL_CACHE` (default: `model_cache`)
- `KITTEN_GEN_DEFAULT_SPEED` (default: `1.1`)
- `KITTEN_GEN_DEFAULT_LANGUAGE` (default: `en`)
- `KITTEN_AUDIO_FORMAT` (default: `wav`; options: `wav`, `mp3`, `opus`, `aac`)
- `KITTEN_AUDIO_SAMPLE_RATE` (default: `24000`)
- `KITTEN_TEXT_PROFILE` (default: `balanced`; options: `balanced`, `narration`, `dialogue`)
- `KITTEN_TEXT_PROFILES_JSON` (optional JSON object to override/extend profile defaults)
- `KITTEN_UI_TITLE` (default: `Kitten TTS Server`)
- `KITTEN_UI_SHOW_LANGUAGE_SELECT` (default: `true`)
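For illustration, the `KITTEN_*` convention can be read with a small stdlib helper like the sketch below; the server's actual configuration loading may differ:

```python
import os

def kitten_env(name: str, default: str) -> str:
    """Read a KITTEN_-prefixed setting from the environment, with a fallback."""
    return os.environ.get(f"KITTEN_{name}", default)

# Defaults mirror the documented values above.
host = kitten_env("SERVER_HOST", "0.0.0.0")
port = int(kitten_env("SERVER_PORT", "8005"))
device = kitten_env("TTS_DEVICE", "auto")
```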
- Python: 3.13+
- uv: Install uv
- Audio runtime libs: `libsndfile` and `ffmpeg` available on the system path.
```bash
# Clone the repository
git clone https://github.com/richardr1126/KittenTTS-FastAPI.git
cd KittenTTS-FastAPI

# Create local environment config
cp .env.example .env

# Sync dependencies and create virtual environment (CPU/default)
uv sync

# Run the server
uv run src/server.py
```

For NVIDIA GPU local installs, sync the dedicated dependency group:

```bash
uv sync --group nvidia
```

Then set `KITTEN_TTS_DEVICE=cuda` in `.env` (or export it in your shell) before starting the server.
After startup, the server logs the exact UI URL to visit (typically http://localhost:8005/).
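Once the server is up, it can also be called from Python with only the standard library. The sketch below targets the OpenAI-compatible speech endpoint; the network call is wrapped in a function so nothing runs until you invoke it:

```python
import json
import urllib.request

API_URL = "http://localhost:8005/v1/audio/speech"

def build_payload(text: str, voice: str = "Jasper", speed: float = 1.1,
                  response_format: str = "mp3") -> dict:
    """Assemble an OpenAI-style speech request body."""
    return {
        "model": "tts-1",
        "input": text,
        "voice": voice,
        "speed": speed,
        "response_format": response_format,
    }

def synthesize(text: str, out_path: str = "speech.mp3") -> None:
    """POST the request and write the returned audio bytes to disk."""
    body = json.dumps(build_payload(text)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
```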
The fastest way to deploy is using Docker Compose.
Create `.env` first (`cp .env.example .env`), then run:

```bash
docker compose up -d --build
```

For GPU deployment, make sure you have the NVIDIA Container Toolkit installed, then run:

```bash
docker compose -f docker-compose-gpu.yml up -d --build
```

Example request to the OpenAI-compatible speech endpoint:

```bash
curl http://localhost:8005/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello from the Kitten TTS FastAPI server!",
    "voice": "Jasper",
    "speed": 1.1,
    "response_format": "mp3"
  }' \
  --output speech.mp3
```

`model` accepts the canonical `tts-1` and also supports the aliases `KittenTTS` and `kitten-tts`.
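The alias handling can be pictured as a simple normalization step. This is an illustrative sketch, not the server's actual code:

```python
# Map accepted model aliases onto the canonical model name.
MODEL_ALIASES = {"KittenTTS": "tts-1", "kitten-tts": "tts-1", "tts-1": "tts-1"}

def resolve_model(name: str) -> str:
    """Return the canonical model name, or raise for unknown models."""
    try:
        return MODEL_ALIASES[name]
    except KeyError:
        raise ValueError(f"unknown model: {name}") from None
```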
List models:

```bash
curl http://localhost:8005/v1/models
```

List voices:

```bash
curl http://localhost:8005/v1/audio/voices
```

Example request to the custom `/tts` endpoint:

```bash
curl http://localhost:8005/tts \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Alice: Hi there.\nBob: Hey, ready to start?",
    "voice": "Jasper",
    "output_format": "mp3",
    "split_text": true,
    "chunk_size": 120,
    "speed": 1.0,
    "text_options": {
      "profile": "dialogue"
    }
  }' \
  --output speech.mp3
```

`/tts` supports request-level `text_options` overrides for: `profile`, `remove_punctuation`, `normalize_pause_punctuation`, `pause_strength`, `dialogue_turn_splitting`, `speaker_label_mode`, `max_punct_run`. All other preprocessing flags are profile-defined server defaults.
Visit http://localhost:8005/docs for the full Swagger UI.
Server settings are loaded from environment variables (`.env` for local/dev).
Copy `.env.example` to `.env` and edit values as needed, then restart the server.
- `KITTEN_TTS_DEVICE`: `auto`, `cuda`, or `cpu`.
- `KITTEN_AUDIO_FORMAT`: `wav`, `mp3`, `opus`, or `aac`.
- `KITTEN_MODEL_REPO_ID`: Hugging Face model repo.
- `KITTEN_MODEL_CACHE`: Model cache directory path.
- `KITTEN_TEXT_PROFILE`: Active text profile (`balanced`, `narration`, `dialogue`).
- `KITTEN_TEXT_PROFILES_JSON`: Optional JSON object merged onto the default `text_processing.profiles` at startup (example: `{"balanced":{"pause_strength":"strong"},"dialogue":{"dialogue_turn_splitting":true}}`).
- Profile defaults: Full preprocessing defaults (cleanup + normalization pipeline flags) are defined in `text_processing.profiles` in `src/config.py`.
- Override model: The selected profile provides the baseline; `/tts` can override only this focused subset via `text_options`: `remove_punctuation`, `normalize_pause_punctuation`, `pause_strength`, `dialogue_turn_splitting`, `speaker_label_mode`, `max_punct_run`.
- Text preprocessing is enabled by default and includes cleanup/normalization steps (for example, mixed alphanumeric tokens like `gpt4` are normalized to improve phonemization stability).
- OpenAI route compatibility: `/v1/audio/speech` keeps strict OpenAI request fields; advanced text options are configured server-side (profile/defaults).
- UI state (last text, voice, theme) is stored in browser `localStorage`, not on the API server.
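The `KITTEN_TEXT_PROFILES_JSON` merge can be illustrated with a small recursive dict merge, using the example value from the list above (a sketch of the described behavior, not the actual startup code):

```python
import json

def merge_profiles(defaults: dict, overrides: dict) -> dict:
    """Recursively merge override values onto default profile settings."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_profiles(merged[key], value)
        else:
            merged[key] = value
    return merged

# Hypothetical built-in defaults; the real ones live in src/config.py.
defaults = {"balanced": {"pause_strength": "medium", "max_punct_run": 3}}
env_json = '{"balanced": {"pause_strength": "strong"}, "dialogue": {"dialogue_turn_splitting": true}}'
profiles = merge_profiles(defaults, json.loads(env_json))
```

Overridden keys win, untouched defaults survive, and new profiles are added alongside the built-ins.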
- Phonemizer Errors: The app uses the bundled `espeakng_loader`; if your platform blocks dynamic libraries, install the system `espeak-ng`.
- GPU Not Used: Ensure `torch.cuda.is_available()` is `True`. The container/host must have NVIDIA drivers and the Container Toolkit.
- Audio Errors on Linux: Ensure `libsndfile1` is installed (`sudo apt install libsndfile1`).
This project is licensed under the Apache License 2.0. See the LICENSE file for details. This license choice aligns with the upstream KittenTTS project/model licensing used by this repository.