A high-quality text-to-speech API built with FastAPI and KittenTTS. This API provides endpoints to convert text to speech using various voice options.
- 🎵 High-quality text-to-speech generation with KittenTTS (25MB model)
- 🗣️ Multiple voice options (8 different voices: 4 male, 4 female)
- 🚀 Fast API with automatic documentation
- 📁 WAV audio file output (24kHz sample rate)
- 🔍 Health check and voice listing endpoints
- 📝 Request validation and error handling
- ⚡ CPU-optimized (no GPU required)
- 📦 Modern dependency management with uv
This project uses uv for fast, reliable dependency management.
# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
# Or with pip
pip install uv
- Clone or download this project
- Run the setup script:
python setup.py
Or manually install dependencies:
# Install project dependencies
uv sync
# Install KittenTTS (not available on PyPI)
uv pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl
# Development mode (auto-reload)
uv run python run_api.py --reload
# Production mode (multiple workers)
uv run python run_api.py --host 0.0.0.0 --port 8000 --workers 4
# Or run Uvicorn directly
uv run uvicorn api:app --host 0.0.0.0 --port 8000 --reload
The API will be available at http://localhost:8000
uv run python test_simple.py
uv run python test_api.py
Once the server is running, you can access:
- Interactive API docs: http://localhost:8000/docs
- Alternative docs: http://localhost:8000/redoc
Root endpoint with API information
Health check endpoint to verify the TTS model is loaded
Get list of available voices
Response:
{
  "available_voices": [
    "expr-voice-2-m", "expr-voice-2-f",
    "expr-voice-3-m", "expr-voice-3-f",
    "expr-voice-4-m", "expr-voice-4-f",
    "expr-voice-5-m", "expr-voice-5-f"
  ],
  "total_count": 8
}
Generate TTS audio from text (recommended)
Request Body:
{
  "text": "Hello, this is a test message",
  "voice": "expr-voice-2-f"
}
Response: WAV audio file
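To sanity-check a returned file, its header can be read with Python's standard `wave` module. The following is a minimal sketch, assuming the server returns PCM-encoded WAV (the `wave` module does not read float-encoded files). `describe_wav` and `make_silence` are hypothetical helper names for illustration and are not part of this project.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return basic header fields for PCM WAV bytes (hypothetical helper)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "duration_seconds": frames / rate,
        }


def make_silence(seconds: float, rate: int = 24000) -> bytes:
    """Build a silent 16-bit mono WAV in memory, used here only as demo input."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buffer.getvalue()


if __name__ == "__main__":
    # With a real response, pass response.content instead of the demo bytes.
    print(describe_wav(make_silence(0.5)))
```

For audio produced by this API, the reported sample rate would be expected to match the documented 24kHz output.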
Generate TTS audio using GET request (for simple testing)
Parameters:
- text (required): Text to convert to speech
- voice (optional): Voice to use (default: "expr-voice-2-f")
Example:
GET /generate?text=Hello%20world&voice=expr-voice-2-m
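Text passed in the query string has to be percent-encoded, as the `%20` in the example shows. A small sketch of building such a URL with the standard library follows. `build_generate_url` is a hypothetical helper name, and the base URL assumes a server running locally on port 8000.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # assumed local server address


def build_generate_url(text: str, voice: str = "expr-voice-2-f") -> str:
    """Build a GET /generate URL with percent-encoded query parameters."""
    # quote_via=quote encodes spaces as %20 rather than "+"
    query = urlencode({"text": text, "voice": voice}, quote_via=quote)
    return f"{BASE_URL}/generate?{query}"


if __name__ == "__main__":
    print(build_generate_url("Hello world", "expr-voice-2-m"))
```

Characters such as `&` and `?` inside the text are also escaped this way, so they cannot be mistaken for query-string delimiters.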
- expr-voice-2-m: Male voice 2
- expr-voice-2-f: Female voice 2
- expr-voice-3-m: Male voice 3
- expr-voice-3-f: Female voice 3
- expr-voice-4-m: Male voice 4
- expr-voice-4-f: Female voice 4
- expr-voice-5-m: Male voice 5
- expr-voice-5-f: Female voice 5
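The voice IDs listed here appear to end in `-m` for male and `-f` for female voices. Assuming that naming convention holds, a client could group the IDs returned by the voice listing endpoint as sketched below. `group_voices` is a hypothetical helper, not something this project provides.

```python
from typing import Dict, Iterable, List


def group_voices(voice_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Group voice IDs by their trailing -m / -f suffix (assumed convention)."""
    groups: Dict[str, List[str]] = {"male": [], "female": []}
    for voice_id in voice_ids:
        suffix = voice_id.rsplit("-", 1)[-1]
        if suffix == "m":
            groups["male"].append(voice_id)
        elif suffix == "f":
            groups["female"].append(voice_id)
        else:
            raise ValueError(f"Unrecognized voice ID: {voice_id}")
    return groups


if __name__ == "__main__":
    voices = [
        "expr-voice-2-m", "expr-voice-2-f",
        "expr-voice-3-m", "expr-voice-3-f",
    ]
    print(group_voices(voices))
```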
curl -X POST "http://localhost:8000/generate" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, this is a test message", "voice": "expr-voice-2-f"}' \
  --output output.wav

curl "http://localhost:8000/generate?text=Hello%20world&voice=expr-voice-2-m" \
  --output output.wav

curl "http://localhost:8000/voices"

import requests

# Generate TTS
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "text": "Hello, this is a test message",
        "voice": "expr-voice-2-f",
    },
)

# Save the WAV bytes on success; otherwise report the HTTP status
if response.status_code == 200:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio saved to output.wav")
else:
    print(f"Error: {response.status_code}")

// Generate TTS
fetch('http://localhost:8000/generate', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    text: 'Hello, this is a test message',
    voice: 'expr-voice-2-f'
  })
})
  .then(response => response.blob())
  .then(blob => {
    const url = window.URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = 'output.wav';
    a.click();
  });

kitten-tts/
├── api.py # Main FastAPI application
├── run_api.py # Server startup script
├── tts.py # Original TTS script (direct usage)
├── test_simple.py # Simple API test script
├── test_api.py # Comprehensive API test suite
├── setup.py # Project setup script
├── pyproject.toml # Project configuration and dependencies
├── uv.lock # Locked dependency versions
├── requirements.txt # Legacy pip requirements (for reference)
└── README.md # This file
# Install development dependencies
uv sync --group dev
# Run tests
uv run pytest
# Format code
uv run black .
# Lint code
uv run ruff check .
# Type checking
uv run mypy .
The API is built with FastAPI, making it easy to extend:
- Add new endpoints in api.py
- Update dependencies in pyproject.toml
- Run uv sync to install new dependencies
- Test your changes with uv run python run_api.py --reload
uv run python run_api.py --help
Options:
- --host: Host to bind to (default: 0.0.0.0)
- --port: Port to bind to (default: 8000)
- --workers: Number of worker processes (default: 1)
- --reload: Enable auto-reload for development
- --log-level: Log level (debug, info, warning, error, critical)
You can also configure the API using environment variables:
- TTS_HOST: Host to bind to
- TTS_PORT: Port to bind to
- TTS_WORKERS: Number of worker processes
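How run_api.py reads these variables is not shown here, so the following is only a sketch of one plausible approach. It assumes that the fallback values match the documented command-line defaults (0.0.0.0, 8000, 1); `load_settings` is a hypothetical function name.

```python
import os
from typing import Mapping, Optional


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read TTS_* variables, falling back to assumed defaults."""
    env = os.environ if env is None else env
    return {
        "host": env.get("TTS_HOST", "0.0.0.0"),
        # Environment values are strings, so numeric settings are converted
        "port": int(env.get("TTS_PORT", "8000")),
        "workers": int(env.get("TTS_WORKERS", "1")),
    }


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict instead of the real environment keeps the function easy to test, for example `load_settings({"TTS_PORT": "9000"})`.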
- FastAPI: Modern, fast web framework for building APIs
- Uvicorn: ASGI server implementation
- KittenTTS: High-quality, lightweight TTS model (25MB)
- SoundFile: Audio file I/O
- NumPy: Numerical computing
- Pydantic: Data validation
- pytest: Testing framework
- black: Code formatter
- ruff: Fast Python linter
- mypy: Static type checker
- Maximum text length: 1000 characters
- Output format: WAV (24kHz sample rate)
- Languages: English only (multilingual support planned)
- The TTS model loads on startup and requires some memory (~100MB)
- Model size: 25MB (ultra-lightweight)
- CPU only: No GPU required
- Inference speed: Real-time generation
- Memory usage: ~100MB for model + runtime
- Throughput: Multiple concurrent requests supported
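Because the text length limit and the voice list are fixed, a client can check a request before sending it. This is a minimal client-side sketch, not the server's actual validation: `check_request` is a hypothetical helper, treating whitespace-only text as empty is an assumption, and the length and voice messages are illustrative wording.

```python
from typing import List

MAX_TEXT_LENGTH = 1000  # documented maximum text length
KNOWN_VOICES = {
    "expr-voice-2-m", "expr-voice-2-f",
    "expr-voice-3-m", "expr-voice-3-f",
    "expr-voice-4-m", "expr-voice-4-f",
    "expr-voice-5-m", "expr-voice-5-f",
}


def check_request(text: str, voice: str = "expr-voice-2-f") -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if not text.strip():
        problems.append("Text cannot be empty")
    elif len(text) > MAX_TEXT_LENGTH:
        problems.append(f"Text exceeds {MAX_TEXT_LENGTH} characters")
    if voice not in KNOWN_VOICES:
        problems.append(f"Unknown voice: {voice}")
    return problems


if __name__ == "__main__":
    print(check_request("Hello, this is a test message"))
    print(check_request("", "expr-voice-9-x"))
```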
The API returns appropriate HTTP status codes:
- 200: Success
- 400: Bad request (invalid input)
- 500: Internal server error
- 503: Service unavailable (model not loaded)
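Since 503 signals that the model is not loaded, a client may want to retry that status while treating the others as final. The sketch below assumes a 503 can be transient (for example, while the model is still loading at startup), which is not confirmed here. `send_with_retry` and its parameters are hypothetical, and the linear backoff is just one possible policy.

```python
import time
from typing import Callable, Tuple

RETRYABLE_STATUSES = {503}  # assumed to be the only transient status


def send_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        status, body = send()
        if status not in RETRYABLE_STATUSES or attempt == attempts:
            return status, body
        sleep(base_delay * attempt)  # linear backoff between attempts
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Simulated responses: two 503s followed by a success
    fake = iter([(503, b""), (503, b""), (200, b"RIFF...")])
    print(send_with_retry(lambda: next(fake), base_delay=0.01))
```

In real use, `send` would wrap an HTTP call and return its status code and response body.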
Error responses include detailed messages:
{
  "detail": "Text cannot be empty"
}
Create a Dockerfile:
FROM python:3.13-slim
# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv
WORKDIR /app
# Copy project files
COPY pyproject.toml uv.lock ./
COPY . .
# Install dependencies
RUN uv sync --frozen
# Install KittenTTS
RUN uv pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl
EXPOSE 8000
CMD ["uv", "run", "python", "run_api.py", "--host", "0.0.0.0", "--port", "8000"]
Build and run:
docker build -t kitten-tts-api .
docker run -p 8000:8000 kitten-tts-api
Create a systemd service file for production deployment:
[Unit]
Description=KittenTTS API
After=network.target
[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/kitten-tts
ExecStart=/path/to/uv run python run_api.py
Restart=always
[Install]
WantedBy=multi-user.target
KittenTTS is a state-of-the-art, ultra-lightweight TTS model developed by KittenML:
- Size: Only 25MB (15M parameters)
- Quality: High-quality, expressive voices
- Performance: CPU-optimized, runs anywhere
- Open Source: Free to use and modify
This project uses KittenTTS. Please check the KittenTTS license for usage terms.
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests: uv run pytest
- Format code: uv run black .
- Submit a pull request
For issues and questions:
- Check the interactive API documentation at /docs
- Review the error messages in the API responses
- Check the server logs for detailed error information
- Run the test scripts to verify functionality
- KittenML for the amazing KittenTTS model
- FastAPI for the excellent web framework
- Astral for the fantastic uv package manager