KittenTTS API

A text-to-speech API built with FastAPI and KittenTTS. It exposes HTTP endpoints that convert text to WAV audio using one of eight built-in voices.

Features

🎵 High-quality text-to-speech generation with KittenTTS (25MB model)
🗣️ Multiple voice options (8 different voices: 4 male, 4 female)
🚀 Fast API with automatic documentation
📁 WAV audio file output (24kHz sample rate)
🔍 Health check and voice listing endpoints
📝 Request validation and error handling
⚡ CPU-optimized (no GPU required)
📦 Modern dependency management with uv

Quick Start

Prerequisites

This project uses uv for fast, reliable dependency management.

Install uv (if not already installed)

# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv

Installation

Clone or download this project
Run the setup script:

python setup.py

Or manually install dependencies:

# Install project dependencies
uv sync

# Install KittenTTS (not available on PyPI)
uv pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl

Running the API

Development Mode (with auto-reload)

uv run python run_api.py --reload

Production Mode

uv run python run_api.py --host 0.0.0.0 --port 8000 --workers 4

Using uvicorn directly

uv run uvicorn api:app --host 0.0.0.0 --port 8000 --reload

The API will be available at http://localhost:8000

Testing the API

Simple test (similar to direct TTS usage)

uv run python test_simple.py

Comprehensive test suite

uv run python test_api.py

API Documentation

Once the server is running, you can access:

Interactive API docs: http://localhost:8000/docs
Alternative docs: http://localhost:8000/redoc

Available Endpoints

`GET /`

Root endpoint with API information

`GET /health`

Health check endpoint to verify the TTS model is loaded
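
A deployment script may want to wait for the model to finish loading before sending traffic. The sketch below polls the health endpoint using only the standard library. It assumes that `GET /health` answers with HTTP 200 once the model is ready and with an error status (or no connection) otherwise; the response body is not documented here, so it is ignored. `probe_health` and `wait_until_ready` are hypothetical helper names, not part of this project.

```python
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/health"  # default address from this README


def probe_health(url: str = HEALTH_URL, timeout: float = 2.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200 (assumed 'ready')."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP error statuses,
        # which urllib raises as HTTPError (a subclass of OSError).
        return False


def wait_until_ready(
    probe: Callable[[], bool] = probe_health,
    timeout: float = 30.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `probe` repeatedly until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_until_ready()` with no arguments polls the default local address for up to 30 seconds; the `probe` and `sleep` parameters exist so the loop can be exercised without a running server.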

`GET /voices`

Get list of available voices

Response:

{
  "available_voices": [
    "expr-voice-2-m", "expr-voice-2-f",
    "expr-voice-3-m", "expr-voice-3-f",
    "expr-voice-4-m", "expr-voice-4-f",
    "expr-voice-5-m", "expr-voice-5-f"
  ],
  "total_count": 8
}

`POST /generate`

Generate TTS audio from text (recommended)

Request Body:

{
  "text": "Hello, this is a test message",
  "voice": "expr-voice-2-f"
}

Response: WAV audio file
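
To sanity-check a response before saving it, the WAV header can be read directly from the returned bytes. This is a minimal sketch using the standard-library `wave` module, which reads only PCM-encoded WAV data; whether the API emits PCM or floating-point samples is not stated in this README, so treat that as an assumption. `wav_info` and `WavInfo` are hypothetical names.

```python
import io
import wave
from typing import NamedTuple


class WavInfo(NamedTuple):
    sample_rate: int
    channels: int
    duration_seconds: float


def wav_info(data: bytes) -> WavInfo:
    """Read sample rate, channel count, and duration from in-memory WAV bytes.

    Raises wave.Error if the bytes are not a PCM WAV file.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return WavInfo(rate, wav.getnchannels(), frames / rate)
```

For a response from this API, `wav_info(response.content).sample_rate` would be expected to equal 24000, matching the output format listed under Features.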

`GET /generate`

Generate TTS audio using GET request (for simple testing)

Parameters:

text (required): Text to convert to speech
voice (optional): Voice to use (default: "expr-voice-2-f")

Example:

GET /generate?text=Hello%20world&voice=expr-voice-2-m
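
Query parameters must be percent-encoded, as in `Hello%20world` above. A small sketch of building such a URL with the standard library follows; `build_generate_url` is a hypothetical helper, and the base URL is the default local address used throughout this README.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://localhost:8000"  # default address from this README


def build_generate_url(
    text: str, voice: str = "expr-voice-2-f", base_url: str = BASE_URL
) -> str:
    """Build a GET /generate URL with percent-encoded query parameters."""
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query = urlencode({"text": text, "voice": voice}, quote_via=quote)
    return f"{base_url}/generate?{query}"
```

Characters such as `&` and `=` inside the text are escaped as well, so they cannot be mistaken for parameter separators.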

Available Voices

expr-voice-2-m - Male voice 2
expr-voice-2-f - Female voice 2
expr-voice-3-m - Male voice 3
expr-voice-3-f - Female voice 3
expr-voice-4-m - Male voice 4
expr-voice-4-f - Female voice 4
expr-voice-5-m - Male voice 5
expr-voice-5-f - Female voice 5
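
The voice IDs above appear to follow the pattern `expr-voice-<number>-<m|f>`. That pattern is inferred from this list and is not a documented KittenTTS guarantee. Under that assumption, a client could split an ID into its parts as sketched here; `parse_voice` and `VoiceInfo` are hypothetical names.

```python
import re
from typing import NamedTuple

# Pattern inferred from the eight IDs listed in this README (an assumption).
VOICE_PATTERN = re.compile(r"^expr-voice-(\d+)-([mf])$")
GENDERS = {"m": "male", "f": "female"}


class VoiceInfo(NamedTuple):
    voice_id: str
    number: int
    gender: str


def parse_voice(voice_id: str) -> VoiceInfo:
    """Split a voice ID such as 'expr-voice-3-f' into number and gender."""
    match = VOICE_PATTERN.match(voice_id)
    if match is None:
        raise ValueError(f"Unrecognized voice ID: {voice_id!r}")
    return VoiceInfo(voice_id, int(match.group(1)), GENDERS[match.group(2)])
```

This makes it straightforward to filter the list returned by `GET /voices`, for example to offer only female voices in a UI.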

Usage Examples

Using curl

Generate TTS with POST request:

curl -X POST "http://localhost:8000/generate" \
     -H "Content-Type: application/json" \
     -d '{"text": "Hello, this is a test message", "voice": "expr-voice-2-f"}' \
     --output output.wav

Generate TTS with GET request:

curl "http://localhost:8000/generate?text=Hello%20world&voice=expr-voice-2-m" \
     --output output.wav

Get available voices:

curl "http://localhost:8000/voices"

Using Python requests

import requests

# Generate TTS
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "text": "Hello, this is a test message",
        "voice": "expr-voice-2-f"
    }
)

if response.status_code == 200:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio saved to output.wav")
else:
    # Error responses carry a JSON body with a "detail" message
    print(f"Error {response.status_code}: {response.text}")

Using JavaScript/fetch

// Generate TTS
fetch('http://localhost:8000/generate', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
    },
    body: JSON.stringify({
        text: 'Hello, this is a test message',
        voice: 'expr-voice-2-f'
    })
})
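.then(response => {
    // Fail early so that an error body is not saved as a .wav file
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    return response.blob();
})
.then(blob => {
    const url = window.URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = 'output.wav';
    a.click();
})
.catch(error => console.error(error));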

Development

Project Structure

kitten-tts/
├── api.py              # Main FastAPI application
├── run_api.py          # Server startup script
├── tts.py              # Original TTS script (direct usage)
├── test_simple.py      # Simple API test script
├── test_api.py         # Comprehensive API test suite
├── setup.py            # Project setup script
├── pyproject.toml      # Project configuration and dependencies
├── uv.lock             # Locked dependency versions
├── requirements.txt    # Legacy pip requirements (for reference)
└── README.md           # This file

Development Setup

# Install development dependencies
uv sync --group dev

# Run tests
uv run pytest

# Format code
uv run black .

# Lint code
uv run ruff check .

# Type checking
uv run mypy .

Adding New Features

The API is built with FastAPI, making it easy to extend:

Add new endpoints in api.py
Update dependencies in pyproject.toml
Run uv sync to install new dependencies
Test your changes with uv run python run_api.py --reload

Configuration

Command Line Options

uv run python run_api.py --help

Options:

--host: Host to bind to (default: 0.0.0.0)
--port: Port to bind to (default: 8000)
--workers: Number of worker processes (default: 1)
--reload: Enable auto-reload for development
--log-level: Log level (debug, info, warning, error, critical)
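
For readers extending the startup script, the options above could be declared with `argparse` roughly as follows. The actual contents of run_api.py are not shown in this README, so this is an illustrative sketch rather than the project's code. The `info` default for `--log-level` is an assumption, as no default is documented.

```python
import argparse

LOG_LEVELS = ["debug", "info", "warning", "error", "critical"]


def build_parser() -> argparse.ArgumentParser:
    """Declare the command line options listed in this README."""
    parser = argparse.ArgumentParser(description="Run the KittenTTS API server")
    parser.add_argument("--host", default="0.0.0.0", help="Host to bind to")
    parser.add_argument("--port", type=int, default=8000, help="Port to bind to")
    parser.add_argument(
        "--workers", type=int, default=1, help="Number of worker processes"
    )
    parser.add_argument(
        "--reload", action="store_true", help="Enable auto-reload for development"
    )
    # Default of "info" is an assumption; the README does not state one.
    parser.add_argument(
        "--log-level", choices=LOG_LEVELS, default="info", help="Log level"
    )
    return parser
```

Parsing an empty argument list with `build_parser().parse_args([])` yields the documented defaults for host, port, and workers.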

Environment Variables

You can also configure the API using environment variables:

TTS_HOST: Host to bind to
TTS_PORT: Port to bind to
TTS_WORKERS: Number of worker processes
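
How the server reads these variables is not shown in this README. One plausible approach, sketched below, falls back to the command line defaults when a variable is unset. `ServerConfig` and `load_config` are hypothetical names, and the precedence between environment variables and command line flags is not specified here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Defaults taken from the Command Line Options section of this README.
DEFAULT_HOST = "0.0.0.0"
DEFAULT_PORT = 8000
DEFAULT_WORKERS = 1


@dataclass(frozen=True)
class ServerConfig:
    host: str
    port: int
    workers: int


def load_config(env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Build a config from TTS_* variables, using defaults for unset ones."""
    source = os.environ if env is None else env
    return ServerConfig(
        host=source.get("TTS_HOST", DEFAULT_HOST),
        # Environment values are strings, so numeric settings are converted.
        port=int(source.get("TTS_PORT", DEFAULT_PORT)),
        workers=int(source.get("TTS_WORKERS", DEFAULT_WORKERS)),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to test in isolation.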

Dependencies

Core Dependencies

FastAPI: Modern, fast web framework for building APIs
Uvicorn: ASGI server implementation
KittenTTS: High-quality, lightweight TTS model (25MB)
SoundFile: Audio file I/O
NumPy: Numerical computing
Pydantic: Data validation

Development Dependencies

pytest: Testing framework
black: Code formatter
ruff: Fast Python linter
mypy: Static type checker

Limitations

Maximum text length: 1000 characters
Output format: WAV (24kHz sample rate)
Languages: English only (multilingual support planned)
Memory: the TTS model loads at startup and uses roughly 100MB
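
Because of the 1000-character limit, longer text has to be split on the client side and sent as several requests. The following is one possible splitting strategy, not something the API provides: it prefers sentence boundaries and hard-wraps any single sentence that exceeds the limit. `split_text` is a hypothetical helper.

```python
import re
from typing import List

MAX_CHARS = 1000  # maximum text length stated in this README


def split_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be posted to `/generate` separately; joining the resulting WAV files is left to the caller.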

Performance

Model size: 25MB (ultra-lightweight)
CPU only: No GPU required
Inference speed: Real-time generation
Memory usage: ~100MB for model + runtime
Throughput: Multiple concurrent requests supported
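
This README gives no benchmark figures. One common way to check a "real-time" claim on your own hardware is the real-time factor: synthesis time divided by the duration of the audio produced, where a value below 1.0 means audio is generated faster than it plays back. The sketch below shows the arithmetic only; the function names are hypothetical, and reading "real-time" as a factor below 1.0 is an interpretation rather than a documented definition.

```python
SAMPLE_RATE_HZ = 24_000  # output sample rate stated in this README


def audio_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE_HZ) -> float:
    """Duration of a mono clip with the given number of samples."""
    return num_samples / sample_rate


def real_time_factor(synthesis_seconds: float, audio_length_seconds: float) -> float:
    """Synthesis time divided by audio duration; below 1.0 is faster than playback."""
    if audio_length_seconds <= 0:
        raise ValueError("audio_length_seconds must be positive")
    return synthesis_seconds / audio_length_seconds
```

For example, a clip of 48,000 samples lasts 2 seconds at 24kHz; if generating it took 0.5 seconds, the real-time factor would be 0.25.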

Error Handling

The API returns appropriate HTTP status codes:

200: Success
400: Bad request (invalid input)
500: Internal server error
503: Service unavailable (model not loaded)

Error responses include detailed messages:

{
  "detail": "Text cannot be empty"
}
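
A client can combine the status code with the `detail` field to produce a readable message. This is a minimal sketch that assumes error bodies are JSON objects shaped like the example above; `describe_response` is a hypothetical helper, and the status descriptions are copied from the list in this section.

```python
import json

# Status codes and meanings as listed in this README.
STATUS_MEANINGS = {
    200: "Success",
    400: "Bad request (invalid input)",
    500: "Internal server error",
    503: "Service unavailable (model not loaded)",
}


def describe_response(status_code: int, body: bytes) -> str:
    """Summarize a response, appending the JSON 'detail' field when present."""
    meaning = STATUS_MEANINGS.get(status_code, "Unexpected status")
    detail = None
    try:
        payload = json.loads(body)
        if isinstance(payload, dict):
            detail = payload.get("detail")
    except ValueError:
        # Body is empty, binary (e.g. WAV audio), or otherwise not JSON.
        pass
    summary = f"{status_code} {meaning}"
    return f"{summary}: {detail}" if detail else summary
```

With the `requests` example from earlier, this would be called as `describe_response(response.status_code, response.content)`.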

Deployment

Docker (recommended for production)

Create a Dockerfile:

FROM python:3.13-slim

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

WORKDIR /app

# Copy project files
COPY pyproject.toml uv.lock ./
COPY . .

# Install dependencies
RUN uv sync --frozen

# Install KittenTTS
RUN uv pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl

EXPOSE 8000

CMD ["uv", "run", "python", "run_api.py", "--host", "0.0.0.0", "--port", "8000"]

Build and run:

docker build -t kitten-tts-api .
docker run -p 8000:8000 kitten-tts-api

Systemd Service

Create a systemd service file for production deployment:

[Unit]
Description=KittenTTS API
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/kitten-tts
ExecStart=/path/to/uv run python run_api.py
Restart=always

[Install]
WantedBy=multi-user.target

About KittenTTS

KittenTTS is a state-of-the-art, ultra-lightweight TTS model developed by KittenML:

Size: Only 25MB (15M parameters)
Quality: High-quality, expressive voices
Performance: CPU-optimized, runs anywhere
Open Source: Free to use and modify

License

This project uses KittenTTS. Please check the KittenTTS license for usage terms.

Contributing

Fork the repository
Create a feature branch
Make your changes
Run tests: uv run pytest
Format code: uv run black .
Submit a pull request

Support

For issues and questions:

Check the interactive API documentation at /docs
Review the error messages in the API responses
Check the server logs for detailed error information
Run the test scripts to verify functionality

Acknowledgments

KittenML for the amazing KittenTTS model
FastAPI for the excellent web framework
Astral for the fantastic uv package manager
