
junwatu/purrvo


KittenTTS API

A high-quality text-to-speech API built with FastAPI and KittenTTS. It provides endpoints that convert text to speech using any of eight built-in voices.

Features

  • 🎵 High-quality text-to-speech generation with KittenTTS (25MB model)
  • 🗣️ Multiple voice options (8 different voices: 4 male, 4 female)
  • 🚀 Fast API with automatic documentation
  • 📁 WAV audio file output (24kHz sample rate)
  • 🔍 Health check and voice listing endpoints
  • 📝 Request validation and error handling
  • ⚡ CPU-optimized (no GPU required)
  • 📦 Modern dependency management with uv

Quick Start

Prerequisites

This project uses uv for fast, reliable dependency management.

Install uv (if not already installed)

# On macOS and Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# On Windows
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Or with pip
pip install uv

Installation

  1. Clone or download this project
  2. Run the setup script:
python setup.py

Or manually install dependencies:

# Install project dependencies
uv sync

# Install KittenTTS (not available on PyPI)
uv pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl

Running the API

Development Mode (with auto-reload)

uv run python run_api.py --reload

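Production Mode

uv run python run_api.py --host 0.0.0.0 --port 8000 --workers 4

Each worker is a separate process that loads its own copy of the model at startup, so memory use grows with the number of workers.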

Using uvicorn directly

uv run uvicorn api:app --host 0.0.0.0 --port 8000 --reload

The API will be available at http://localhost:8000

Testing the API

Simple test (similar to direct TTS usage)

uv run python test_simple.py

Comprehensive test suite

uv run python test_api.py

API Documentation

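Once the server is running, you can access the documentation that FastAPI generates automatically:

  • Interactive API documentation (Swagger UI): http://localhost:8000/docs
  • Alternative documentation (ReDoc): http://localhost:8000/redoc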

Available Endpoints

GET /

Root endpoint with API information

GET /health

Health check endpoint to verify the TTS model is loaded
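Because the model loads at startup, deployment scripts or CI jobs may want to wait until the server is ready before sending requests. The sketch below polls /health using only the standard library. It assumes only that /health answers HTTP 200 once the model is loaded and an error status (such as 503) or no connection before then; the function name wait_until_healthy is hypothetical and nothing about the response body is assumed.

```python
import time
import urllib.request
from typing import Callable


def wait_until_healthy(
    base_url: str = "http://localhost:8000",
    timeout: float = 30.0,
    interval: float = 1.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Poll GET /health until it returns HTTP 200 or `timeout` seconds pass.

    Only the status code is checked; the body of /health is not assumed.
    `opener` is injectable so the loop can be tested without a live server.
    """
    url = base_url.rstrip("/") + "/health"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError (e.g. 503 while the model loads), refused
            # connections and socket timeouts are all OSError subclasses.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Typical use in a startup script: if not wait_until_healthy(timeout=60): raise SystemExit("API did not become healthy").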

GET /voices

Get list of available voices

Response:

{
  "available_voices": [
    "expr-voice-2-m", "expr-voice-2-f",
    "expr-voice-3-m", "expr-voice-3-f",
    "expr-voice-4-m", "expr-voice-4-f",
    "expr-voice-5-m", "expr-voice-5-f"
  ],
  "total_count": 8
}
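Rather than hard-coding voice IDs, a client can read them from this response. A minimal standard-library sketch; the helper names parse_voices and fetch_voices are hypothetical, and they rely only on the available_voices and total_count fields shown in the example response.

```python
import json
import urllib.request


def parse_voices(payload: dict) -> list:
    """Extract voice IDs from a /voices response and sanity-check the count."""
    voices = list(payload["available_voices"])
    if payload.get("total_count", len(voices)) != len(voices):
        raise ValueError("total_count does not match the number of voices listed")
    return voices


def fetch_voices(base_url: str = "http://localhost:8000") -> list:
    """Fetch GET /voices from a running server (needs the API to be up)."""
    url = base_url.rstrip("/") + "/voices"
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_voices(json.load(response))
```

parse_voices is kept separate from the network call so it can be reused on a response obtained any other way, for example parse_voices(requests.get(url).json()).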

POST /generate

Generate TTS audio from text (recommended)

Request Body:

{
  "text": "Hello, this is a test message",
  "voice": "expr-voice-2-f"
}

Response: WAV audio file
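To confirm that a response really is WAV audio at the documented 24 kHz, a client can read the RIFF header. The sketch below is a hypothetical helper (wav_format) using only the standard library. It walks the RIFF chunks rather than assuming a fixed 44-byte header, and it reads only the "fmt " chunk, so it works whether the server writes integer PCM or float samples (Python's wave module cannot open float WAV files).

```python
import struct


def wav_format(data: bytes) -> dict:
    """Return format tag, channel count, sample rate and bit depth of WAV bytes."""
    if len(data) < 12 or data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)
        if chunk_id == b"fmt ":
            if size < 16 or pos + 8 + 16 > len(data):
                raise ValueError("truncated fmt chunk")
            tag, channels, rate, _byte_rate, _align, bits = struct.unpack_from(
                "<HHIIHH", data, pos + 8
            )
            return {
                "format_tag": tag,  # 1 = integer PCM, 3 = IEEE float
                "channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
        pos += 8 + size + (size & 1)  # RIFF chunks are padded to even length
    raise ValueError("no fmt chunk found")
```

For example, after a successful request: assert wav_format(response.content)["sample_rate"] == 24000.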

GET /generate

Generate TTS audio using GET request (for simple testing)

Parameters:

  • text (required): Text to convert to speech
  • voice (optional): Voice to use (default: "expr-voice-2-f")

Example:

GET /generate?text=Hello%20world&voice=expr-voice-2-m
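Hand-built query strings break as soon as the text contains characters such as &, ? or #. A small sketch (the helper name build_generate_url is hypothetical) that percent-encodes the parameters with the standard library and produces the same %20 form as the example above:

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_generate_url(base_url: str, text: str, voice: Optional[str] = None) -> str:
    """Build a GET /generate URL with text and voice safely percent-encoded."""
    params = {"text": text}
    if voice is not None:
        params["voice"] = voice  # omit to fall back to the server default
    # quote_via=quote encodes spaces as %20; the default quote_plus emits "+".
    query = urlencode(params, quote_via=quote)
    return f"{base_url.rstrip('/')}/generate?{query}"
```

build_generate_url("http://localhost:8000", "Hello world", "expr-voice-2-m") yields exactly the URL shown in the example.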

Available Voices

  • expr-voice-2-m - Male voice 2
  • expr-voice-2-f - Female voice 2
  • expr-voice-3-m - Male voice 3
  • expr-voice-3-f - Female voice 3
  • expr-voice-4-m - Male voice 4
  • expr-voice-4-f - Female voice 4
  • expr-voice-5-m - Male voice 5
  • expr-voice-5-f - Female voice 5
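The IDs in this list follow the pattern expr-voice-<number>-<m|f>. If a user interface needs readable labels, they can be derived from the ID. This is a sketch that assumes every voice follows that pattern; the API does not necessarily guarantee it, so GET /voices remains the authoritative list. The name describe_voice is hypothetical.

```python
import re

# Assumed naming scheme, inferred from the voice list above.
VOICE_PATTERN = re.compile(r"expr-voice-(\d+)-([mf])")


def describe_voice(voice_id: str) -> str:
    """Turn an ID such as "expr-voice-2-m" into a label such as "Male voice 2"."""
    match = VOICE_PATTERN.fullmatch(voice_id)
    if match is None:
        raise ValueError(f"unrecognised voice ID: {voice_id!r}")
    number, gender = match.groups()
    return f"{'Male' if gender == 'm' else 'Female'} voice {number}"
```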

Usage Examples

Using curl

Generate TTS with POST request:

curl -X POST "http://localhost:8000/generate" \
     -H "Content-Type: application/json" \
     -d '{"text": "Hello, this is a test message", "voice": "expr-voice-2-f"}' \
     --output output.wav

Generate TTS with GET request:

curl "http://localhost:8000/generate?text=Hello%20world&voice=expr-voice-2-m" \
     --output output.wav

Get available voices:

curl "http://localhost:8000/voices"

Using Python requests

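import requests

# Generate TTS (the API must already be running on port 8000)
response = requests.post(
    "http://localhost:8000/generate",
    json={
        "text": "Hello, this is a test message",
        "voice": "expr-voice-2-f"
    },
    timeout=60,  # generation runs on the CPU; allow time for longer texts
)

if response.status_code == 200:
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio saved to output.wav")
else:
    # Error responses carry a JSON body such as {"detail": "..."}
    print(f"Error {response.status_code}: {response.text}")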

Using JavaScript/fetch

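// Generate TTS and download the result in the browser
fetch('http://localhost:8000/generate', {
    method: 'POST',
    headers: {
        'Content-Type': 'application/json',
    },
    body: JSON.stringify({
        text: 'Hello, this is a test message',
        voice: 'expr-voice-2-f'
    })
})
.then(response => {
    // fetch only rejects on network failure, so check the HTTP status explicitly
    if (!response.ok) {
        throw new Error(`Request failed with status ${response.status}`);
    }
    return response.blob();
})
.then(blob => {
    const url = window.URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = 'output.wav';
    a.click();
})
.catch(err => console.error(err));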

Development

Project Structure

kitten-tts/
├── api.py              # Main FastAPI application
├── run_api.py          # Server startup script
├── tts.py              # Original TTS script (direct usage)
├── test_simple.py      # Simple API test script
├── test_api.py         # Comprehensive API test suite
├── setup.py            # Project setup script
├── pyproject.toml      # Project configuration and dependencies
├── uv.lock             # Locked dependency versions
├── requirements.txt    # Legacy pip requirements (for reference)
└── README.md           # This file

Development Setup

# Install development dependencies
uv sync --group dev

# Run tests
uv run pytest

# Format code
uv run black .

# Lint code
uv run ruff check .

# Type checking
uv run mypy .

Adding New Features

The API is built with FastAPI, making it easy to extend:

  1. Add new endpoints in api.py
  2. Update dependencies in pyproject.toml
  3. Run uv sync to install new dependencies
  4. Test your changes with uv run python run_api.py --reload

Configuration

Command Line Options

uv run python run_api.py --help

Options:

  • --host: Host to bind to (default: 0.0.0.0)
  • --port: Port to bind to (default: 8000)
  • --workers: Number of worker processes (default: 1)
  • --reload: Enable auto-reload for development
  • --log-level: Log level (debug, info, warning, error, critical)

Environment Variables

You can also configure the API using environment variables:

  • TTS_HOST: Host to bind to
  • TTS_PORT: Port to bind to
  • TTS_WORKERS: Number of worker processes
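One way to combine the command-line options and environment variables is for flags to override TTS_* variables, which in turn override the built-in defaults. The sketch below illustrates that precedence with argparse. It is hypothetical: it is not the code of run_api.py, whose actual precedence is not documented here, and the "info" default for --log-level is an assumption (it matches uvicorn's own default).

```python
import argparse
import os
from typing import Mapping, Optional, Sequence

LOG_LEVELS = ["debug", "info", "warning", "error", "critical"]


def parse_options(
    argv: Optional[Sequence[str]] = None,
    env: Mapping[str, str] = os.environ,
) -> argparse.Namespace:
    """Resolve options: command-line flag > TTS_* variable > built-in default."""
    parser = argparse.ArgumentParser(description="Run the KittenTTS API server")
    parser.add_argument("--host", default=env.get("TTS_HOST", "0.0.0.0"))
    parser.add_argument("--port", type=int, default=int(env.get("TTS_PORT", "8000")))
    parser.add_argument(
        "--workers", type=int, default=int(env.get("TTS_WORKERS", "1"))
    )
    parser.add_argument("--reload", action="store_true")
    # Assumed default; uvicorn itself defaults to "info".
    parser.add_argument("--log-level", choices=LOG_LEVELS, default="info")
    return parser.parse_args(argv)
```

The resolved values could then be passed to uvicorn.run("api:app", host=opts.host, port=opts.port, workers=opts.workers, reload=opts.reload, log_level=opts.log_level); uvicorn needs the import string form ("api:app") rather than the app object when workers or reload are used.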

Dependencies

Core Dependencies

  • FastAPI: Modern, fast web framework for building APIs
  • Uvicorn: ASGI server implementation
  • KittenTTS: High-quality, lightweight TTS model (25MB)
  • SoundFile: Audio file I/O
  • NumPy: Numerical computing
  • Pydantic: Data validation

Development Dependencies

  • pytest: Testing framework
  • black: Code formatter
  • ruff: Fast Python linter
  • mypy: Static type checker

Limitations

  • Maximum text length: 1000 characters
  • Output format: WAV (24kHz sample rate)
  • Languages: English only (multilingual support planned)
  • The TTS model loads on startup and requires some memory (~100MB)
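Checking these limits on the client avoids a round trip for requests the server would reject. A minimal sketch, assuming the 1000-character limit is measured with len() and that whitespace-only text counts as empty (both assumptions); the names validate_request and KNOWN_VOICES are hypothetical, and a fresh list from GET /voices can be passed via the voices argument instead of the hard-coded set.

```python
from typing import Iterable, Optional

MAX_TEXT_LENGTH = 1000  # documented limit
DEFAULT_VOICE = "expr-voice-2-f"  # documented default for GET /generate
# Hard-coded copy of the documented voices; GET /voices is authoritative.
KNOWN_VOICES = frozenset(
    f"expr-voice-{n}-{g}" for n in (2, 3, 4, 5) for g in ("m", "f")
)


def validate_request(
    text: str,
    voice: Optional[str] = None,
    voices: Iterable[str] = KNOWN_VOICES,
) -> dict:
    """Check a /generate payload locally and return the JSON body to send."""
    if not text or not text.strip():
        raise ValueError("Text cannot be empty")
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError(
            f"Text is {len(text)} characters; the limit is {MAX_TEXT_LENGTH}"
        )
    voice = voice or DEFAULT_VOICE
    if voice not in set(voices):
        raise ValueError(f"Unknown voice {voice!r}")
    return {"text": text, "voice": voice}
```

The returned dict can be passed straight to requests.post(..., json=validate_request(text, voice)).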

Performance

  • Model size: 25MB (ultra-lightweight)
  • CPU only: No GPU required
  • Inference speed: Real-time generation
  • Memory usage: ~100MB for model + runtime
  • Throughput: Multiple concurrent requests supported

Error Handling

The API returns appropriate HTTP status codes:

  • 200: Success
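  • 400: Bad request (invalid input)
  • 422: Unprocessable entity (request body or query parameters failed FastAPI/Pydantic validation, for example a missing text field)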
  • 500: Internal server error
  • 503: Service unavailable (model not loaded)

Error responses include detailed messages:

{
  "detail": "Text cannot be empty"
}
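A client can turn an error response into a readable message by combining the status code with the detail field. The sketch below is hypothetical (describe_error is not part of the API); it tolerates bodies that are not JSON, such as an HTML page from a reverse proxy. Note that FastAPI's own 422 validation errors put a list of error objects in detail rather than a string, which the function simply prints as-is.

```python
import json

STATUS_HINTS = {
    400: "bad request (invalid input)",
    422: "request failed validation",  # FastAPI's default for invalid input
    500: "internal server error",
    503: "service unavailable (model not loaded)",
}


def describe_error(status: int, body: bytes) -> str:
    """Build a readable message from an HTTP error status and its body."""
    hint = STATUS_HINTS.get(status, "unexpected status")
    detail = None
    try:
        detail = json.loads(body).get("detail")
    except (ValueError, AttributeError):
        pass  # body was empty, not JSON, or not a JSON object
    return f"{status} {hint}: {detail}" if detail is not None else f"{status} {hint}"
```

With requests, for example: if not response.ok: print(describe_error(response.status_code, response.content)).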

Deployment

Docker (recommended for production)

Create a Dockerfile:

FROM python:3.13-slim

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /bin/uv

WORKDIR /app

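# Copy the dependency manifests first so the dependency layer can be cached
COPY pyproject.toml uv.lock ./

# Install locked dependencies without the project itself
RUN uv sync --frozen --no-install-project

# Copy the remaining project files and finish the sync
COPY . .
RUN uv sync --frozen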

# Install KittenTTS
RUN uv pip install https://github.com/KittenML/KittenTTS/releases/download/0.1/kittentts-0.1.0-py3-none-any.whl

EXPOSE 8000

CMD ["uv", "run", "python", "run_api.py", "--host", "0.0.0.0", "--port", "8000"]

Build and run:

docker build -t kitten-tts-api .
docker run -p 8000:8000 kitten-tts-api

Systemd Service

Create a systemd service file for production deployment:

[Unit]
Description=KittenTTS API
After=network.target

[Service]
Type=simple
User=your-user
WorkingDirectory=/path/to/kitten-tts
ExecStart=/path/to/uv run python run_api.py
Restart=always

[Install]
WantedBy=multi-user.target

About KittenTTS

KittenTTS is a state-of-the-art, ultra-lightweight TTS model developed by KittenML:

  • Size: Only 25MB (15M parameters)
  • Quality: High-quality, expressive voices
  • Performance: CPU-optimized, runs anywhere
  • Open Source: Free to use and modify

License

This project uses KittenTTS. Please check the KittenTTS license for usage terms.

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests: uv run pytest
  5. Format code: uv run black .
  6. Submit a pull request

Support

For issues and questions:

  1. Check the interactive API documentation at /docs
  2. Review the error messages in the API responses
  3. Check the server logs for detailed error information
  4. Run the test scripts to verify functionality

Acknowledgments

  • KittenML for the amazing KittenTTS model
  • FastAPI for the excellent web framework
  • Astral for the fantastic uv package manager

About

Fast HTTP API for text-to-speech conversion using Kitten TTS
