A Fastify REST API that converts PDF and EPUB files to natural-sounding audio using AI TTS. Supports OpenAI TTS, ElevenLabs, and Google Cloud TTS — switch providers by changing a single env var.
- 📄 PDF text extraction (via pdfjs-dist)
- 📚 EPUB text extraction with chapter awareness (via epub2)
- 🎤 Multi-provider TTS: OpenAI, ElevenLabs, Google Cloud
- 🔀 Smart chunking — splits on sentence boundaries, never mid-word
- 🎵 Output formats — MP3, WAV, OGG (converted via ffmpeg)
- ⚡ Async jobs — upload → get job ID → poll progress → download
- 🔁 Concurrent synthesis — up to 3 parallel TTS requests per job
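
The smart chunking feature above could be approximated as follows. This is a minimal sketch, not the project's actual `chunker.ts`: the names `splitSentences` and `chunkText` and the `maxChars` parameter are hypothetical, and the sentence detection is deliberately naive (it will also split after abbreviations such as "e.g." when they are followed by a space).

```typescript
// Hypothetical helpers for illustration; the real chunker.ts may differ.

// Split text into sentences at ., ! or ? followed by whitespace.
function splitSentences(text: string): string[] {
  const parts = text.replace(/\s+/g, " ").trim().split(/([.!?]+\s)/);
  const sentences: string[] = [];
  // With a capture group, split() alternates text and separator.
  for (let i = 0; i < parts.length; i += 2) {
    const sentence = (parts[i] + (parts[i + 1] ?? "")).trim();
    if (sentence) sentences.push(sentence);
  }
  return sentences;
}

// Pack whole sentences into chunks of at most maxChars characters.
function chunkText(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let current = "";
  const flush = (): void => {
    if (current) chunks.push(current);
    current = "";
  };
  // Append a piece to the current chunk, starting a new chunk if it would overflow.
  const add = (piece: string): void => {
    const candidate = current ? `${current} ${piece}` : piece;
    if (candidate.length > maxChars && current) {
      flush();
      current = piece;
    } else {
      current = candidate;
    }
  };

  for (const sentence of splitSentences(text)) {
    if (sentence.length > maxChars) {
      // Oversized sentence: fall back to word boundaries, never mid-word.
      // A single word longer than maxChars is kept whole.
      flush();
      sentence.split(" ").forEach(add);
      flush();
    } else {
      add(sentence);
    }
  }
  flush();
  return chunks;
}
```

A chunk only exceeds `maxChars` when a single word is longer than the limit, which keeps the "never mid-word" guarantee at the cost of a strict size bound.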
- Node.js ≥ 18
- ffmpeg installed and in your PATH
# macOS
brew install ffmpeg
# Ubuntu / Debian
sudo apt install ffmpeg
# Windows
winget install ffmpeg

# 1. Install dependencies
npm install
# 2. Configure environment
cp .env.example .env
# Edit .env and set your TTS provider API key

TTS_PROVIDER=kokoro
KOKORO_BASE_URL=http://localhost:8880 # default, change if needed
KOKORO_VOICE=af_bella

TTS_PROVIDER=openai
OPENAI_API_KEY=sk-...

# Development (hot reload)
npm run dev
# Production
npm run build
npm start

Server starts at http://localhost:3000.
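
To illustrate how the `.env` values shown earlier might be turned into a provider configuration, here is a minimal sketch. It is not the project's actual `config.ts`: the function name `loadTtsConfig` and the `TtsConfig` shape are hypothetical, only the Kokoro and OpenAI variables from the examples are covered, and falling back to `af_bella` when `KOKORO_VOICE` is unset is an assumption.

```typescript
// Hypothetical config loader; the real config.ts may look different.
type TtsConfig =
  | { provider: "kokoro"; baseUrl: string; voice: string }
  | { provider: "openai"; apiKey: string };

function loadTtsConfig(env: Record<string, string | undefined>): TtsConfig {
  const provider = env.TTS_PROVIDER;
  switch (provider) {
    case "kokoro":
      return {
        provider: "kokoro",
        // The README lists http://localhost:8880 as the default base URL.
        baseUrl: env.KOKORO_BASE_URL ?? "http://localhost:8880",
        // Assumption: fall back to the voice used in the README example.
        voice: env.KOKORO_VOICE ?? "af_bella",
      };
    case "openai": {
      const apiKey = env.OPENAI_API_KEY;
      if (!apiKey) {
        throw new Error("OPENAI_API_KEY is required when TTS_PROVIDER=openai");
      }
      return { provider: "openai", apiKey };
    }
    default:
      // Other providers are omitted from this sketch.
      throw new Error(`Unsupported or missing TTS_PROVIDER: ${String(provider)}`);
  }
}
```

In a Node.js process this would typically be called as `loadTtsConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.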
POST /convert: upload a PDF or EPUB and start conversion.
Content-Type: multipart/form-data
| Field | Type | Description |
|---|---|---|
| file | File | .pdf or .epub file (required) |
| format | string | mp3, wav, or ogg (default: mp3) |
| voice | string | Provider-specific voice ID |
| speed | number | 0.25–4.0 (default: 1.0) |
| provider | string | Override: openai, elevenlabs, or google |
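
The field rules in the table above can be expressed as a small validation step. This is a minimal sketch under assumptions: `parseConvertOptions`, `ConvertOptions`, and `oneOf` are hypothetical names, and how the real server reports invalid input is not documented here, so the sketch simply throws.

```typescript
// Hypothetical validator for the text fields of the upload form.
const FORMATS = ["mp3", "wav", "ogg"] as const;
const PROVIDERS = ["openai", "elevenlabs", "google"] as const;

type OutputFormat = (typeof FORMATS)[number];
type Provider = (typeof PROVIDERS)[number];

interface ConvertOptions {
  format: OutputFormat;
  speed: number;
  voice?: string;
  provider?: Provider;
}

// Return the value typed as one of the allowed literals, or throw.
function oneOf<T extends string>(allowed: readonly T[], value: string, field: string): T {
  const index = (allowed as readonly string[]).indexOf(value);
  if (index === -1) {
    throw new Error(`${field} must be one of: ${allowed.join(", ")}`);
  }
  return allowed[index];
}

function parseConvertOptions(fields: Record<string, string | undefined>): ConvertOptions {
  // Defaults follow the table: format mp3, speed 1.0.
  const format = oneOf(FORMATS, fields.format ?? "mp3", "format");
  const speed = fields.speed === undefined ? 1.0 : Number(fields.speed);
  if (!isFinite(speed) || speed < 0.25 || speed > 4.0) {
    throw new Error("speed must be a number between 0.25 and 4.0");
  }
  const provider =
    fields.provider === undefined ? undefined : oneOf(PROVIDERS, fields.provider, "provider");
  // voice is provider-specific, so it is passed through unchecked.
  return { format, speed, voice: fields.voice, provider };
}
```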
Response 202:
{
  "jobId": "abc-123",
  "status": "queued",
  "statusUrl": "/jobs/abc-123",
  "downloadUrl": "/download/abc-123"
}

Poll job progress at the returned statusUrl (GET /jobs/abc-123 in this example).
Response:
{
  "jobId": "abc-123",
  "status": "converting",
  "progress": 45,
  "message": "Synthesizing chunk 3/7…",
  "totalChunks": 7,
  "completedChunks": 3
}

Status values: queued → extracting → converting → merging → done | error
Download the finished audio file (only available when status === "done").
Returns the audio file as a binary stream with the correct Content-Type.
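
A client can wrap the poll step of the upload, poll, download flow in a small helper. The sketch below is illustrative only: `waitForJob`, `JobState`, and the `intervalMs` and `maxAttempts` parameters are hypothetical, and the status fetcher is injected so the loop does not depend on any particular HTTP client.

```typescript
// Hypothetical client-side polling helper.
type JobStatus = "queued" | "extracting" | "converting" | "merging" | "done" | "error";

interface JobState {
  jobId: string;
  status: JobStatus;
  progress?: number;
  message?: string;
  totalChunks?: number;
  completedChunks?: number;
}

// Poll until the job reaches "done"; reject on "error" or after maxAttempts polls.
async function waitForJob(
  fetchStatus: () => Promise<JobState>,
  intervalMs = 2000,
  maxAttempts = 300,
): Promise<JobState> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await fetchStatus();
    if (state.status === "done") return state;
    if (state.status === "error") {
      throw new Error(state.message ?? "Conversion failed");
    }
    // Wait before the next poll.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error("Timed out waiting for job to finish");
}
```

With the global `fetch` available in Node.js 18 and later, `fetchStatus` could be something like `() => fetch("http://localhost:3000/jobs/abc-123").then((r) => r.json())`. Once the promise resolves, the file can be fetched from the download URL.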
List available voices for the active (or specified) provider.
GET /voices?provider=openai
Returns supported output formats and providers.
Health check.
# Convert a PDF to MP3
curl -X POST http://localhost:3000/convert \
-F "file=@my-book.pdf" \
-F "format=mp3" \
-F "voice=nova" \
-F "speed=1.1"
# Poll status
curl http://localhost:3000/jobs/abc-123
# Download when done
curl -o my-book.mp3 http://localhost:3000/download/abc-123

Kokoro (local):
- Runs entirely on your machine, no API key, no cost per character
- Quality rivals paid providers, especially af_bella and bm_george
- Requires a running kokoro-fastapi server
Setup (pick one):
# Option A — Python (CPU)
pip install kokoro-fastapi
python -m kokoro_fastapi
# Option B — Docker CPU
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest
# Option C — Docker GPU (much faster for long books)
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest

Available voices:
| ID | Name | Accent |
|---|---|---|
| af_bella | Bella | American Female |
| af_nicole | Nicole | American Female |
| af_sarah | Sarah | American Female |
| af_sky | Sky | American Female |
| am_adam | Adam | American Male |
| am_michael | Michael | American Male |
| bf_emma | Emma | British Female |
| bf_isabella | Isabella | British Female |
| bm_george | George | British Male |
| bm_lewis | Lewis | British Male |
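
The voice IDs in the table appear to follow a pattern: the first letter encodes the accent (a or b), the second the gender (f or m), followed by the name. The sketch below decodes an ID on that basis. The pattern is inferred from the ten rows above rather than from Kokoro documentation, and `describeKokoroVoice` is a hypothetical helper.

```typescript
// Hypothetical decoder based on the naming pattern seen in the table.
interface KokoroVoice {
  id: string;
  name: string;
  accent: "American" | "British";
  gender: "Female" | "Male";
}

function describeKokoroVoice(id: string): KokoroVoice | undefined {
  const match = /^([ab])([fm])_([a-z]+)$/.exec(id);
  if (!match) return undefined; // ID does not follow the inferred pattern
  const rawName = match[3];
  return {
    id,
    name: rawName.charAt(0).toUpperCase() + rawName.slice(1),
    accent: match[1] === "a" ? "American" : "British",
    gender: match[2] === "f" ? "Female" : "Male",
  };
}
```

For example, `describeKokoroVoice("bm_george")` yields the name George with a British accent and male gender, matching the table row.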
Check the server is reachable before converting:
curl http://localhost:3000/health/kokoro

OpenAI:
- Models: tts-1 (faster, cheaper) or tts-1-hd (higher quality)
- Voices: alloy, echo, fable, onyx, nova, shimmer
- ~$0.015 per 1K characters (tts-1), ~$0.030 per 1K characters (tts-1-hd)

ElevenLabs:
- Most natural, expressive voices
- Use GET /voices to list available voice IDs
- Turbo v2 model used by default

Google Cloud:
- Requires a service account JSON key
- Set GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json
- Neural2 voices are highest quality

src/
├── server.ts # Fastify entry point
├── config.ts # Env-based config
├── pipeline.ts # Conversion orchestrator
├── types.ts # Shared TypeScript types
├── extractors/
│ ├── pdf.ts # pdfjs-dist extractor
│ └── epub.ts # epub2 extractor
├── providers/
│ ├── openai.ts # OpenAI TTS
│ ├── elevenlabs.ts # ElevenLabs TTS
│ ├── google.ts # Google Cloud TTS
│ └── factory.ts # Provider resolver
├── routes/
│ └── index.ts # All API routes
└── utils/
├── chunker.ts # Text chunking + cleaning
├── audioMerger.ts # ffmpeg concat + encode
└── jobStore.ts # In-memory job tracker
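
As an illustration of how an orchestrator such as `pipeline.ts` might cap synthesis at 3 parallel TTS requests per job, here is a minimal worker-pool sketch. It is an assumption about the approach, not the project's code: `mapWithConcurrency` is a hypothetical name, and the real pipeline may use a library or a different scheme.

```typescript
// Hypothetical concurrency-limited map; results keep the input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims items one at a time until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners: Promise<void>[] = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}
```

A call such as `mapWithConcurrency(chunks, 3, synthesize)` would keep at most three requests in flight, where `synthesize` stands in for whatever function calls the TTS provider. If any worker rejects, the returned promise rejects as well, so retry or cleanup logic would need to be added on top.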
