🎙️ pdf-epub-to-audio

A Fastify REST API that converts PDF and EPUB files to natural-sounding audio using AI text-to-speech (TTS). Supports Kokoro (local), OpenAI TTS, ElevenLabs, and Google Cloud TTS; switch providers by changing a single env var.

Features

📄 PDF text extraction (via pdfjs-dist)
📚 EPUB text extraction with chapter awareness (via epub2)
🎤 Multi-provider TTS: Kokoro, OpenAI, ElevenLabs, Google Cloud
🔀 Smart chunking — splits on sentence boundaries, never mid-word
🎵 Output formats — MP3, WAV, OGG (converted via ffmpeg)
⚡ Async jobs — upload → get job ID → poll progress → download
🔁 Concurrent synthesis — up to 3 parallel TTS requests per job
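The sentence-boundary chunking described above could look roughly like the sketch below. It is an illustrative stand-in, not the project's `src/utils/chunker.ts`: the name `chunkText`, the 4000-character default (chosen to stay under OpenAI's 4096-character input limit), and treating `.`, `!` or `?` followed by whitespace as a sentence end are all assumptions. A sentence longer than the limit falls back to word boundaries, so no word is ever split.

```typescript
// Illustrative sentence-aware chunker; NOT the project's actual src/utils/chunker.ts.
export function chunkText(text: string, maxChars = 4000): string[] {
  // Collapse runs of whitespace (including PDF line breaks) into single spaces.
  const clean = text.replace(/\s+/g, " ").trim();
  if (clean === "") return [];

  // Mark a break after ., ! or ? followed by whitespace, keeping the punctuation.
  const sentences = clean.replace(/([.!?])\s+/g, "$1\n").split("\n");

  const chunks: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    const joined = current ? `${current} ${sentence}` : sentence;
    if (joined.length <= maxChars) {
      current = joined; // the sentence still fits in the current chunk
      continue;
    }
    if (current) chunks.push(current); // flush before starting a new chunk
    current = "";
    if (sentence.length <= maxChars) {
      current = sentence;
      continue;
    }
    // Oversized sentence: pack it word by word instead.
    for (const word of sentence.split(" ")) {
      const next = current ? `${current} ${word}` : word;
      if (next.length <= maxChars || !current) {
        current = next; // a single word longer than maxChars is kept whole
      } else {
        chunks.push(current);
        current = word;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Because chunks are joined back with single spaces, `chunkText(s).join(" ")` reproduces the whitespace-normalized input. Abbreviations such as "Dr." are treated as sentence ends, which only costs an extra chunk boundary.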

Prerequisites

Node.js ≥ 18
ffmpeg installed and in your PATH

```bash
# macOS
brew install ffmpeg

# Ubuntu / Debian
sudo apt install ffmpeg

# Windows
winget install ffmpeg
```

Setup

```bash
# 1. Install dependencies
npm install

# 2. Configure environment
cp .env.example .env
# Edit .env: choose TTS_PROVIDER and, for cloud providers, set the matching API key
```

Minimum `.env` for Kokoro (free, local — recommended)

```env
TTS_PROVIDER=kokoro
# Default URL; change it if your Kokoro server runs elsewhere
KOKORO_BASE_URL=http://localhost:8880
KOKORO_VOICE=af_bella
```

Minimum `.env` for OpenAI

```env
TTS_PROVIDER=openai
OPENAI_API_KEY=sk-...
```
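A minimal sketch of how these variables might be read into a typed config. The names `loadConfig` and `AppConfig`, and falling back to `kokoro` when `TTS_PROVIDER` is unset, are assumptions rather than the project's confirmed behavior; the Kokoro defaults mirror the `.env` example above. In the server you would call it once at startup, e.g. `loadConfig(process.env)`.

```typescript
// Hypothetical env-based config loader; compare the project's src/config.ts.
type ProviderName = "kokoro" | "openai" | "elevenlabs" | "google";

interface AppConfig {
  provider: ProviderName;
  kokoroBaseUrl: string;
  kokoroVoice: string;
  openaiApiKey?: string;
}

const PROVIDERS: readonly ProviderName[] = ["kokoro", "openai", "elevenlabs", "google"];

export function loadConfig(env: Record<string, string | undefined>): AppConfig {
  // Assumed fallback: the locally hosted Kokoro provider.
  const raw = (env.TTS_PROVIDER ?? "kokoro").trim().toLowerCase();
  if (!PROVIDERS.includes(raw as ProviderName)) {
    throw new Error(`Unknown TTS_PROVIDER "${raw}"; expected one of ${PROVIDERS.join(", ")}`);
  }
  const provider = raw as ProviderName;

  // Fail fast at startup instead of discovering a missing key mid-conversion.
  // (Equivalent checks for ElevenLabs and Google are omitted here.)
  if (provider === "openai" && !env.OPENAI_API_KEY) {
    throw new Error("TTS_PROVIDER=openai requires OPENAI_API_KEY");
  }

  return {
    provider,
    kokoroBaseUrl: env.KOKORO_BASE_URL ?? "http://localhost:8880",
    kokoroVoice: env.KOKORO_VOICE ?? "af_bella",
    openaiApiKey: env.OPENAI_API_KEY,
  };
}
```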

Running

```bash
# Development (hot reload)
npm run dev

# Production
npm run build
npm start
```

Server starts at http://localhost:3000.

API Reference

`POST /convert`

Upload a PDF or EPUB and start conversion.

Content-Type: multipart/form-data

| Field | Type | Description |
|---|---|---|
| `file` | File | `.pdf` or `.epub` file (required) |
| `format` | string | `mp3` \| `wav` \| `ogg` (default: `mp3`) |
| `voice` | string | Provider-specific voice ID |
| `speed` | number | 0.25–4.0 (default: 1.0) |
| `provider` | string | Override: `kokoro` \| `openai` \| `elevenlabs` \| `google` |

Response 202:

```json
{
  "jobId": "abc-123",
  "status": "queued",
  "statusUrl": "/jobs/abc-123",
  "downloadUrl": "/download/abc-123"
}
```
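The multipart fields in the table above arrive as strings, so the route has to parse them and apply the documented defaults. A hypothetical sketch of that step follows; `parseConvertFields` is an illustrative name, not part of the project, and provider-specific checks (such as whether a voice ID exists) are left out.

```typescript
// Hypothetical parsing of POST /convert form fields; not the project's route code.
type OutputFormat = "mp3" | "wav" | "ogg";

interface ConvertOptions {
  format: OutputFormat;
  speed: number;
  voice?: string;
}

const FORMATS: readonly OutputFormat[] = ["mp3", "wav", "ogg"];

export function parseConvertFields(fields: Record<string, string | undefined>): ConvertOptions {
  // Default format is mp3, matching the table.
  const format = (fields.format ?? "mp3").toLowerCase();
  if (!FORMATS.includes(format as OutputFormat)) {
    throw new Error(`format must be one of ${FORMATS.join(", ")}`);
  }

  // Default speed is 1.0; reject non-numeric or out-of-range values.
  const speed = fields.speed === undefined ? 1.0 : Number(fields.speed);
  if (!Number.isFinite(speed) || speed < 0.25 || speed > 4.0) {
    throw new Error("speed must be a number between 0.25 and 4.0");
  }

  // Treat an empty voice field as "use the provider default".
  const voice = fields.voice?.trim() || undefined;
  return { format: format as OutputFormat, speed, voice };
}
```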

`GET /jobs/:jobId`

Poll job progress.

Response:

```json
{
  "jobId": "abc-123",
  "status": "converting",
  "progress": 45,
  "message": "Synthesizing chunk 3/7…",
  "totalChunks": 7,
  "completedChunks": 3
}
```

`status` values: `queued` → `extracting` → `converting` → `merging` → `done` | `error`
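The lifecycle above can be modeled as a small transition table. This is a sketch rather than the project's `jobStore.ts`, and it assumes that a job can move to `error` from any non-terminal stage.

```typescript
// Hypothetical model of the documented job lifecycle; not the project's jobStore.ts.
type JobStatus = "queued" | "extracting" | "converting" | "merging" | "done" | "error";

// Each stage advances to the next; every non-terminal stage may also fail (assumption).
const NEXT: Record<JobStatus, readonly JobStatus[]> = {
  queued: ["extracting", "error"],
  extracting: ["converting", "error"],
  converting: ["merging", "error"],
  merging: ["done", "error"],
  done: [],
  error: [],
};

export function canTransition(from: JobStatus, to: JobStatus): boolean {
  return NEXT[from].includes(to);
}

// Terminal states have no outgoing transitions; clients can stop polling there.
export function isTerminal(status: JobStatus): boolean {
  return NEXT[status].length === 0;
}
```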

`GET /download/:jobId`

Download the finished audio file (only available once `status` is `"done"`).

Returns the audio file as a binary stream with the correct Content-Type.
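One way to produce the correct headers is a lookup from output format to MIME type. The helper name `downloadHeaders` and the `<jobId>.<format>` filename are assumptions; the MIME types are the common ones (`audio/wav` also appears as `audio/x-wav`). In a Fastify handler, these could be applied with `reply.headers(...)` before sending a file stream.

```typescript
// Hypothetical download-header helper; not taken from the project's routes.
type OutputFormat = "mp3" | "wav" | "ogg";

const CONTENT_TYPES: Record<OutputFormat, string> = {
  mp3: "audio/mpeg",
  wav: "audio/wav",
  ogg: "audio/ogg",
};

export function downloadHeaders(jobId: string, format: OutputFormat): Record<string, string> {
  return {
    "Content-Type": CONTENT_TYPES[format],
    // "attachment" asks browsers to save the file instead of playing it inline.
    "Content-Disposition": `attachment; filename="${jobId}.${format}"`,
  };
}
```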

`GET /voices`

List available voices for the active (or specified) provider.

Example: `GET /voices?provider=openai`

`GET /formats`

Returns supported output formats and providers.

`GET /health`

Health check.

Example: cURL

```bash
# Convert a PDF to MP3
curl -X POST http://localhost:3000/convert \
  -F "file=@my-book.pdf" \
  -F "format=mp3" \
  -F "voice=nova" \
  -F "speed=1.1"

# Poll status (use the jobId returned by /convert)
curl http://localhost:3000/jobs/abc-123

# Download when done
curl -o my-book.mp3 http://localhost:3000/download/abc-123
```
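The polling step can also be scripted. Below is a hypothetical TypeScript helper (`waitForJob` and `httpFetchJob` are illustrative names, not shipped with the project). The job fetcher is injected so the loop can be exercised without a running server; the 2-second interval and the 900-attempt cap (about 30 minutes) are arbitrary choices.

```typescript
// Hypothetical polling client for GET /jobs/:jobId; not part of the project.
interface JobInfo {
  jobId: string;
  status: "queued" | "extracting" | "converting" | "merging" | "done" | "error";
  progress?: number;
  message?: string;
}

// Injected fetcher: anything that returns the current job state for an ID.
type FetchJob = (jobId: string) => Promise<JobInfo>;

export async function waitForJob(
  fetchJob: FetchJob,
  jobId: string,
  intervalMs = 2000,
  maxAttempts = 900,
): Promise<JobInfo> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await fetchJob(jobId);
    if (job.status === "done") return job;
    if (job.status === "error") throw new Error(job.message ?? `Job ${jobId} failed`);
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} polls`);
}

// Adapter for the real API, using the global fetch available in Node.js 18+.
export function httpFetchJob(baseUrl: string): FetchJob {
  return async (jobId) => {
    const res = await fetch(`${baseUrl}/jobs/${jobId}`);
    if (!res.ok) throw new Error(`GET /jobs/${jobId} returned ${res.status}`);
    return (await res.json()) as JobInfo;
  };
}
```

Example usage against a local server: `const job = await waitForJob(httpFetchJob("http://localhost:3000"), "abc-123");`, then fetch `/download/abc-123` as in the cURL step.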

Provider Notes

Kokoro (local, free — recommended)

Runs entirely on your machine: no API key and no per-character cost
Quality rivals paid providers, especially `af_bella` and `bm_george`
Requires a running kokoro-fastapi server

Setup (pick one):

```bash
# Option A — Python (CPU)
pip install kokoro-fastapi
python -m kokoro_fastapi

# Option B — Docker CPU
docker run -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-cpu:latest

# Option C — Docker GPU (much faster for long books)
docker run --gpus all -p 8880:8880 ghcr.io/remsky/kokoro-fastapi-gpu:latest
```

Available voices:

| ID | Name | Accent |
|---|---|---|
| `af_bella` | Bella | American Female |
| `af_nicole` | Nicole | American Female |
| `af_sarah` | Sarah | American Female |
| `af_sky` | Sky | American Female |
| `am_adam` | Adam | American Male |
| `am_michael` | Michael | American Male |
| `bf_emma` | Emma | British Female |
| `bf_isabella` | Isabella | British Female |
| `bm_george` | George | British Male |
| `bm_lewis` | Lewis | British Male |

Check the server is reachable before converting:

```bash
curl http://localhost:3000/health/kokoro
```
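The Kokoro voice IDs listed above share a visible pattern: the first letter encodes the accent (`a` American, `b` British), the second the gender (`f` female, `m` male), followed by `_` and the name. A small hypothetical parser for that pattern, handy for grouping voices in a UI, is sketched below. It only covers IDs shaped like the ones in this list, and Kokoro may ship voices that do not follow it.

```typescript
// Hypothetical parser for IDs of the form <accent><gender>_<name>, e.g. "bm_george".
interface VoiceInfo {
  id: string;
  name: string;
  accent: "American" | "British";
  gender: "Female" | "Male";
}

const ACCENTS = { a: "American", b: "British" } as const;
const GENDERS = { f: "Female", m: "Male" } as const;

export function describeKokoroVoice(id: string): VoiceInfo | null {
  const match = /^([ab])([fm])_([a-z]+)$/.exec(id);
  if (!match) return null; // outside the pattern covered by the voice list
  const [, accent, gender, rawName] = match;
  return {
    id,
    name: rawName.charAt(0).toUpperCase() + rawName.slice(1), // "george" -> "George"
    accent: ACCENTS[accent as keyof typeof ACCENTS],
    gender: GENDERS[gender as keyof typeof GENDERS],
  };
}
```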

OpenAI TTS

Models: `tts-1` (faster, cheaper) or `tts-1-hd` (higher quality)
Voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`
~$0.015 per 1K characters with `tts-1`, ~$0.030 per 1K with `tts-1-hd` (check OpenAI's current pricing)
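For budgeting a long book, cost scales linearly with character count. A rough estimator, assuming rates of $0.015 per 1K characters for `tts-1` and $0.030 per 1K for `tts-1-hd` (`estimateCostUsd` is an illustrative name, and rates can change): at those rates, a 500,000-character book costs about $7.50 with `tts-1` and $15.00 with `tts-1-hd`.

```typescript
// Rough OpenAI TTS cost estimate; the per-1K-character rates are assumptions to keep up to date.
const USD_PER_1K_CHARS = {
  "tts-1": 0.015,
  "tts-1-hd": 0.03,
} as const;

type OpenAiTtsModel = keyof typeof USD_PER_1K_CHARS;

export function estimateCostUsd(characters: number, model: OpenAiTtsModel): number {
  const raw = (characters / 1000) * USD_PER_1K_CHARS[model];
  return Math.round(raw * 100) / 100; // round to whole cents
}
```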

ElevenLabs

Most natural, expressive voices
Use GET /voices to list available voice IDs
Turbo v2 model used by default

Google Cloud TTS

Requires a service account JSON key
Set `GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json`
Neural2 voices are highest quality

Project Structure

```text
src/
├── server.ts          # Fastify entry point
├── config.ts          # Env-based config
├── pipeline.ts        # Conversion orchestrator
├── types.ts           # Shared TypeScript types
├── extractors/
│   ├── pdf.ts         # pdfjs-dist extractor
│   └── epub.ts        # epub2 extractor
├── providers/
│   ├── openai.ts      # OpenAI TTS
│   ├── elevenlabs.ts  # ElevenLabs TTS
│   ├── google.ts      # Google Cloud TTS
│   └── factory.ts     # Provider resolver
├── routes/
│   └── index.ts       # All API routes
└── utils/
    ├── chunker.ts     # Text chunking + cleaning
    ├── audioMerger.ts # ffmpeg concat + encode
    └── jobStore.ts    # In-memory job tracker
```
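`audioMerger.ts` is described as an ffmpeg concat + encode step. A sketch of how that could be driven follows, assuming ffmpeg's concat demuxer and an ffmpeg build that includes the `libmp3lame` and `libvorbis` encoders; the function names are hypothetical. The list file would be written to disk and the arguments passed to `execFile("ffmpeg", args)` from `node:child_process`.

```typescript
// Hypothetical concat-and-encode helpers; compare the project's src/utils/audioMerger.ts.
type OutputFormat = "mp3" | "wav" | "ogg";

// Standard ffmpeg audio encoder per output container.
const CODECS: Record<OutputFormat, string> = {
  mp3: "libmp3lame",
  wav: "pcm_s16le",
  ogg: "libvorbis",
};

// Builds the list file read by ffmpeg's concat demuxer: one `file '<path>'` line per chunk.
export function concatListFile(chunkPaths: string[]): string {
  // Inside single quotes, a literal ' must be written as '\'' (close, escape, reopen).
  return chunkPaths.map((p) => `file '${p.replace(/'/g, "'\\''")}'`).join("\n") + "\n";
}

// Arguments for: ffmpeg -y -f concat -safe 0 -i <list> -c:a <codec> <output>
// "-safe 0" allows absolute paths in the list; "-c:a" re-encodes to the target format.
export function ffmpegConcatArgs(listPath: string, outputPath: string, format: OutputFormat): string[] {
  return ["-y", "-f", "concat", "-safe", "0", "-i", listPath, "-c:a", CODECS[format], outputPath];
}
```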
