API Reference

Base URL: http://127.0.0.1:8000

Kokoro WebUI provides two API groups:

Native API - Full-featured endpoints under /api/* and ws://.../ws/*
OpenAI-compatible API - Drop-in replacement for OpenAI's /v1/audio/speech

Quick Reference

Endpoint	Method	Description
`/api/health`	GET	Check server status and queue
`/api/capabilities`	GET	List voices, formats, and limits
`/api/speak`	POST	Generate single audio file
`/api/speak-stream`	POST	Stream audio as NDJSON
`/ws/speak-stream`	WS	Stream audio over WebSocket
`/v1/audio/speech`	POST	OpenAI-compatible endpoint

For practical examples, see examples/README.md.

Authentication

Authentication is optional. When KOKORO_REQUIRE_AUTH=1 is set:

HTTP routes accept Authorization: Bearer <key> or X-API-Key: <key>
WebSocket accepts the bearer header during handshake
The built-in Web UI uses short-lived session tokens instead of the API key
Query string API keys are not supported (security)

Failed auth attempts are rate-limited per client. After too many failures, you'll get 429 Too Many Requests with a Retry-After header.

See CONFIGURATION.md for details on setting up authentication.

Native Endpoints

Common Fields

These fields work across most native synthesis endpoints:

Field	Type	Required	Notes
`text`	string	yes	1 to 2500 characters
`voice`	string	no	Default: `af_heart`
`speed`	number	no	0.5 to 1.8, default 1.0
`pitch`	number	no	-6.0 to 6.0 semitones, default 0.0
`format`	string	no	`pcm`, `wav`, or `opus` (server-dependent)
`opus_bitrate`	string	no	`16k`, `24k`, `32k`, or `48k`
`wav_sample_rate`	string	no	`native`, `16000`, `22050`, `24000`, `44100`, `48000`

Notes:

pitch: 0.0 skips pitch processing entirely (faster)
Pitch shifting requires ffmpeg with rubberband filter
Check /api/capabilities to see which formats your server supports

GET /api/health

Returns server status, queue information, and runtime health.

Example response:

{
  "ok": true,
  "missing": [],
  "active_provider": "CPUExecutionProvider",
  "gpu": {
    "available": false,
    "process_vram_used_mb": 0
  },
  "queue": {
    "worker_limit": 2,
    "queue_limit": 8,
    "active_jobs": 0,
    "queued_jobs": 0,
    "available_slots": 10
  }
}

Key fields:

ok - true when everything is working
missing - List of missing dependencies or assets
active_provider - Current runtime (CPU, CUDA, etc.)
queue - Current load and capacity

GET /api/capabilities

Returns detailed information about available features.

Example response:

{
  "voices": ["af_heart", "af_sarah", "am_michael"],
  "formats": ["wav", "opus", "pcm"],
  "pitch_shifting": true,
  "synthesis_workers": 2,
  "websocket_streaming": true
}

Use this to populate UI controls or validate requests.

POST /api/speak

Generate a single audio file from text.

Request:

{
  "text": "Hello world",
  "voice": "af_heart",
  "speed": 1.0,
  "pitch": 0.0,
  "format": "wav"
}

Response:

Returns raw audio bytes:

audio/pcm for format: pcm
audio/wav for format: wav
audio/ogg for format: opus

Response headers include metadata:

Header	Description
`X-Audio-Format`	Format used
`X-Sample-Rate`	Sample rate in Hz
`X-Audio-Duration`	Duration in seconds

Errors:

400 - Invalid request (bad voice, format not supported, etc.)
503 - Server overloaded (queue full)

POST /api/speak-stream

Stream audio chunks as they're generated using NDJSON (Newline Delimited JSON).

Request:

Same as /api/speak, plus optional:

Field	Type	Default	Description
`target_chunk_chars`	integer	360	Target characters per chunk (80-2000)

Response:

Content-Type: application/x-ndjson

Each line is a JSON object with a type field:

Meta message (first):

{
  "type": "meta",
  "total_chunks": 3,
  "format": "wav"
}

Chunk message (one per audio chunk):

{
  "type": "chunk",
  "chunk_index": 0,
  "total_chunks": 3,
  "text": "First sentence.",
  "duration_sec": 1.28,
  "audio_base64": "..."
}

Error message (if a chunk fails):

{
  "type": "error",
  "detail": "Pitch shifting failed",
  "chunk_index": 1
}

Done message (last):

{
  "type": "done",
  "total_chunks": 3
}

WS /ws/speak-stream

WebSocket endpoint for real-time streaming.

Usage:

Connect to ws://127.0.0.1:8000/ws/speak-stream
Send a JSON message with synthesis parameters
Receive messages in the same format as NDJSON streaming

Example client payload:

{
  "text": "WebSocket streaming example",
  "voice": "af_heart",
  "format": "wav",
  "target_chunk_chars": 360
}

WebSocket is ideal for real-time applications where you want to start playing audio before the full text is synthesized.

OpenAI-Compatible Endpoints

Kokoro WebUI implements the OpenAI audio speech API, making it a drop-in replacement for applications that use OpenAI's TTS.

Base URL: http://127.0.0.1:8000/v1

POST /v1/audio/speech

OpenAI-compatible speech generation.

Request:

{
  "model": "kokoro",
  "input": "Hello from the compatible endpoint",
  "voice": "af_heart",
  "response_format": "wav",
  "speed": 1.0
}

Fields:

Field	Type	Required	Notes
`model`	string	yes	Ignored (accepted for compatibility)
`input`	string	yes	Text to synthesize (1-4096 chars)
`voice`	string	yes	Voice ID, or with pitch suffix like `af_heart+2.0`
`response_format`	string	no	`pcm`, `wav`, or `opus`
`speed`	number	no	0.25-4.0 (but Kokoro only supports 0.5-1.8)

Pitch with OpenAI API:

The OpenAI API doesn't have a pitch field, so we use voice suffixes:

af_heart - Normal pitch (0.0)
af_heart+2.0 - Raise pitch 2 semitones
af_heart-1.5 - Lower pitch 1.5 semitones

Suffix must be between -6.0 and +6.0.

Response:

Returns raw audio bytes:

wav and pcm stream progressively (chunked transfer encoding)
opus returns after full render

Headers:

Header	Description
`X-OpenAI-Compatible`	Always `kokoro`
`X-Audio-Format`	Format used

Errors:

Returns OpenAI-compatible error format:

{
  "error": {
    "message": "speed must be between 0.5 and 1.8 for Kokoro",
    "type": "invalid_request_error"
  }
}

GET /v1/models

Returns available models (just kokoro).

Response:

{
  "object": "list",
  "data": [
    {
      "id": "kokoro",
      "object": "model",
      "owned_by": "kokoro-webui"
    }
  ]
}

GET /v1/models/{model_id}

Returns metadata for a specific model.

Response:

{
  "id": "kokoro",
  "object": "model",
  "owned_by": "kokoro-webui"
}

Error Handling

All endpoints return consistent error formats:

Native API errors:

{
  "detail": "Voice 'invalid_voice' not found"
}

OpenAI-compatible errors:

{
  "error": {
    "message": "Voice 'invalid_voice' not found",
    "type": "invalid_request_error",
    "param": null,
    "code": null
  }
}

Common status codes:

Code	Meaning
400	Bad request (invalid parameters)
401	Unauthorized (auth required but missing)
429	Too many requests (rate limited)
503	Service unavailable (queue full)

Using with OpenAI Clients

Official Python client

from openai import OpenAI

client = OpenAI(
    base_url="http://127.0.0.1:8000/v1",
    api_key="not-needed"  # Or your API key if auth is enabled
)

response = client.audio.speech.create(
    model="kokoro",
    input="Hello world",
    voice="af_heart",
    response_format="wav"
)

response.stream_to_file("output.wav")

LangChain

from langchain_community.tools import OpenAITTS

tts = OpenAITTS(
    base_url="http://127.0.0.1:8000/v1",
    api_key="not-needed"
)

audio = tts.invoke({
    "input": "Hello from LangChain",
    "voice": "af_heart"
})

More Examples

For complete examples in multiple languages (curl, Python, JavaScript), see examples/README.md.

Differences from OpenAI

While we aim for compatibility, there are some differences:

No SSE streaming - stream_format: "sse" returns an error
Pitch via suffix - Use voice+2.0 instead of a separate pitch field
Speed limits - Kokoro only supports 0.5x to 1.8x speed
Model ignored - The model field is accepted but ignored
Native features missing - No access to lang, opus_bitrate, or target_chunk_chars via OpenAI endpoints

For full access to all features, use the native /api/* endpoints.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API Reference

Quick Reference

Authentication

Native Endpoints

Common Fields

GET /api/health

GET /api/capabilities

POST /api/speak

POST /api/speak-stream

WS /ws/speak-stream

OpenAI-Compatible Endpoints

POST /v1/audio/speech

GET /v1/models

GET /v1/models/{model_id}

Error Handling

Using with OpenAI Clients

Official Python client

LangChain

More Examples

Differences from OpenAI

FilesExpand file tree

API.md

Latest commit

History

API.md

File metadata and controls

API Reference

Quick Reference

Authentication

Native Endpoints

Common Fields

GET /api/health

GET /api/capabilities

POST /api/speak

POST /api/speak-stream

WS /ws/speak-stream

OpenAI-Compatible Endpoints

POST /v1/audio/speech

GET /v1/models

GET /v1/models/{model_id}

Error Handling

Using with OpenAI Clients

Official Python client

LangChain

More Examples

Differences from OpenAI