Give your AI agent a voice.
Self-hosted TTS proxy and voice manager. Audition voices against your agent's actual dialogue, pick one, and pipe responses through it from the browser or CLI.
Voice starts while your agent is still working. `clarion-watch` speaks each assistant message the moment it's written — even before tool calls finish.

- Live session voice. `clarion-watch` speaks each assistant message as soon as it is written — including text before tool use. Voice starts while Claude is still working.
- Audition voices. Paste your agent's characteristic dialogue. Hear each voice read it. Pick the one that fits.
- Save agent profiles. One agent uses Kokoro `bm_george` at 1.0x, another uses Edge `en-GB-SoniaNeural`. Both saved, both exportable as JSON.
- Six TTS backends. Edge TTS (zero config), Kokoro (self-hosted, natural), Chatterbox (self-hosted, ElevenLabs-quality), Piper (self-hosted, lightweight), ElevenLabs (paid), Google Chirp 3 HD (paid).
- Terminal integration. Pipe agent responses through their voice from the CLI. Works with Claude Code via `clarion-watch` (live daemon) or the stop hook.
- Multi-agent support. Running several agents at once? Concurrent responses queue automatically and speak in the order they finished — no overlapping audio.
Requires Node.js 18+ and npm. Start with the UI only — it works out of the box. Add the server later if you want higher-quality voices.

```sh
npm install
npm run dev
# http://localhost:5173
```

Run the server in a second terminal alongside the UI:

```sh
cd server && npm install && npm run dev
# http://localhost:8080
```

Or with Docker (Kokoro included):

```sh
docker compose up
# Kokoro at :8880, Clarion server at :8080
```

Most TTS tools give you an API. Clarion gives you a workflow.
- Audition, don't guess — hear each voice read your agent's actual dialogue before you commit
- Self-hosted — your audio, your servers, your data
- `proseOnly` — strips code blocks, markdown, and structure so agents speak naturally, not robotically
- Live voice — `clarion-watch` speaks during the session, not after
- Six backends — swap from free (Edge) to premium (ElevenLabs) without changing a line of agent code
- Multi-agent crews — each agent gets its own voice, concurrent responses queue automatically
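The queueing behaviour for multi-agent crews can be sketched as a single promise chain that serializes playback. This is illustrative only: the real CLI serializes through a lock file, and `spoken` here stands in for actual audio playback.

```javascript
// Minimal sketch of serialized playback: concurrent speak requests chain
// onto one promise, so clips play in completion order, never overlapping.
// (Illustrative only; Clarion's CLI serializes via a lock file instead.)
let playbackTail = Promise.resolve();
const spoken = [];

function enqueueSpeech(agentId, clip) {
  playbackTail = playbackTail.then(async () => {
    // A real implementation would shell out to afplay/mpv/ffplay here.
    spoken.push(`${agentId}: ${clip}`);
  });
  return playbackTail;
}
```

Each call chains onto the tail of the queue, so even if three agents finish at once, their clips are spoken one after another.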
Six layers: UI (React SPA), Services (TTS client, HMAC crypto, localStorage), Server (Hono on Cloudflare Workers or Node), Domain (agent model, voice lists), CLI (watch, speak, stream, doctor), and Design System (CSS tokens). The server proxies to six TTS backends — Edge TTS is always available as a zero-config fallback.
- Open the Audition tab
- Paste your agent's characteristic dialogue
- Select a backend (Kokoro for the most natural voices)
- Click play next to a voice to hear it read your text
- Click Use this voice, name the agent, done
Short, characteristic sentences work best. Paste what your agent would actually say, not generic test text.
```sh
# Install globally (run once from the Clarion directory)
npm install -g .

# Set up your first agent (interactive — picks a voice, writes the hook)
clarion-init

# Speak as a saved agent
echo "The pattern holds." | clarion-speak --agent my-agent

# Stream in real time, sentence by sentence
claude "Walk me through this." | clarion-stream --agent my-agent

# Watch a Claude Code session live — speaks mid-session, before tools finish
clarion-watch --agent my-agent

# Multi-agent mode — one router watches all projects, routes each to the right voice
clarion-watch --multi

# Migrate voice configs from Terminus to Clarion
clarion-migrate --dry-run

# Diagnose setup issues — 10 checks with remediation hints
clarion-doctor

# Check server health, loaded agents, and playback status
clarion-status

# Mute an agent
clarion-mute my-agent
clarion-mute my-agent --off
```

Full CLI guide: `clarion-doctor`, `clarion-init`, `clarion-speak`, `clarion-stream`, `clarion-watch`, `clarion-router`, `clarion-migrate`, `clarion-status`, `clarion-mute`, `clarion-log`, and the Claude Code stop hook.
`POST /speak`
Body: `{ "text": "Hello.", "backend": "edge", "voice": "en-GB-RyanNeural", "speed": 1.0 }`
Returns: `audio/mpeg` (`X-Clarion-Fallback` header if the backend fell back to Edge)

`GET /voices?backend=edge|kokoro|piper|elevenlabs|google|chatterbox`
Returns: `{ voices: [{ id, label, lang, gender }] }`

`GET /health`
Returns: `{ edge: "up", kokoro: "up|down|unconfigured", ... }`

`GET /diagnostics`
Returns: `{ server: { version }, backends: { [name]: { status, configured, detail } } }`
| Backend | Config needed | Quality | Voices |
|---|---|---|---|
| Edge TTS | None* | Good | 27 Neural (US, UK, AU, IE, CA, ZA, NZ, IN) |
| Kokoro | `KOKORO_SERVER=http://...` | Excellent | 11 (US + UK English) |
| Chatterbox | `CHATTERBOX_SERVER=http://...` | Excellent | Voice cloning — unlimited (requires GPU) |
| Piper | `PIPER_SERVER=http://...` | OK | 6 (US + UK English) |
| ElevenLabs | `ELEVENLABS_API_KEY=...` | Excellent | 11 (US, UK, AU) |
| Google Chirp 3 HD | `GOOGLE_TTS_API_KEY=...` | Excellent | 16 (US + UK) |
*Edge TTS uses Microsoft's public Translator API. No API key is required, but this is an unofficial integration and availability is not guaranteed.
Backend setup guide: local Kokoro and Piper install, Docker, API key setup.
By default (proseOnly: true), Clarion strips non-conversational markdown before sending text to the TTS backend — so your agent only speaks what it would actually say, not the structure around it.
| Content | Spoken? |
|---|---|
| Prose paragraphs | Yes |
| Heading text (`## Like this`) | Yes — markers stripped |
| Bold / italic / strikethrough | Yes — markers stripped |
| Links (`[text](url)`) | Yes — link text spoken |
| Images (`![alt](url)`) | Yes — alt text spoken |
| Blockquotes (`> text`) | Yes — markers stripped |
| Bullet lists (`- item`) | Yes — markers stripped |
| Numbered lists (`1. item`) | Yes — markers stripped |
| Fenced code blocks (` ``` `) | No — removed |
| Inline code (`` `like this` ``) | No — removed |
| Indented code blocks | No — removed |
| HTML tags | No — removed |
| Horizontal rules (`---`) | No — removed |
Toggle Prose only off on any agent card if you want everything spoken verbatim — useful for agents that narrate code reviews or read structured output.
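As a rough sketch, the kind of filtering the table describes can be done with a chain of regex passes. This is illustrative only, not Clarion's actual implementation (a real implementation would use a proper markdown parser for edge cases):

````javascript
// Illustrative prose-only filter matching the table above.
// Not Clarion's implementation; regexes like these miss edge cases.
function proseOnly(markdown) {
  return markdown
    .replace(/```[\s\S]*?```/g, "")            // fenced code blocks: removed
    .replace(/`[^`\n]*`/g, "")                 // inline code: removed
    .replace(/^(?: {4}|\t).*$/gm, "")          // indented code blocks: removed
    .replace(/<[^>]+>/g, "")                   // HTML tags: removed
    .replace(/^-{3,}\s*$/gm, "")               // horizontal rules: removed
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")  // images: keep alt text
    .replace(/\[([^\]]+)\]\([^)]*\)/g, "$1")   // links: keep link text
    .replace(/^#{1,6}\s+/gm, "")               // heading markers stripped
    .replace(/^>\s?/gm, "")                    // blockquote markers stripped
    .replace(/^\s*(?:[-*]|\d+\.)\s+/gm, "")    // list markers stripped
    .replace(/(\*\*|__|\*|_|~~)/g, "")         // bold/italic/strikethrough markers
    .replace(/\n{3,}/g, "\n\n")                // collapse leftover blank lines
    .trim();
}
````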
Profiles are stored in localStorage and exportable as JSON.
```json
{
  "id": "my-agent",
  "name": "My Agent",
  "backend": "kokoro",
  "voice": "bm_george",
  "speed": 1.0,
  "proseOnly": true
}
```

Export from the UI (Export all button) or share a single agent profile as a .json file. Import via the Import button.
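If you generate profiles by hand or in scripts, a hypothetical import-time check might validate the fields shown above. The validator itself is not part of Clarion; the field names match the example profile.

```javascript
// Hypothetical validator for an agent profile (field names from the
// example above; the backend list matches the supported-backends table).
const BACKENDS = ["edge", "kokoro", "chatterbox", "piper", "elevenlabs", "google"];

function validateProfile(p) {
  const errors = [];
  if (typeof p.id !== "string" || !p.id) errors.push("id must be a non-empty string");
  if (typeof p.name !== "string" || !p.name) errors.push("name must be a non-empty string");
  if (!BACKENDS.includes(p.backend)) errors.push(`backend must be one of: ${BACKENDS.join(", ")}`);
  if (typeof p.voice !== "string" || !p.voice) errors.push("voice must be a non-empty string");
  if (typeof p.speed !== "number" || p.speed <= 0) errors.push("speed must be a positive number");
  if (typeof p.proseOnly !== "boolean") errors.push("proseOnly must be a boolean");
  return errors; // empty array means the profile looks importable
}
```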
Cloudflare Worker (Edge TTS only, or with secrets for paid backends):

```sh
cd server && wrangler deploy
wrangler secret put KOKORO_SERVER
wrangler secret put ELEVENLABS_API_KEY
wrangler secret put GOOGLE_TTS_API_KEY
```

Docker Compose (for local Kokoro):

```sh
docker-compose up
```

Chatterbox on RunPod (or any NVIDIA GPU server): see docs/chatterbox.md for the full setup guide.
No audio output?
Edge TTS returns audio/mpeg. Make sure your player supports it. From CLI, Clarion auto-detects afplay (macOS), mpv, ffplay, paplay, or cvlc (Linux). On Windows, mpv, ffplay, or vlc are detected. Pass --player <command> to override.
Kokoro connection refused?
Check KOKORO_SERVER is set and the server is running on the right port. Docker: docker-compose up handles this automatically. Verify with clarion-status.
clarion-watch not speaking?
Run clarion-log to check recent entries. Make sure the agent profile exists: clarion-speak --list-agents. Check mute state: clarion-mute --list.
Audio overlapping between agents?
clarion-stream uses a process lock file to serialize playback. If a stale lock remains after a crash, delete $TMPDIR/clarion-stream.lock.
Clarion is designed for personal, self-hosted use. For deployments beyond localhost:
- Set `API_KEY=your-secret` in the server environment. The browser UI signs requests with HMAC-SHA256. The CLI uses `Bearer <key>`. Use HTTPS for remote deployments.
- CORS is open (`*`) by default. Set `ALLOWED_ORIGIN=https://your-domain.com` to restrict it.
- `kokoro-server.py` and `piper-server.py` bind to `127.0.0.1` by default. Do not expose them on `0.0.0.0` unless you trust the network.
Clarion sends the text you provide to whichever TTS backend is selected. Edge TTS, ElevenLabs, and Google Chirp 3 HD route text through external APIs (Microsoft, ElevenLabs, and Google respectively). Kokoro, Chatterbox, and Piper are fully self-hosted — text never leaves your infrastructure. Choose your backend accordingly. No text is stored by the Clarion server.
Clarion is part of the zerovector.design ecosystem — tools for building directly from intent to artifact. See also Terminus, the zero-overhead orchestration layer for multi-agent workflows.
Issues and pull requests are welcome. See CONTRIBUTING.md for setup, code style, and how to add a backend.
Clarion is an audio tool, but contributions are not limited to people who use audio. UI improvements, backend adapters, documentation, and testing are all valuable.
Built by celanthe · Design by Zabethy · Inspired by Investiture by Erika Flowers and Everbloom Reader
