
Commit ce39be2

StuBehan and claude authored
docs: add demo audio + comparison table to README (#21)
Two visibility wins for the public face of the repo:

- docs/demo.wav: 5 seconds of two stackvox voices (af_heart and bf_emma) speaking the tagline, generated locally with the cached Kokoro model. 234 KB at 24kHz PCM_16, small enough to commit and serve via the GitHub raw URL. Linked from the top of the README so first-time visitors can hear the tool without installing anything.
- README "How does this compare to other TTS?" section comparing stackvox against `say`, `espeak-ng`, Piper, Coqui TTS, and the cloud APIs across offline/quality/latency/license. The honest pitch: voice quality alone isn't a reason to switch off Piper, but the resident daemon plus bash helper for sub-15ms shell-side speech is the differentiator for shell-driven workflows.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent b5a3482 commit ce39be2

3 files changed

Lines changed: 20 additions & 0 deletions


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ dist/
 *.bin
 out/
 *.wav
+!docs/demo.wav
 .DS_Store
 .env
 .coverage
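
The hunk above relies on Git's negation rule: within a single `.gitignore`, the last matching pattern decides, and a leading `!` re-includes a path that an earlier pattern excluded. The Python sketch below illustrates that rule only. It is a simplification covering the two pattern shapes in this hunk (a bare glob matched against the file name, and a pattern containing a slash matched against the whole path); it is not how Git is implemented, and `is_ignored` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path: str, patterns: list[str]) -> bool:
    """Simplified last-match-wins check for a small subset of .gitignore syntax."""
    ignored = False
    name = PurePosixPath(path).name
    for raw in patterns:
        negated = raw.startswith("!")
        pattern = raw[1:] if negated else raw
        # Simplification: patterns with a slash are compared to the whole path,
        # patterns without one are compared to the file name only.
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            ignored = not negated  # a later match overrides an earlier one
    return ignored


patterns = ["*.wav", "!docs/demo.wav"]
print(is_ignored("out/test.wav", patterns))   # True: still ignored
print(is_ignored("docs/demo.wav", patterns))  # False: re-included by the negation
```

Order matters in this sketch just as it does in the real file: if the `!docs/demo.wav` line came before `*.wav`, the later glob would win and the demo file would be ignored again.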

README.md

Lines changed: 19 additions & 0 deletions
@@ -7,6 +7,8 @@
 
 Offline TTS using [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) via [kokoro-onnx](https://github.com/thewh1teagle/kokoro-onnx). Apache 2.0 model, ~340MB, CPU real-time, plays straight to system audio. Designed to be importable as a Python library, drivable as a CLI, or poked via a unix socket for ~13ms speech requests from shell scripts.
 
+🔊 **Hear it:** [docs/demo.wav](./docs/demo.wav): five seconds of two voices speaking the tagline (`af_heart` then `bf_emma`).
+
 ## Install
 
 From PyPI (recommended for most users):
@@ -178,6 +180,23 @@ Model weights (`kokoro-v1.0.onnx`, ~340 MB) and voices are downloaded from the [
 
 Security issues themselves should not be filed as public GitHub issues; see [`SECURITY.md`](./SECURITY.md) for the disclosure process.
 
+## How does this compare to other TTS?
+
+stackvox is an opinionated, narrow slice of the TTS space. Here's where it sits next to the obvious neighbours:
+
+| Tool | Offline? | Quality | Latency (typical) | License | Best for |
+|---|---|---|---|---|---|
+| **stackvox** (Kokoro-82M) | Yes | High (24kHz, 50+ voices, 9 languages) | ~300ms in-process · ~13ms via daemon helper | Apache 2.0 | Local apps, shell hooks, anything that wants natural voice without the cloud |
+| macOS `say` | Yes | OK | ~50ms | macOS only | macOS-only scripts, "good enough" voice |
+| `espeak-ng` | Yes | Robotic | ~10ms | GPL-3.0 | Accessibility, screen readers, embedded |
+| [Piper](https://github.com/rhasspy/piper) | Yes | High | ~100ms | MIT | Similar use-case to stackvox; ONNX-based, more voices in some languages |
+| [Coqui TTS](https://github.com/coqui-ai/TTS) | Yes | Very high (research models) | seconds | MPL-2.0 | Research, fine-tuning, voice cloning |
+| OpenAI / ElevenLabs / etc. | No | Highest | network-bound | Proprietary | Production apps that can pay per-call and accept network dependency |
+
+Where stackvox tries to be different from Piper specifically: a **resident daemon + bash helper** path that gets you sub-15ms speech requests from shell scripts (CI hooks, terminal notifications, status announcements) without paying Python's startup cost on every call. That is the point: voice quality alone wouldn't be enough to switch off Piper, but the IPC path makes a difference for shell-driven workflows.
+
+Pick stackvox if you want **good voices, fully offline, with a fast shell-friendly API**.
+
 ## License & attributions
 
 stackvox itself is licensed under the **Apache License, Version 2.0**; see [`LICENSE`](./LICENSE). Third-party attributions are collected in [`NOTICE`](./NOTICE); the summary below is informational.
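
The README text in this hunk credits the low latency to a resident daemon reached over a unix socket, so each request skips interpreter and model start-up. The Python sketch below illustrates that general pattern only. Everything specific in it is an assumption: the commit does not show stackvox's socket path, wire format, or reply, so the newline-terminated text message, the `ok` acknowledgement, and the names `serve_once`, `request_speech`, and `demo` are all hypothetical, and a toy server thread stands in for the real daemon.

```python
import os
import socket
import tempfile
import threading


def serve_once(server: socket.socket, received: list[str]) -> None:
    """Toy stand-in for a resident daemon: accept one request and acknowledge it."""
    conn, _ = server.accept()
    with conn:
        received.append(conn.recv(4096).decode("utf-8"))
        conn.sendall(b"ok\n")  # hypothetical acknowledgement


def request_speech(socket_path: str, text: str) -> str:
    """Send one line of text over a unix socket and return the one-line reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as client:
        client.connect(socket_path)
        client.sendall(text.encode("utf-8") + b"\n")  # hypothetical wire format
        return client.recv(4096).decode("utf-8").strip()


def demo() -> tuple[str, list[str]]:
    """Run the toy daemon and one client request inside a temporary directory."""
    received: list[str] = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # hypothetical socket location
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
            server.bind(path)
            server.listen(1)
            worker = threading.Thread(target=serve_once, args=(server, received))
            worker.start()
            reply = request_speech(path, "build finished")
            worker.join()
    return reply, received


if __name__ == "__main__":
    print(demo())
```

The client side is a connect, a write, and a read, which is why a small shell helper can do the same job without starting Python at all; the expensive work stays in the long-lived process.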

docs/demo.wav

233 KB
Binary file not shown.
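
The size quoted in the commit message can be sanity-checked from the stated format. The Python sketch below assumes a mono file, an exact 5-second duration, and the canonical 44-byte RIFF/WAVE header; none of those three details is stated in the commit, and `wav_size_bytes` is a hypothetical helper.

```python
SAMPLE_RATE_HZ = 24_000   # "24kHz", from the commit message
BYTES_PER_SAMPLE = 2      # PCM_16 stores 16 bits per sample
CHANNELS = 1              # assumption: mono
DURATION_S = 5.0          # "5 seconds", assumed exact
WAV_HEADER_BYTES = 44     # assumption: minimal canonical WAV header


def wav_size_bytes(duration_s: float, sample_rate_hz: int,
                   bytes_per_sample: int, channels: int,
                   header_bytes: int = WAV_HEADER_BYTES) -> int:
    """Estimate the size of an uncompressed PCM WAV file in bytes."""
    samples = round(duration_s * sample_rate_hz)
    return header_bytes + samples * bytes_per_sample * channels


size = wav_size_bytes(DURATION_S, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)
print(size)                    # 240044
print(round(size / 1024, 1))   # 234.4
```

Under those assumptions the estimate is about 234 KiB, in line with the figure in the commit message. The 233 KB shown on this page would be consistent with a clip marginally shorter than 5 seconds, though the commit does not say.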
