Push-to-talk speech-to-text for Hyprland/Wayland.
Hold a key, speak, release — your words are typed into the focused window.
- Push-to-talk dictation — hold a key, speak, release. Text appears in your focused app.
- Pluggable ASR backends — choose between NeMo (highest accuracy), Sherpa-ONNX (fast, CPU-friendly), or Moonshine (lightweight).
- Text-to-speech — highlight text and hear it read aloud via ElevenLabs, OpenAI, local Piper, MeloTTS, or a self-hosted Kokoro instance.
- Native Wayland overlay — GTK4 layer-shell overlay with blur, transparency, and live transcription feedback.
- Waybar integration — tray-style status icon with tooltip, state colors, and click actions.
- Guided setup wizard — compact GTK onboarding flow with separate ASR, keybind/typing, and TTS steps plus model setup.
- Zero root access — runs entirely in userspace via Hyprland IPC; no `/dev/input` access needed.
The fastest path from zero to working dictation:
# 1. Install from AUR (Arch Linux)
yay -S shuvoice-git
# 2. Run the setup wizard (picks backend, keybind, downloads model)
shuvoice wizard
# 3. Enable and start the background service
systemctl --user enable --now shuvoice.service
# 4. Use it! Hold your push-to-talk key, speak, release.

That's it. The wizard handles backend selection, keybind and final-text-injection choices, TTS setup, model downloads, and Hyprland keybind configuration.
Not on Arch? See Manual Installation below.
ShuVoice is available on the AUR as shuvoice-git:
# Using yay
yay -S shuvoice-git
# Or using paru
paru -S shuvoice-git

The AUR package includes Sherpa-ONNX runtime support out of the box. If your
AUR helper asks which provider to use for python-sherpa-onnx, pick
python-sherpa-onnx-bin (recommended, prebuilt):
yay -S --needed python-sherpa-onnx-bin shuvoice-git

NeMo and Moonshine backends are optional and can be installed separately if needed.
Expand for manual/venv setup instructions
| Component | Requirement |
|---|---|
| OS | Linux with Wayland (Hyprland recommended) |
| Python | 3.10 or newer |
| Package manager | uv (recommended) |
| GPU | Optional (recommended for NeMo / Sherpa CUDA) |
sudo pacman -S \
gtk4 gtk4-layer-shell python-gobject \
portaudio pipewire pipewire-audio pipewire-alsa \
wtype wl-clipboard espeak-ng

git clone https://github.com/shuv1337/shuvoice.git
cd shuvoice
# Create venv and install base dependencies
uv sync

Pick one or more backends to install:
# Sherpa-ONNX (fast, CPU-friendly by default; CUDA-capable runtime auto-repaired by setup/wizard)
uv sync --extra asr-sherpa
# NeMo (highest accuracy, requires NVIDIA GPU + CUDA)
uv sync --extra asr-nemo
# Moonshine (lightweight, low resource usage)
uv sync --extra asr-moonshine

Python 3.14 + NeMo note: use the override file to avoid build issues:
uv sync --extra asr-nemo --override packaging/constraints/py314-overrides.txt
# ElevenLabs TTS (cloud, high quality)
uv sync --extra tts-elevenlabs
# OpenAI TTS (cloud)
uv sync --extra tts-openai
# Local TTS (Piper, experimental)
uv sync --extra tts-local
# MeloTTS (local, subprocess-isolated — no extra needed, uses separate venv)
# Setup: shuvoice setup --install-missing (when tts_backend = "melotts")

The interactive wizard walks you through everything:
shuvoice wizard

The wizard uses a compact guided flow:
- Welcome — quick first-run intro
- Select your ASR backend — Sherpa-ONNX, NeMo, or Moonshine
- Choose a Sherpa profile + device (if applicable) — Streaming (Zipformer) or Instant (Parakeet), then CPU or GPU
- Pick your push-to-talk key + final text injection mode — Right Ctrl, Insert, F9, Super+V, or custom, plus Auto / Clipboard / Direct typing
- Choose your TTS provider + default voice — ElevenLabs, OpenAI, Local Piper, MeloTTS, or Kokoro
- For Local Piper — choose either automatic setup (install Piper + download a curated voice) or an existing local model path
- For Kokoro — set the base URL for your local OpenAI-compatible Kokoro instance, fetch and pick from the live `/audio/voices` list or type a manual voice ID, then use Speak sample to verify playback before finishing
- Review summary + download model files — with progress indicator and cancel support
- Auto-configure Hyprland keybinds — adds `bind`/`bindr` lines if the key isn't already used
If you prefer a non-interactive approach:
# Check dependencies, download models, run preflight
shuvoice setup
# Auto-install missing dependencies (when possible)
shuvoice setup --install-missing
# Local Piper: auto-install Piper + download a curated voice for the active local TTS config
shuvoice setup --install-missing --tts-local-voice en_US-amy-medium --non-interactive
# Local Piper: download curated voice assets into a custom managed directory
shuvoice setup --install-missing \
--tts-local-voice en_US-lessac-high \
--tts-local-model-dir ~/.local/share/shuvoice/models/piper \
--non-interactive
# Quick dependency check only (skip model download + preflight)
shuvoice setup --skip-model-download --skip-preflight

Verify everything is ready before first launch:

shuvoice preflight

This checks Python version, required modules, audio devices, ASR/TTS backend
dependencies, required binaries (wtype, wl-copy, wl-paste), and GTK
layer-shell availability.
On CUDA hosts, shuvoice setup --install-missing also attempts to build a
GPU-enabled Sherpa runtime in the venv and wire up the required CUDA
compatibility libraries when sherpa_provider = "cuda" is selected.
# Enable and start (will auto-start on login)
systemctl --user enable --now shuvoice.service
# Check status
systemctl --user status shuvoice.service
# View logs
journalctl --user -u shuvoice.service -f

Setting up the service for venv/source installs
The default service unit expects /usr/bin/shuvoice (AUR install path).
For venv workflows, create an override:
# Copy the unit file
mkdir -p ~/.config/systemd/user
cp packaging/systemd/user/shuvoice.service ~/.config/systemd/user/
# Override the ExecStart for your venv
systemctl --user edit shuvoice.service

Add this override:
[Service]
ExecStart=
ExecStart=%h/repos/shuvoice/.venv/bin/shuvoice

Then reload and start:
systemctl --user daemon-reload
systemctl --user import-environment WAYLAND_DISPLAY DISPLAY XDG_RUNTIME_DIR HYPRLAND_INSTANCE_SIGNATURE DBUS_SESSION_BUS_ADDRESS XDG_CURRENT_DESKTOP XDG_SESSION_TYPE
systemctl --user enable --now shuvoice.service

# AUR install
shuvoice
# Source/venv install
shuvoice run

Hold your configured push-to-talk key, speak, and release. The transcribed text is typed into your focused window.
The wizard configures these automatically, but you can set them manually in
~/.config/hypr/hyprland.conf:
# Push-to-talk (STT) — example using Right Ctrl
bind = , Control_R, exec, shuvoice control start --control-wait-sec 0
bindr = , Control_R, exec, shuvoice control stop --control-wait-sec 0
bindr = CTRL, Control_R, exec, shuvoice control stop --control-wait-sec 0
# Speak selected text (TTS)
bind = SUPER CTRL, S, exec, shuvoice control tts_speak --control-wait-sec 0

Highlight text in any app, then press your TTS keybind (default: Super+Ctrl+S).
ShuVoice reads the selected text aloud using your configured TTS backend.
- Uses primary selection first, then clipboard fallback
- STT and TTS are mutually exclusive (starting one stops the other)
- Re-triggering while speaking interrupts and starts fresh
- The TTS overlay includes pause/resume, restart, stop, voice selection, and provider-backed speed controls
- Changing speed while speaking restarts the current utterance from the beginning at the new synthesis speed
Send commands to a running ShuVoice instance:
shuvoice control start # Begin recording
shuvoice control stop # Stop recording and type result
shuvoice control toggle # Toggle recording on/off
shuvoice control status # Show current state
shuvoice control metrics # Show runtime metrics
# TTS commands
shuvoice control tts_speak # Read selected text aloud
shuvoice control tts_stop # Stop TTS playback
shuvoice control tts_status # Show TTS state

shuvoice --help               # Show all commands
shuvoice audio list-devices # List audio input devices
shuvoice config effective # Print merged effective config
shuvoice config validate # Validate your config file
shuvoice model download # Download model for active backend
shuvoice diagnostics # Show runtime diagnostics
shuvoice wizard             # Re-run the setup wizard

Config file location: ~/.config/shuvoice/config.toml
The wizard creates this for you. To edit manually, see the full reference at
examples/config.toml.
[asr]
asr_backend = "sherpa" # sherpa | nemo | moonshine

Restart the service after changing the backend:

systemctl --user restart shuvoice.service

| Backend | Best For | GPU Required? | Accuracy | Speed |
|---|---|---|---|---|
| Sherpa-ONNX | General use, CPU systems | No (optional CUDA) | Good | Fast |
| NeMo | Maximum accuracy | Yes (CUDA) | Best | Medium |
| Moonshine | Low-resource systems | No (optional CUDA) | Fair | Varies |
Detailed backend configuration
Default backend. Two profiles available:
- Streaming (Zipformer) — live partial transcription as you speak
- Instant (Parakeet) — transcribes on key release for higher accuracy
[asr]
asr_backend = "sherpa"
sherpa_provider = "cpu" # cpu | cuda
sherpa_model_name = "sherpa-onnx-streaming-zipformer-en-kroko-2025-08-06"
sherpa_decode_mode = "auto" # auto | streaming | offline_instant

Parakeet offline instant mode (recommended for Parakeet models):
[asr]
asr_backend = "sherpa"
sherpa_model_name = "sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8"
instant_mode = true
sherpa_decode_mode = "offline_instant"

Models auto-download to ~/.local/share/shuvoice/models/sherpa/ when
sherpa_model_dir is unset.
Highest accuracy. Requires NVIDIA GPU with CUDA.
[asr]
asr_backend = "nemo"
model_name = "nvidia/nemotron-speech-streaming-en-0.6b"
device = "cuda" # cuda | cpu
right_context = 13 # 0-13; higher = more accurate, more latency

Lightweight, CPU-friendly. Best for short utterances.
[asr]
asr_backend = "moonshine"
moonshine_model_name = "moonshine/tiny" # moonshine/tiny (fast) | moonshine/base (slower)
moonshine_provider = "cpu" # cpu | cuda
moonshine_max_window_sec = 5.0

Controls how transcribed text is typed into your focused app:
[typing]
typing_final_injection_mode = "auto" # auto | clipboard | direct
typing_text_case = "default" # default | lowercase

| Setting | Values | Behavior |
|---|---|---|
| `typing_final_injection_mode` | `auto`, `clipboard`, `direct` | Controls how final text is injected into the focused app |
| `typing_text_case` | `default`, `lowercase` | Controls whether final text keeps normal capitalization/punctuation style or is forced to lowercase |
| Injection mode | Behavior |
|---|---|
| `auto` (default) | Detects clipboard watchers and chooses the safest method automatically |
| `clipboard` | Copies text to clipboard and pastes with Ctrl+V |
| `direct` | Types text directly via wtype (avoids clipboard) |

| Text case | Behavior |
|---|---|
| `default` (default) | Keeps ShuVoice's normal punctuation/capitalization processing |
| `lowercase` | Forces final committed text to lowercase for informal chat/conversation |
Quick toggles from CLI:
shuvoice config set typing_final_injection_mode direct
shuvoice config set typing_text_case lowercase

Add custom word/phrase corrections applied to transcribed text:
[typing.text_replacements]
"speech to text" = "speech-to-text"
"um" = "" # empty value deletes the wordMatches are case-insensitive, whole-word/phrase only (longest first). ShuVoice includes built-in corrections for common ASR variants of "ShuVoice" and "Hyprland".
Set instant_mode = true for low-latency tuning across all backends:
[asr]
instant_mode = true

Effects per backend:

- NeMo: forces `right_context = 0`
- Sherpa streaming: caps `sherpa_chunk_ms` at 80
- Sherpa offline: enables one-shot release-to-final decode
- Moonshine: forces `moonshine/tiny`, caps window to 3 s and tokens to 48
[overlay]
font_size = 24
font_family = "JetBrains Mono" # optional
bg_opacity = 0.55

Hyprland blur/transparency for the overlay:
# In hyprland.conf — ShuVoice uses layer-shell namespaces: stt-overlay, tts-overlay
layerrule = blur, stt-overlay
layerrule = ignorealpha 0.20, stt-overlay
layerrule = xray 1, stt-overlay
layerrule = blur, tts-overlay
layerrule = ignorealpha 0.20, tts-overlay

[tts]
tts_enabled = true
tts_backend = "elevenlabs" # elevenlabs | openai | local | melotts | kokoro
tts_default_voice_id = "zNsotODqUhvbJ5wMG7Ei" # ElevenLabs default
# OpenAI defaults are auto-applied when tts_backend = "openai":
# tts_default_voice_id = "onyx"
# tts_model_id = "gpt-4o-mini-tts"
# tts_api_key_env = "OPENAI_API_KEY"
# Local Piper defaults are auto-applied when tts_backend = "local":
# tts_default_voice_id = "default" # use first discovered local .onnx model
# tts_model_id = "piper"
# tts_local_model_path = "~/.local/share/shuvoice/models/piper" # wizard/setup managed directory
# tts_local_voice = "en_US-amy-medium" # optional explicit curated model stem
# Kokoro defaults are auto-applied when tts_backend = "kokoro":
# tts_default_voice_id = "af_heart"
# tts_model_id = "kokoro"
# tts_kokoro_base_url = "http://localhost:8880/v1"
tts_model_id = "eleven_flash_v2_5"
tts_api_key_env = "ELEVENLABS_API_KEY" # env var name (not the key itself)
tts_playback_speed = 1.0 # default synthesis speed (0.5x to 2.0x)

Set your API key in ~/.config/shuvoice/local.dev:
# ElevenLabs
ELEVENLABS_API_KEY=sk-your-key-here
# OpenAI
OPENAI_API_KEY=sk-your-key-here

For Local Piper, you have two setup paths:
- Automatic setup — use the wizard's Local Piper automatic mode, or run `shuvoice setup --install-missing` with `tts_backend = "local"`. ShuVoice will:
  - install `piper-tts` on Arch when possible,
  - download a curated voice into `~/.local/share/shuvoice/models/piper/`,
  - persist `tts_local_model_path` / `tts_local_voice`, and
  - validate the resulting setup.
- Manual setup — set `tts_local_model_path` to a `.onnx` file or a directory of `.onnx` voices.
If you point at a directory and leave tts_local_voice unset, ShuVoice uses the first discovered
model automatically. Piper sidecar files (.onnx.json) are also used to detect the correct
playback sample rate.
For MeloTTS, no API key is required — it runs fully locally using subprocess isolation (separate Python 3.12 venv) to avoid dependency conflicts. Five English voices are available: American (EN-US), British (EN-BR), Indian (EN-INDIA), Australian (EN-AU), and Newest (EN-Newest). CPU real-time inference is supported, with optional CUDA acceleration.
[tts]
tts_enabled = true
tts_backend = "melotts"
tts_default_voice_id = "EN-US" # EN-US | EN-BR | EN-INDIA | EN-AU | EN-Newest
tts_playback_speed = 1.0
# tts_melotts_device = "auto" # auto | cpu | cuda
# tts_melotts_venv_path = "~/.local/share/shuvoice/melotts-venv" # default managed location

Setup is automated: run shuvoice setup --install-missing with tts_backend = "melotts",
or select MeloTTS in the setup wizard. ShuVoice handles venv creation and model download.
For Kokoro, no install step is required inside ShuVoice itself — point the wizard at your
OpenAI-compatible Kokoro base URL, choose a voice ID, and keep tts_backend = "kokoro".
shuvoice setup and shuvoice preflight will verify connectivity by querying the voice list.
[tts]
tts_enabled = true
tts_backend = "kokoro"
tts_default_voice_id = "af_heart"
tts_model_id = "kokoro"
tts_kokoro_base_url = "http://localhost:8880/v1"
tts_playback_speed = 1.0

Pre-built config files for common setups:
| File | Description |
|---|---|
| `examples/config.toml` | Full reference with all options |
| `examples/config-sherpa-cpu.toml` | Sherpa on CPU |
| `examples/config-sherpa-cuda.toml` | Sherpa on GPU |
| `examples/config-sherpa-parakeet-offline.toml` | Parakeet instant mode |
| `examples/config-nemo-cuda.toml` | NeMo on GPU |
| `examples/config-nemo-cpu.toml` | NeMo on CPU |
| `examples/config-moonshine-cpu.toml` | Moonshine on CPU |
ShuVoice ships a Waybar helper (shuvoice-waybar) for a tray-style status icon.
When running on Hyprland, the tooltip also shows the configured TTS provider /
voice, default synthesis speed, plus the detected push-to-talk and TTS keybinds.
Add to your Waybar config:
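A minimal module entry might look like the sketch below. The module name `custom/shuvoice` and the click bindings are assumptions inferred from the CSS selectors and the `shuvoice-waybar` commands listed in this README; see examples/waybar-custom-shuvoice.jsonc for the maintained version.

```jsonc
// Hypothetical sketch -- option names follow Waybar's custom-module schema.
"custom/shuvoice": {
    "exec": "shuvoice-waybar status",   // prints Waybar-format JSON (see commands below)
    "return-type": "json",
    "interval": 1,
    "on-click": "shuvoice-waybar toggle-record",
    "on-click-right": "shuvoice-waybar menu"
}
```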
State-based CSS styling:
#custom-shuvoice.recording { color: #f38ba8; }
#custom-shuvoice.processing { color: #fab387; }
#custom-shuvoice.idle { color: #a6e3a1; }
#custom-shuvoice.starting { color: #f9e2af; }
#custom-shuvoice.stopped { color: #7f849c; }
#custom-shuvoice.error { color: #f38ba8; }

Waybar setup details
Available commands:
shuvoice-waybar status # JSON output for Waybar
shuvoice-waybar menu # Right-click menu
shuvoice-waybar toggle-record # Toggle recording
shuvoice-waybar service-toggle # Start/stop systemd service
shuvoice-waybar launch-wizard # Open setup wizard

Right-click menu uses one of: omarchy-launch-walker, walker, wofi,
rofi, bemenu, or dmenu.
If Waybar can't find shuvoice-waybar in PATH, use the wrapper script:
# Install a PATH symlink (default: ~/.local/bin/shuvoice-waybar)
./scripts/install-waybar-wrapper.sh
# Or point Waybar to the full path
# exec: "/home/you/.venv/bin/shuvoice-waybar status"

See examples/waybar-custom-shuvoice.jsonc
and examples/waybar-shuvoice.css for
complete examples.
Missing Python modules
| Error | Fix |
|---|---|
| `No module named 'torch'` or `'nemo'` | `uv sync --extra asr-nemo` or install python-pytorch-cuda (Arch) |
| `No module named 'sherpa_onnx'` | AUR: `yay -S python-sherpa-onnx-bin` · venv: `uv sync --extra asr-sherpa` |
| `No module named 'moonshine_onnx'` | `uv sync --extra asr-moonshine` |
| `No module named 'gi'` | `sudo pacman -S python-gobject gtk4 gtk4-layer-shell` |
Sherpa / Parakeet errors
| Error | Fix |
|---|---|
| Missing encoder/decoder/joiner artifacts | Point `sherpa_model_dir` to a valid model directory, or unset it to auto-download |
| Parakeet requires offline instant mode | Use `sherpa_decode_mode = "offline_instant"` with `instant_mode = true` |
| `window_size` does not exist in the metadata | Use `sherpa_decode_mode = "offline_instant"` for that model, or switch to Zipformer |
| `CUDAExecutionProvider` not found | Install CUDA-enabled sherpa-onnx, or use `sherpa_provider = "cpu"` |
Audio and recognition issues
| Problem | Fix |
|---|---|
| Wrong microphone | Run shuvoice audio list-devices, then set audio_device by name (not index) in config |
| Mic too quiet | Increase input_gain (try 1.3 to 1.8) |
| Phantom text on silent presses | Raise silence_rms_threshold (try 0.010→0.015) or silence_rms_multiplier (try 2.0) |
| Long phrases cut out | Keep streaming_stall_guard = true (default); tune streaming_stall_chunks (3–6) |
| Clipboard pollution | Set typing_final_injection_mode = "auto" (detects clipboard managers automatically) |
| Need casual/lowercase chat output | Set typing_text_case = "lowercase" |
System/runtime errors
| Error | Fix |
|---|---|
| `libgtk4-layer-shell.so` not found | `sudo pacman -S gtk4-layer-shell` |
| `wtype` not found in PATH | `sudo pacman -S wtype` |
| Control socket not found | Start ShuVoice before sending control commands |
| `espeak-ng` not found | `sudo pacman -S espeak-ng` (only needed for roundtrip test scripts) |
| `tts_speak` says no selected text | Highlight text first; verify `wl-paste` works |
| ElevenLabs/OpenAI 401 errors | Export the provider API key in the env var named by `tts_api_key_env`; run `shuvoice preflight` |
| Failed to build `kaldialign` (Python 3.14) | Use `uv sync --extra asr-nemo --override packaging/constraints/py314-overrides.txt` |
# Clone and install with dev tooling
git clone https://github.com/shuv1337/shuvoice.git
cd shuvoice
uv sync --dev
# Run checks
uv run ruff check shuvoice tests
uv run ruff format --check shuvoice tests
uv run pytest -m "not gui" -v

See CONTRIBUTING.md for full guidelines.
Advanced test suites
# IPC end-to-end smoke tests
pytest -m e2e -k ipc_smoke -v
# Manual phrase regression (opt-in)
SHUVOICE_RUN_ROUNDTRIP=1 \
SHUVOICE_ROUNDTRIP_BACKEND=nemo \
SHUVOICE_ROUNDTRIP_DEVICE=cuda \
pytest -m integration -k roundtrip_regression -v
# Long-phrase round-trip harness (TTS → STT)
python scripts/tts_roundtrip.py --asr-backend nemo --device cuda
# Smoke test script
./scripts/smoke-test.sh

| Repository | github.com/shuv1337/shuvoice |
| AUR Package | shuvoice-git |
| Contributing | CONTRIBUTING.md |
| Code of Conduct | CODE_OF_CONDUCT.md |
| Security | SECURITY.md |
| Brand Assets | docs/BRANDING.md |
ShuVoice is released under the MIT License.




