
ShuVoice


Push-to-talk speech-to-text for Hyprland/Wayland.
Hold a key, speak, release — your words are typed into the focused window.


ShuVoice splash overlay on Hyprland

Features

  • Push-to-talk dictation — hold a key, speak, release. Text appears in your focused app.
  • Pluggable ASR backends — choose between NeMo (highest accuracy), Sherpa-ONNX (fast, CPU-friendly), or Moonshine (lightweight).
  • Text-to-speech — highlight text and hear it read aloud via ElevenLabs, OpenAI, local Piper, MeloTTS, or a self-hosted Kokoro instance.
  • Native Wayland overlay — GTK4 layer-shell overlay with blur, transparency, and live transcription feedback.
  • Waybar integration — tray-style status icon with tooltip, state colors, and click actions.
  • Guided setup wizard — compact GTK onboarding flow with separate ASR, keybind/typing, and TTS steps plus model setup.
  • Zero root access — runs entirely in userspace via Hyprland IPC; no /dev/input needed.

Quick Start

The fastest path from zero to working dictation:

# 1. Install from AUR (Arch Linux)
yay -S shuvoice-git

# 2. Run the setup wizard (picks backend, keybind, downloads model)
shuvoice wizard

# 3. Enable and start the background service
systemctl --user enable --now shuvoice.service

# 4. Use it! Hold your push-to-talk key, speak, release.

That's it. The wizard handles backend selection, push-to-talk key and final text injection choices, TTS setup, model downloads, and Hyprland keybind configuration.

Not on Arch? See Manual Installation below.


Installation

Option A: AUR Package (recommended)

ShuVoice is available on the AUR as shuvoice-git:

# Using yay
yay -S shuvoice-git

# Or using paru
paru -S shuvoice-git

The AUR package includes Sherpa-ONNX runtime support out of the box. If your AUR helper asks which provider to use for python-sherpa-onnx, pick python-sherpa-onnx-bin (recommended, prebuilt):

yay -S --needed python-sherpa-onnx-bin shuvoice-git

NeMo and Moonshine backends are optional and can be installed separately if needed.

Option B: Manual Installation (from source)

Expand for manual/venv setup instructions

Prerequisites

| Component | Requirement |
| --- | --- |
| OS | Linux with Wayland (Hyprland recommended) |
| Python | 3.10 or newer |
| Package manager | uv (recommended) |
| GPU | Optional (recommended for NeMo / Sherpa CUDA) |

1. Install system dependencies (Arch Linux)

sudo pacman -S \
  gtk4 gtk4-layer-shell python-gobject \
  portaudio pipewire pipewire-audio pipewire-alsa \
  wtype wl-clipboard espeak-ng

2. Clone and set up the project

git clone https://github.com/shuv1337/shuvoice.git
cd shuvoice

# Create venv and install base dependencies
uv sync

3. Install your chosen ASR backend

Pick one (or more) backend to install:

# Sherpa-ONNX (fast, CPU-friendly by default; CUDA-capable runtime auto-repaired by setup/wizard)
uv sync --extra asr-sherpa

# NeMo (highest accuracy, requires NVIDIA GPU + CUDA)
uv sync --extra asr-nemo

# Moonshine (lightweight, low resource usage)
uv sync --extra asr-moonshine

Python 3.14 + NeMo note: use the override file to avoid build issues:

uv sync --extra asr-nemo --override packaging/constraints/py314-overrides.txt

4. Optional: install TTS extras

# ElevenLabs TTS (cloud, high quality)
uv sync --extra tts-elevenlabs

# OpenAI TTS (cloud)
uv sync --extra tts-openai

# Local TTS (Piper, experimental)
uv sync --extra tts-local

# MeloTTS (local, subprocess-isolated — no extra needed, uses separate venv)
# Setup: shuvoice setup --install-missing (when tts_backend = "melotts")

First Run

Setup Wizard (recommended)

The interactive wizard walks you through everything:

shuvoice wizard

Setup wizard welcome

The wizard now uses a compact seven-step flow:

  1. Welcome — quick first-run intro
  2. Select your ASR backend — Sherpa-ONNX, NeMo, or Moonshine
  3. Choose a Sherpa profile + device (if applicable) — Streaming (Zipformer) or Instant (Parakeet), then CPU or GPU
  4. Pick your push-to-talk key + final text injection mode — Right Ctrl, Insert, F9, Super+V, or custom, plus Auto / Clipboard / Direct typing
  5. Choose your TTS provider + default voice — ElevenLabs, OpenAI, Local Piper, MeloTTS, or Kokoro
    • For Local Piper — choose either automatic setup (install Piper + download a curated voice) or an existing local model path
    • For Kokoro — set the base URL for your local OpenAI-compatible Kokoro instance, fetch and pick from the live /audio/voices list or type a manual voice ID, then use Speak sample to verify playback before finishing
  6. Review summary + download model files — with progress indicator and cancel support
  7. Auto-configure Hyprland keybinds — adds bind/bindr lines if the key isn't already used

ASR backend selection

Keybind selection

Alternative: CLI Setup

If you prefer a non-interactive approach:

# Check dependencies, download models, run preflight
shuvoice setup

# Auto-install missing dependencies (when possible)
shuvoice setup --install-missing

# Local Piper: auto-install Piper + download a curated voice for the active local TTS config
shuvoice setup --install-missing --tts-local-voice en_US-amy-medium --non-interactive

# Local Piper: download curated voice assets into a custom managed directory
shuvoice setup --install-missing \
  --tts-local-voice en_US-lessac-high \
  --tts-local-model-dir ~/.local/share/shuvoice/models/piper \
  --non-interactive

# Quick dependency check only (skip model download + preflight)
shuvoice setup --skip-model-download --skip-preflight

Preflight Check

Verify everything is ready before first launch:

shuvoice preflight

This checks Python version, required modules, audio devices, ASR/TTS backend dependencies, required binaries (wtype, wl-copy, wl-paste), and GTK layer-shell availability.

On CUDA hosts, shuvoice setup --install-missing now also attempts to build a GPU-enabled Sherpa runtime in the venv and wire in the required CUDA compat libraries when sherpa_provider = "cuda" is selected.
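To illustrate the binary portion of such a check, here is a minimal Python sketch of looking up the required tools on PATH. This is a simplified stand-in for what `shuvoice preflight` does, not its actual implementation:

```python
import shutil

# Binaries the preflight step looks for on PATH (per the list above).
REQUIRED_BINARIES = ["wtype", "wl-copy", "wl-paste"]

def missing_binaries() -> list[str]:
    """Return the required binaries that are not found on PATH."""
    return [name for name in REQUIRED_BINARIES if shutil.which(name) is None]
```

If `missing_binaries()` returns a non-empty list, install the corresponding packages before launching.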


Starting ShuVoice

As a systemd user service (recommended)

# Enable and start (will auto-start on login)
systemctl --user enable --now shuvoice.service

# Check status
systemctl --user status shuvoice.service

# View logs
journalctl --user -u shuvoice.service -f
Setting up the service for venv/source installs

The default service unit expects /usr/bin/shuvoice (AUR install path). For venv workflows, create an override:

# Copy the unit file
mkdir -p ~/.config/systemd/user
cp packaging/systemd/user/shuvoice.service ~/.config/systemd/user/

# Override the ExecStart for your venv
systemctl --user edit shuvoice.service

Add this override:

[Service]
ExecStart=
ExecStart=%h/repos/shuvoice/.venv/bin/shuvoice

Then reload and start:

systemctl --user daemon-reload
systemctl --user import-environment WAYLAND_DISPLAY DISPLAY XDG_RUNTIME_DIR HYPRLAND_INSTANCE_SIGNATURE DBUS_SESSION_BUS_ADDRESS XDG_CURRENT_DESKTOP XDG_SESSION_TYPE
systemctl --user enable --now shuvoice.service

Manual launch (foreground)

# AUR install
shuvoice

# Source/venv install
shuvoice run

Usage

Push-to-Talk (STT)

Hold your configured push-to-talk key, speak, and release. The transcribed text is typed into your focused window.

Hyprland Keybinds

The wizard configures these automatically, but you can set them manually in ~/.config/hypr/hyprland.conf:

# Push-to-talk (STT) — example using Right Ctrl
bind = , Control_R, exec, shuvoice control start --control-wait-sec 0
bindr = , Control_R, exec, shuvoice control stop --control-wait-sec 0
bindr = CTRL, Control_R, exec, shuvoice control stop --control-wait-sec 0

# Speak selected text (TTS)
bind = SUPER CTRL, S, exec, shuvoice control tts_speak --control-wait-sec 0

Text-to-Speech (TTS)

Highlight text in any app, then press your TTS keybind (default: Super+Ctrl+S). ShuVoice reads the selected text aloud using your configured TTS backend.

  • Uses primary selection first, then clipboard fallback
  • STT and TTS are mutually exclusive (starting one stops the other)
  • Re-triggering while speaking interrupts and starts fresh
  • The TTS overlay includes pause/resume, restart, stop, voice selection, and provider-backed speed controls
  • Changing speed while speaking restarts the current utterance from the beginning at the new synthesis speed
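The primary-selection-then-clipboard lookup can be sketched like this in Python using `wl-paste` from wl-clipboard. This is an illustrative re-implementation of the behavior described above, not ShuVoice's actual code:

```python
import subprocess

def read_selection() -> str:
    """Try the primary selection first, then fall back to the clipboard."""
    for args in (["wl-paste", "--primary", "--no-newline"],
                 ["wl-paste", "--no-newline"]):
        try:
            result = subprocess.run(args, capture_output=True, text=True, timeout=2)
        except (OSError, subprocess.TimeoutExpired):
            continue  # wl-paste missing or unresponsive; try the next source
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout
    return ""
```

An empty return corresponds to the "no selected text" case in the troubleshooting table.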

Control Commands

Send commands to a running ShuVoice instance:

shuvoice control start       # Begin recording
shuvoice control stop        # Stop recording and type result
shuvoice control toggle      # Toggle recording on/off
shuvoice control status      # Show current state
shuvoice control metrics     # Show runtime metrics

# TTS commands
shuvoice control tts_speak   # Read selected text aloud
shuvoice control tts_stop    # Stop TTS playback
shuvoice control tts_status  # Show TTS state

Useful CLI Commands

shuvoice --help              # Show all commands
shuvoice audio list-devices  # List audio input devices
shuvoice config effective    # Print merged effective config
shuvoice config validate     # Validate your config file
shuvoice model download      # Download model for active backend
shuvoice diagnostics         # Show runtime diagnostics
shuvoice wizard              # Re-run the setup wizard

Configuration

Config file location: ~/.config/shuvoice/config.toml

The wizard creates this for you. To edit manually, see the full reference at examples/config.toml.

Switching ASR Backends

[asr]
asr_backend = "sherpa"     # sherpa | nemo | moonshine

Restart the service after changing:

systemctl --user restart shuvoice.service

Backend Comparison

| Backend | Best For | GPU Required? | Accuracy | Speed |
| --- | --- | --- | --- | --- |
| Sherpa-ONNX | General use, CPU systems | No (optional CUDA) | Good | Fast |
| NeMo | Maximum accuracy | Yes (CUDA) | Best | Medium |
| Moonshine | Low-resource systems | No (optional CUDA) | Fair | Varies |
Detailed backend configuration

Sherpa-ONNX

Default backend. Two profiles available:

  • Streaming (Zipformer) — live partial transcription as you speak
  • Instant (Parakeet) — transcribes on key release for higher accuracy
[asr]
asr_backend = "sherpa"
sherpa_provider = "cpu"              # cpu | cuda
sherpa_model_name = "sherpa-onnx-streaming-zipformer-en-kroko-2025-08-06"
sherpa_decode_mode = "auto"          # auto | streaming | offline_instant

Parakeet offline instant mode (recommended for Parakeet models):

[asr]
asr_backend = "sherpa"
sherpa_model_name = "sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8"
instant_mode = true
sherpa_decode_mode = "offline_instant"

Models auto-download to ~/.local/share/shuvoice/models/sherpa/ when sherpa_model_dir is unset.

NeMo

Highest accuracy. Requires NVIDIA GPU with CUDA.

[asr]
asr_backend = "nemo"
model_name = "nvidia/nemotron-speech-streaming-en-0.6b"
device = "cuda"                      # cuda | cpu
right_context = 13                   # 0-13; higher = more accurate, more latency

Moonshine

Lightweight, CPU-friendly. Best for short utterances.

[asr]
asr_backend = "moonshine"
moonshine_model_name = "moonshine/tiny"    # moonshine/tiny (fast) | moonshine/base (slower)
moonshine_provider = "cpu"                 # cpu | cuda
moonshine_max_window_sec = 5.0

Text Injection Mode

Controls how transcribed text is typed into your focused app:

[typing]
typing_final_injection_mode = "auto"   # auto | clipboard | direct
typing_text_case = "default"           # default | lowercase
| Setting | Values | Behavior |
| --- | --- | --- |
| typing_final_injection_mode | auto, clipboard, direct | Controls how final text is injected into the focused app |
| typing_text_case | default, lowercase | Controls whether final text keeps normal capitalization/punctuation style or is forced to lowercase |

| Injection mode | Behavior |
| --- | --- |
| auto (default) | Detects clipboard watchers and chooses the safest method automatically |
| clipboard | Copies text to clipboard and pastes with Ctrl+V |
| direct | Types text directly via wtype (avoids clipboard) |

| Text case | Behavior |
| --- | --- |
| default (default) | Keeps ShuVoice's normal punctuation/capitalization processing |
| lowercase | Forces final committed text to lowercase for informal chat/conversation |

Quick toggles from CLI:

shuvoice config set typing_final_injection_mode direct
shuvoice config set typing_text_case lowercase

Text Replacements

Add custom word/phrase corrections applied to transcribed text:

[typing.text_replacements]
"speech to text" = "speech-to-text"
"um" = ""                              # empty value deletes the word

Matches are case-insensitive, whole-word/phrase only (longest first). ShuVoice includes built-in corrections for common ASR variants of "ShuVoice" and "Hyprland".
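The matching rules above (case-insensitive, whole-phrase, longest first) can be sketched roughly as follows. This is an illustrative re-implementation, not ShuVoice's actual code:

```python
import re

def apply_replacements(text: str, rules: dict[str, str]) -> str:
    # Longest phrases first, so "speech to text" wins over a shorter "speech".
    for phrase in sorted(rules, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, rules[phrase], text, flags=re.IGNORECASE)
    # Collapse doubled spaces left behind by deletions such as "um" -> "".
    return re.sub(r"  +", " ", text).strip()

rules = {"speech to text": "speech-to-text", "um": ""}
```

With these rules, "I use speech to text daily" becomes "I use speech-to-text daily".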

Instant Mode

Set instant_mode = true for low-latency tuning across all backends:

[asr]
instant_mode = true

Effects per backend:

  • NeMo: forces right_context = 0
  • Sherpa streaming: caps sherpa_chunk_ms at 80
  • Sherpa offline: enables one-shot release-to-final decode
  • Moonshine: forces moonshine/tiny, caps window to 3s and tokens to 48
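Conceptually, the tuning amounts to clamping a few config values per backend. A hedged sketch of those clamps (the token-cap key name is assumed for illustration; it is not a confirmed config key):

```python
def apply_instant_mode(cfg: dict) -> dict:
    """Sketch of the per-backend clamps listed above (illustrative only)."""
    cfg = dict(cfg)  # don't mutate the caller's config
    backend = cfg.get("asr_backend")
    if backend == "nemo":
        cfg["right_context"] = 0
    elif backend == "sherpa":
        cfg["sherpa_chunk_ms"] = min(cfg.get("sherpa_chunk_ms", 160), 80)
    elif backend == "moonshine":
        cfg["moonshine_model_name"] = "moonshine/tiny"
        cfg["moonshine_max_window_sec"] = min(cfg.get("moonshine_max_window_sec", 5.0), 3.0)
        cfg["moonshine_max_tokens"] = 48  # key name assumed for illustration
    return cfg
```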

Overlay Appearance

[overlay]
font_size = 24
font_family = "JetBrains Mono"   # optional
bg_opacity = 0.55

Hyprland blur/transparency for the overlay:

# In hyprland.conf — ShuVoice uses layer-shell namespaces: stt-overlay, tts-overlay
layerrule = blur, stt-overlay
layerrule = ignorealpha 0.20, stt-overlay
layerrule = xray 1, stt-overlay

layerrule = blur, tts-overlay
layerrule = ignorealpha 0.20, tts-overlay

TTS Configuration

[tts]
tts_enabled = true
tts_backend = "elevenlabs"                      # elevenlabs | openai | local | melotts | kokoro
tts_default_voice_id = "zNsotODqUhvbJ5wMG7Ei"   # ElevenLabs default
# OpenAI defaults are auto-applied when tts_backend = "openai":
# tts_default_voice_id = "onyx"
# tts_model_id = "gpt-4o-mini-tts"
# tts_api_key_env = "OPENAI_API_KEY"
# Local Piper defaults are auto-applied when tts_backend = "local":
# tts_default_voice_id = "default"              # use first discovered local .onnx model
# tts_model_id = "piper"
# tts_local_model_path = "~/.local/share/shuvoice/models/piper"   # wizard/setup managed directory
# tts_local_voice = "en_US-amy-medium"          # optional explicit curated model stem
# Kokoro defaults are auto-applied when tts_backend = "kokoro":
# tts_default_voice_id = "af_heart"
# tts_model_id = "kokoro"
# tts_kokoro_base_url = "http://localhost:8880/v1"
tts_model_id = "eleven_flash_v2_5"
tts_api_key_env = "ELEVENLABS_API_KEY"          # env var name (not the key itself)
tts_playback_speed = 1.0                         # default synthesis speed (0.5x to 2.0x)

Set your API key in ~/.config/shuvoice/local.dev:

# ElevenLabs
ELEVENLABS_API_KEY=sk-your-key-here

# OpenAI
OPENAI_API_KEY=sk-your-key-here

For Local Piper, you now have two setup paths:

  • Automatic setup — use the wizard's Local Piper automatic mode, or run shuvoice setup --install-missing with tts_backend = "local". ShuVoice will:
    • install piper-tts on Arch when possible,
    • download a curated voice into ~/.local/share/shuvoice/models/piper/,
    • persist tts_local_model_path / tts_local_voice, and
    • validate the resulting setup.
  • Manual setup — set tts_local_model_path to a .onnx file or a directory of .onnx voices.

If you point at a directory and leave tts_local_voice unset, ShuVoice uses the first discovered model automatically. Piper sidecar files (.onnx.json) are also used to detect the correct playback sample rate.
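Reading the sample rate from a Piper sidecar can be sketched in a few lines, assuming the standard Piper sidecar layout where `voice.onnx` is accompanied by `voice.onnx.json` containing an `audio.sample_rate` field (illustrative, not ShuVoice's actual code):

```python
import json
from pathlib import Path

def piper_sample_rate(model_path: str, default: int = 22050) -> int:
    """Read the playback sample rate from a Piper .onnx.json sidecar."""
    sidecar = Path(model_path + ".json")  # voice.onnx -> voice.onnx.json
    if sidecar.exists():
        meta = json.loads(sidecar.read_text())
        return int(meta.get("audio", {}).get("sample_rate", default))
    return default
```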

For MeloTTS, no API key is required — it runs fully locally using subprocess isolation (separate Python 3.12 venv) to avoid dependency conflicts. Five English voices are available: American (EN-US), British (EN-BR), Indian (EN-INDIA), Australian (EN-AU), and Newest (EN-Newest). CPU real-time inference is supported, with optional CUDA acceleration.

[tts]
tts_enabled = true
tts_backend = "melotts"
tts_default_voice_id = "EN-US"                   # EN-US | EN-BR | EN-INDIA | EN-AU | EN-Newest
tts_playback_speed = 1.0
# tts_melotts_device = "auto"                    # auto | cpu | cuda
# tts_melotts_venv_path = "~/.local/share/shuvoice/melotts-venv"  # default managed location

Setup is automated: run shuvoice setup --install-missing with tts_backend = "melotts", or select MeloTTS in the setup wizard. ShuVoice handles venv creation and model download.

For Kokoro, no install step is required inside ShuVoice itself — point the wizard at your OpenAI-compatible Kokoro base URL, choose a voice ID, and keep tts_backend = "kokoro". shuvoice setup and shuvoice preflight will verify connectivity by querying the voice list.

[tts]
tts_enabled = true
tts_backend = "kokoro"
tts_default_voice_id = "af_heart"
tts_model_id = "kokoro"
tts_kokoro_base_url = "http://localhost:8880/v1"
tts_playback_speed = 1.0
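Connectivity can also be checked by hand with a few lines of Python hitting the same /audio/voices endpoint the wizard queries (the exact JSON shape depends on your Kokoro server):

```python
import json
import urllib.request

def list_kokoro_voices(base_url: str = "http://localhost:8880/v1") -> dict:
    """Fetch the voice list from an OpenAI-compatible Kokoro instance."""
    with urllib.request.urlopen(base_url + "/audio/voices", timeout=5) as resp:
        return json.load(resp)
```

A successful response confirms the base URL is reachable before you finish the wizard.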

Example Configs

Pre-built config files for common setups:

| File | Description |
| --- | --- |
| examples/config.toml | Full reference with all options |
| examples/config-sherpa-cpu.toml | Sherpa on CPU |
| examples/config-sherpa-cuda.toml | Sherpa on GPU |
| examples/config-sherpa-parakeet-offline.toml | Parakeet instant mode |
| examples/config-nemo-cuda.toml | NeMo on GPU |
| examples/config-nemo-cpu.toml | NeMo on CPU |
| examples/config-moonshine-cpu.toml | Moonshine on CPU |

Waybar Integration

Waybar tooltip

ShuVoice ships a Waybar helper (shuvoice-waybar) for a tray-style status icon. When running on Hyprland, the tooltip also shows the configured TTS provider / voice, default synthesis speed, plus the detected push-to-talk and TTS keybinds.

Add to your Waybar config:

"custom/shuvoice": {
  "return-type": "json",
  "exec": "shuvoice-waybar status",
  "interval": 1,
  "on-click": "shuvoice-waybar toggle-record",
  "on-click-middle": "shuvoice-waybar service-toggle",
  "on-click-right": "shuvoice-waybar menu",
  "tooltip": true
}

State-based CSS styling:

#custom-shuvoice.recording  { color: #f38ba8; }
#custom-shuvoice.processing { color: #fab387; }
#custom-shuvoice.idle       { color: #a6e3a1; }
#custom-shuvoice.starting   { color: #f9e2af; }
#custom-shuvoice.stopped    { color: #7f849c; }
#custom-shuvoice.error      { color: #f38ba8; }
Waybar setup details

Available commands:

shuvoice-waybar status          # JSON output for Waybar
shuvoice-waybar menu            # Right-click menu
shuvoice-waybar toggle-record   # Toggle recording
shuvoice-waybar service-toggle  # Start/stop systemd service
shuvoice-waybar launch-wizard   # Open setup wizard

Right-click menu uses one of: omarchy-launch-walker, walker, wofi, rofi, bemenu, or dmenu.

If Waybar can't find shuvoice-waybar in PATH, use the wrapper script:

# Install a PATH symlink (default: ~/.local/bin/shuvoice-waybar)
./scripts/install-waybar-wrapper.sh

# Or point Waybar to the full path
# exec: "/home/you/.venv/bin/shuvoice-waybar status"

See examples/waybar-custom-shuvoice.jsonc and examples/waybar-shuvoice.css for complete examples.


Troubleshooting

Missing Python modules

| Error | Fix |
| --- | --- |
| No module named 'torch' or 'nemo' | uv sync --extra asr-nemo, or install python-pytorch-cuda (Arch) |
| No module named 'sherpa_onnx' | AUR: yay -S python-sherpa-onnx-bin · venv: uv sync --extra asr-sherpa |
| No module named 'moonshine_onnx' | uv sync --extra asr-moonshine |
| No module named 'gi' | sudo pacman -S python-gobject gtk4 gtk4-layer-shell |

Sherpa / Parakeet errors

| Error | Fix |
| --- | --- |
| Missing encoder/decoder/joiner artifacts | Point sherpa_model_dir to a valid model directory, or unset it to auto-download |
| Parakeet requires offline instant mode | Use sherpa_decode_mode = "offline_instant" with instant_mode = true |
| window_size does not exist in the metadata | Use sherpa_decode_mode = "offline_instant" for that model, or switch to Zipformer |
| CUDAExecutionProvider not found | Install CUDA-enabled sherpa-onnx, or use sherpa_provider = "cpu" |

Audio and recognition issues

| Problem | Fix |
| --- | --- |
| Wrong microphone | Run shuvoice audio list-devices, then set audio_device by name (not index) in config |
| Mic too quiet | Increase input_gain (try 1.3 to 1.8) |
| Phantom text on silent presses | Raise silence_rms_threshold (try 0.010–0.015) or silence_rms_multiplier (try 2.0) |
| Long phrases cut out | Keep streaming_stall_guard = true (default); tune streaming_stall_chunks (3–6) |
| Clipboard pollution | Set typing_final_injection_mode = "auto" (detects clipboard managers automatically) |
| Need casual/lowercase chat output | Set typing_text_case = "lowercase" |

System/runtime errors

| Error | Fix |
| --- | --- |
| libgtk4-layer-shell.so not found | sudo pacman -S gtk4-layer-shell |
| wtype not found in PATH | sudo pacman -S wtype |
| Control socket not found | Start ShuVoice before sending control commands |
| espeak-ng not found | sudo pacman -S espeak-ng (only needed for roundtrip test scripts) |
| tts_speak says no selected text | Highlight text first; verify wl-paste works |
| ElevenLabs/OpenAI 401 errors | Export the provider API key in the env var named by tts_api_key_env; run shuvoice preflight |
| Failed to build kaldialign (Python 3.14) | Use uv sync --extra asr-nemo --override packaging/constraints/py314-overrides.txt |

Development

# Clone and install with dev tooling
git clone https://github.com/shuv1337/shuvoice.git
cd shuvoice
uv sync --dev

# Run checks
uv run ruff check shuvoice tests
uv run ruff format --check shuvoice tests
uv run pytest -m "not gui" -v

See CONTRIBUTING.md for full guidelines.

Advanced test suites
# IPC end-to-end smoke tests
pytest -m e2e -k ipc_smoke -v

# Manual phrase regression (opt-in)
SHUVOICE_RUN_ROUNDTRIP=1 \
SHUVOICE_ROUNDTRIP_BACKEND=nemo \
SHUVOICE_ROUNDTRIP_DEVICE=cuda \
pytest -m integration -k roundtrip_regression -v

# Long-phrase round-trip harness (TTS → STT)
python scripts/tts_roundtrip.py --asr-backend nemo --device cuda

# Smoke test script
./scripts/smoke-test.sh

Project Links

  • Repository: github.com/shuv1337/shuvoice
  • AUR Package: shuvoice-git
  • Contributing: CONTRIBUTING.md
  • Code of Conduct: CODE_OF_CONDUCT.md
  • Security: SECURITY.md
  • Brand Assets: docs/BRANDING.md

License

ShuVoice is released under the MIT License.
