
ShuVoice


Push-to-talk speech-to-text for Hyprland/Wayland.
Hold a key, speak, release — your words are typed into the focused window.


ShuVoice splash overlay on Hyprland

Features

  • Push-to-talk dictation — hold a key, speak, release. Text appears in your focused app.
  • Pluggable ASR backends — choose between NeMo (highest accuracy), Sherpa-ONNX (fast, CPU-friendly), or Moonshine (lightweight).
  • Text-to-speech — highlight text and hear it read aloud via ElevenLabs, OpenAI, local Piper, MeloTTS, or a self-hosted Kokoro instance.
  • Native Wayland overlay — GTK4 layer-shell overlay with blur, transparency, and live transcription feedback.
  • Waybar integration — tray-style status icon with tooltip, state colors, and click actions.
  • Guided setup wizard — compact GTK onboarding flow with separate ASR, keybind/typing, and TTS steps plus model setup.
  • Zero root access — runs entirely in userspace via Hyprland IPC; no /dev/input needed.

Quick Start

The fastest path from zero to working dictation:

# 1. Install from AUR (Arch Linux)
yay -S shuvoice-git

# 2. Run the setup wizard (picks backend, keybind, downloads model)
shuvoice wizard

# 3. Enable and start the background service
systemctl --user enable --now shuvoice.service

# 4. Use it! Hold your push-to-talk key, speak, release.

That's it. The wizard handles backend selection, push-to-talk key and final text injection choices, TTS setup, model downloads, and Hyprland keybind configuration.

Not on Arch? See Manual Installation below.


Installation

Option A: AUR Package (recommended)

ShuVoice is available on the AUR as shuvoice-git:

# Using yay
yay -S shuvoice-git

# Or using paru
paru -S shuvoice-git

The AUR package includes Sherpa-ONNX runtime support out of the box. If your AUR helper asks which provider to use for python-sherpa-onnx, pick python-sherpa-onnx-bin (recommended, prebuilt):

yay -S --needed python-sherpa-onnx-bin shuvoice-git

NeMo and Moonshine backends are optional and can be installed separately if needed.

Option B: Manual Installation (from source)

Expand for manual/venv setup instructions

Prerequisites

| Component | Requirement |
| --- | --- |
| OS | Linux with Wayland (Hyprland recommended) |
| Python | 3.10 or newer |
| Package manager | uv (recommended) |
| GPU | Optional (recommended for NeMo / Sherpa CUDA) |

1. Install system dependencies (Arch Linux)

sudo pacman -S \
  gtk4 gtk4-layer-shell python-gobject \
  portaudio pipewire pipewire-audio pipewire-alsa \
  wtype wl-clipboard espeak-ng

2. Clone and set up the project

git clone https://github.com/shuv1337/shuvoice.git
cd shuvoice

# Create venv and install base dependencies
uv sync

3. Install your chosen ASR backend

Pick one (or more) backend to install:

# Sherpa-ONNX (fast, CPU-friendly by default; CUDA-capable runtime auto-repaired by setup/wizard)
uv sync --extra asr-sherpa

# NeMo (highest accuracy, requires NVIDIA GPU + CUDA)
uv sync --extra asr-nemo

# Moonshine (lightweight, low resource usage)
uv sync --extra asr-moonshine

Python 3.14 + NeMo note: use the override file to avoid build issues:

uv sync --extra asr-nemo --override packaging/constraints/py314-overrides.txt

4. Optional: install TTS extras

# ElevenLabs TTS (cloud, high quality)
uv sync --extra tts-elevenlabs

# OpenAI TTS (cloud)
uv sync --extra tts-openai

# Local TTS (Piper, experimental)
uv sync --extra tts-local

# MeloTTS (local, subprocess-isolated — no extra needed, uses separate venv)
# Setup: shuvoice setup --install-missing (when tts_backend = "melotts")

First Run

Setup Wizard (recommended)

The interactive wizard walks you through everything:

shuvoice wizard

Setup wizard welcome

The wizard now uses a compact seven-step flow:

  1. Welcome — quick first-run intro
  2. Select your ASR backend — Sherpa-ONNX, NeMo, or Moonshine
  3. Choose a Sherpa profile + device (if applicable) — Streaming (Zipformer) or Instant (Parakeet), then CPU or GPU
  4. Pick your push-to-talk key + final text injection mode — Right Ctrl, Insert, F9, Super+V, or custom, plus Auto / Clipboard / Direct typing
  5. Choose your TTS provider + default voice — ElevenLabs, OpenAI, Local Piper, MeloTTS, or Kokoro
    • For Local Piper — choose either automatic setup (install Piper + download a curated voice) or an existing local model path
    • For Kokoro — set the base URL for your local OpenAI-compatible Kokoro instance, fetch and pick from the live /audio/voices list or type a manual voice ID, then use Speak sample to verify playback before finishing
  6. Review summary + download model files — with progress indicator and cancel support
  7. Auto-configure Hyprland keybinds — adds bind/bindr lines if the key isn't already used

ASR backend selection

Keybind selection

Alternative: CLI Setup

If you prefer a non-interactive approach:

# Check dependencies, download models, run preflight
shuvoice setup

# Auto-install missing dependencies (when possible)
shuvoice setup --install-missing

# Local Piper: auto-install Piper + download a curated voice for the active local TTS config
shuvoice setup --install-missing --tts-local-voice en_US-amy-medium --non-interactive

# Local Piper: download curated voice assets into a custom managed directory
shuvoice setup --install-missing \
  --tts-local-voice en_US-lessac-high \
  --tts-local-model-dir ~/.local/share/shuvoice/models/piper \
  --non-interactive

# Quick dependency check only (skip model download + preflight)
shuvoice setup --skip-model-download --skip-preflight

Preflight Check

Verify everything is ready before first launch:

shuvoice preflight

This checks Python version, required modules, audio devices, ASR/TTS backend dependencies, required binaries (wtype, wl-copy, wl-paste), and GTK layer-shell availability.

On CUDA hosts, shuvoice setup --install-missing now also attempts to build a GPU-enabled Sherpa runtime in the venv and wire in the required CUDA compat libraries when sherpa_provider = "cuda" is selected.
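To illustrate the binary portion of such a check, here is a minimal Python sketch of looking up the required tools on PATH. This is a simplified stand-in for what `shuvoice preflight` does, not its actual implementation:

```python
import shutil

# Binaries the preflight step looks for on PATH (per the list above).
REQUIRED_BINARIES = ["wtype", "wl-copy", "wl-paste"]

def missing_binaries() -> list[str]:
    """Return the required binaries that are not found on PATH."""
    return [name for name in REQUIRED_BINARIES if shutil.which(name) is None]
```

If `missing_binaries()` returns a non-empty list, install the corresponding packages before launching.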


Starting ShuVoice

As a systemd user service (recommended)

# Enable and start (will auto-start on login)
systemctl --user enable --now shuvoice.service

# Check status
systemctl --user status shuvoice.service

# View logs
journalctl --user -u shuvoice.service -f
Setting up the service for venv/source installs

The default service unit expects /usr/bin/shuvoice (AUR install path). For venv workflows, create an override:

# Copy the unit file
mkdir -p ~/.config/systemd/user
cp packaging/systemd/user/shuvoice.service ~/.config/systemd/user/

# Override the ExecStart for your venv
systemctl --user edit shuvoice.service

Add this override:

[Service]
ExecStart=
ExecStart=%h/repos/shuvoice/.venv/bin/shuvoice

Then reload and start:

systemctl --user daemon-reload
systemctl --user import-environment WAYLAND_DISPLAY DISPLAY XDG_RUNTIME_DIR HYPRLAND_INSTANCE_SIGNATURE DBUS_SESSION_BUS_ADDRESS XDG_CURRENT_DESKTOP XDG_SESSION_TYPE
systemctl --user enable --now shuvoice.service

Manual launch (foreground)

# AUR install
shuvoice

# Source/venv install
shuvoice run

Usage

Push-to-Talk (STT)

Hold your configured push-to-talk key, speak, and release. The transcribed text is typed into your focused window.

Hyprland Keybinds

The wizard configures these automatically, but you can set them manually in ~/.config/hypr/hyprland.conf:

# Push-to-talk (STT) — example using Right Ctrl
bind = , Control_R, exec, shuvoice control start --control-wait-sec 0
bindr = , Control_R, exec, shuvoice control stop --control-wait-sec 0
bindr = CTRL, Control_R, exec, shuvoice control stop --control-wait-sec 0

# Speak selected text (TTS)
bind = SUPER CTRL, S, exec, shuvoice control tts_speak --control-wait-sec 0

Text-to-Speech (TTS)

Highlight text in any app, then press your TTS keybind (default: Super+Ctrl+S). ShuVoice reads the selected text aloud using your configured TTS backend.

  • Uses primary selection first, then clipboard fallback
  • STT and TTS are mutually exclusive (starting one stops the other)
  • Re-triggering while speaking interrupts and starts fresh
  • The TTS overlay includes pause/resume, restart, stop, voice selection, and provider-backed speed controls
  • Changing speed while speaking restarts the current utterance from the beginning at the new synthesis speed
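The primary-selection-then-clipboard lookup can be sketched like this in Python using `wl-paste` from wl-clipboard. This is an illustrative re-implementation of the behavior described above, not ShuVoice's actual code:

```python
import subprocess

def read_selection() -> str:
    """Try the primary selection first, then fall back to the clipboard."""
    for args in (["wl-paste", "--primary", "--no-newline"],
                 ["wl-paste", "--no-newline"]):
        try:
            result = subprocess.run(args, capture_output=True, text=True, timeout=2)
        except (OSError, subprocess.TimeoutExpired):
            continue  # wl-paste missing or unresponsive; try the next source
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout
    return ""
```

An empty return corresponds to the "no selected text" case in the troubleshooting table.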

Control Commands

Send commands to a running ShuVoice instance:

shuvoice control start       # Begin recording
shuvoice control stop        # Stop recording and type result
shuvoice control toggle      # Toggle recording on/off
shuvoice control status      # Show current state
shuvoice control metrics     # Show runtime metrics

# TTS commands
shuvoice control tts_speak   # Read selected text aloud
shuvoice control tts_stop    # Stop TTS playback
shuvoice control tts_status  # Show TTS state

Useful CLI Commands

shuvoice --help              # Show all commands
shuvoice audio list-devices  # List audio input devices
shuvoice config effective    # Print merged effective config
shuvoice config validate     # Validate your config file
shuvoice model download      # Download model for active backend
shuvoice diagnostics         # Show runtime diagnostics
shuvoice wizard              # Re-run the setup wizard

Configuration

Config file location: ~/.config/shuvoice/config.toml

The wizard creates this for you. To edit manually, see the full reference at examples/config.toml.

Switching ASR Backends

[asr]
asr_backend = "sherpa"     # sherpa | nemo | moonshine

Restart the service after changing:

systemctl --user restart shuvoice.service

Backend Comparison

| Backend | Best For | GPU Required? | Accuracy | Speed |
| --- | --- | --- | --- | --- |
| Sherpa-ONNX | General use, CPU systems | No (optional CUDA) | Good | Fast |
| NeMo | Maximum accuracy | Yes (CUDA) | Best | Medium |
| Moonshine | Low-resource systems | No (optional CUDA) | Fair | Varies |
Detailed backend configuration

Sherpa-ONNX

Default backend. Two profiles available:

  • Streaming (Zipformer) — live partial transcription as you speak
  • Instant (Parakeet) — transcribes on key release for higher accuracy
[asr]
asr_backend = "sherpa"
sherpa_provider = "cpu"              # cpu | cuda
sherpa_model_name = "sherpa-onnx-streaming-zipformer-en-kroko-2025-08-06"
sherpa_decode_mode = "auto"          # auto | streaming | offline_instant

Parakeet offline instant mode (recommended for Parakeet models):

[asr]
asr_backend = "sherpa"
sherpa_model_name = "sherpa-onnx-nemo-parakeet-tdt-0.6b-v3-int8"
instant_mode = true
sherpa_decode_mode = "offline_instant"

Models auto-download to ~/.local/share/shuvoice/models/sherpa/ when sherpa_model_dir is unset.

NeMo

Highest accuracy. Requires NVIDIA GPU with CUDA.

[asr]
asr_backend = "nemo"
model_name = "nvidia/nemotron-speech-streaming-en-0.6b"
device = "cuda"                      # cuda | cpu
right_context = 13                   # 0-13; higher = more accurate, more latency

Moonshine

Lightweight, CPU-friendly. Best for short utterances.

[asr]
asr_backend = "moonshine"
moonshine_model_name = "moonshine/tiny"    # moonshine/tiny (fast) | moonshine/base (slower)
moonshine_provider = "cpu"                 # cpu | cuda
moonshine_max_window_sec = 5.0

Text Injection Mode

Controls how transcribed text is typed into your focused app:

[typing]
typing_final_injection_mode = "auto"   # auto | clipboard | direct
typing_text_case = "default"           # default | lowercase
| Setting | Values | Behavior |
| --- | --- | --- |
| typing_final_injection_mode | auto, clipboard, direct | Controls how final text is injected into the focused app |
| typing_text_case | default, lowercase | Controls whether final text keeps normal capitalization/punctuation style or is forced to lowercase |

| Injection mode | Behavior |
| --- | --- |
| auto (default) | Detects clipboard watchers and chooses the safest method automatically |
| clipboard | Copies text to clipboard and pastes with Ctrl+V |
| direct | Types text directly via wtype (avoids clipboard) |

| Text case | Behavior |
| --- | --- |
| default (default) | Keeps ShuVoice's normal punctuation/capitalization processing |
| lowercase | Forces final committed text to lowercase for informal chat/conversation |

Quick toggles from CLI:

shuvoice config set typing_final_injection_mode direct
shuvoice config set typing_text_case lowercase

Text Replacements

Add custom word/phrase corrections applied to transcribed text:

[typing.text_replacements]
"speech to text" = "speech-to-text"
"um" = ""                              # empty value deletes the word

Matches are case-insensitive, whole-word/phrase only (longest first). ShuVoice includes built-in corrections for common ASR variants of "ShuVoice" and "Hyprland".
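The matching rules above (case-insensitive, whole-phrase, longest first) can be sketched roughly as follows. This is an illustrative re-implementation, not ShuVoice's actual code:

```python
import re

def apply_replacements(text: str, rules: dict[str, str]) -> str:
    # Longest phrases first, so "speech to text" wins over a shorter "speech".
    for phrase in sorted(rules, key=len, reverse=True):
        pattern = r"\b" + re.escape(phrase) + r"\b"
        text = re.sub(pattern, rules[phrase], text, flags=re.IGNORECASE)
    # Collapse doubled spaces left behind by deletions such as "um" -> "".
    return re.sub(r"  +", " ", text).strip()

rules = {"speech to text": "speech-to-text", "um": ""}
```

With these rules, "I use speech to text daily" becomes "I use speech-to-text daily".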

Instant Mode

Set instant_mode = true for low-latency tuning across all backends:

[asr]
instant_mode = true

Effects per backend:

  • NeMo: forces right_context = 0
  • Sherpa streaming: caps sherpa_chunk_ms at 80
  • Sherpa offline: enables one-shot release-to-final decode
  • Moonshine: forces moonshine/tiny, caps window to 3s and tokens to 48
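Conceptually, the tuning amounts to clamping a few config values per backend. A hedged sketch of those clamps (the token-cap key name is assumed for illustration; it is not a confirmed config key):

```python
def apply_instant_mode(cfg: dict) -> dict:
    """Sketch of the per-backend clamps listed above (illustrative only)."""
    cfg = dict(cfg)  # don't mutate the caller's config
    backend = cfg.get("asr_backend")
    if backend == "nemo":
        cfg["right_context"] = 0
    elif backend == "sherpa":
        cfg["sherpa_chunk_ms"] = min(cfg.get("sherpa_chunk_ms", 160), 80)
    elif backend == "moonshine":
        cfg["moonshine_model_name"] = "moonshine/tiny"
        cfg["moonshine_max_window_sec"] = min(cfg.get("moonshine_max_window_sec", 5.0), 3.0)
        cfg["moonshine_max_tokens"] = 48  # key name assumed for illustration
    return cfg
```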

Overlay Appearance

[overlay]
font_size = 24
font_family = "JetBrains Mono"   # optional
bg_opacity = 0.55

Hyprland blur/transparency for the overlay:

# In hyprland.conf — ShuVoice uses layer-shell namespaces: stt-overlay, tts-overlay
layerrule = blur, stt-overlay
layerrule = ignorealpha 0.20, stt-overlay
layerrule = xray 1, stt-overlay

layerrule = blur, tts-overlay
layerrule = ignorealpha 0.20, tts-overlay

TTS Configuration

[tts]
tts_enabled = true
tts_backend = "elevenlabs"                      # elevenlabs | openai | local | melotts | kokoro
tts_default_voice_id = "zNsotODqUhvbJ5wMG7Ei"   # ElevenLabs default
# OpenAI defaults are auto-applied when tts_backend = "openai":
# tts_default_voice_id = "onyx"
# tts_model_id = "gpt-4o-mini-tts"
# tts_api_key_env = "OPENAI_API_KEY"
# Local Piper defaults are auto-applied when tts_backend = "local":
# tts_default_voice_id = "default"              # use first discovered local .onnx model
# tts_model_id = "piper"
# tts_local_model_path = "~/.local/share/shuvoice/models/piper"   # wizard/setup managed directory
# tts_local_voice = "en_US-amy-medium"          # optional explicit curated model stem
# Kokoro defaults are auto-applied when tts_backend = "kokoro":
# tts_default_voice_id = "af_heart"
# tts_model_id = "kokoro"
# tts_kokoro_base_url = "http://localhost:8880/v1"
tts_model_id = "eleven_flash_v2_5"
tts_api_key_env = "ELEVENLABS_API_KEY"          # env var name (not the key itself)
tts_playback_speed = 1.0                         # default synthesis speed (0.5x to 2.0x)

Set your API key in ~/.config/shuvoice/local.dev:

# ElevenLabs
ELEVENLABS_API_KEY=sk-your-key-here

# OpenAI
OPENAI_API_KEY=sk-your-key-here

For Local Piper, you now have two setup paths:

  • Automatic setup — use the wizard's Local Piper automatic mode, or run shuvoice setup --install-missing with tts_backend = "local". ShuVoice will:
    • install piper-tts on Arch when possible,
    • download a curated voice into ~/.local/share/shuvoice/models/piper/,
    • persist tts_local_model_path / tts_local_voice, and
    • validate the resulting setup.
  • Manual setup — set tts_local_model_path to a .onnx file or a directory of .onnx voices.

If you point at a directory and leave tts_local_voice unset, ShuVoice uses the first discovered model automatically. Piper sidecar files (.onnx.json) are also used to detect the correct playback sample rate.
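Reading the sample rate from a Piper sidecar can be sketched in a few lines, assuming the standard Piper sidecar layout where `voice.onnx` is accompanied by `voice.onnx.json` containing an `audio.sample_rate` field (illustrative, not ShuVoice's actual code):

```python
import json
from pathlib import Path

def piper_sample_rate(model_path: str, default: int = 22050) -> int:
    """Read the playback sample rate from a Piper .onnx.json sidecar."""
    sidecar = Path(model_path + ".json")  # voice.onnx -> voice.onnx.json
    if sidecar.exists():
        meta = json.loads(sidecar.read_text())
        return int(meta.get("audio", {}).get("sample_rate", default))
    return default
```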

For MeloTTS, no API key is required — it runs fully locally using subprocess isolation (separate Python 3.12 venv) to avoid dependency conflicts. Five English voices are available: American (EN-US), British (EN-BR), Indian (EN-INDIA), Australian (EN-AU), and Newest (EN-Newest). CPU real-time inference is supported, with optional CUDA acceleration.

[tts]
tts_enabled = true
tts_backend = "melotts"
tts_default_voice_id = "EN-US"                   # EN-US | EN-BR | EN-INDIA | EN-AU | EN-Newest
tts_playback_speed = 1.0
# tts_melotts_device = "auto"                    # auto | cpu | cuda
# tts_melotts_venv_path = "~/.local/share/shuvoice/melotts-venv"  # default managed location

Setup is automated: run shuvoice setup --install-missing with tts_backend = "melotts", or select MeloTTS in the setup wizard. ShuVoice handles venv creation and model download.

For Kokoro, no install step is required inside ShuVoice itself — point the wizard at your OpenAI-compatible Kokoro base URL, choose a voice ID, and keep tts_backend = "kokoro". shuvoice setup and shuvoice preflight will verify connectivity by querying the voice list.

[tts]
tts_enabled = true
tts_backend = "kokoro"
tts_default_voice_id = "af_heart"
tts_model_id = "kokoro"
tts_kokoro_base_url = "http://localhost:8880/v1"
tts_playback_speed = 1.0
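Connectivity can also be checked by hand with a few lines of Python hitting the same /audio/voices endpoint the wizard queries (the exact JSON shape depends on your Kokoro server):

```python
import json
import urllib.request

def list_kokoro_voices(base_url: str = "http://localhost:8880/v1") -> dict:
    """Fetch the voice list from an OpenAI-compatible Kokoro instance."""
    with urllib.request.urlopen(base_url + "/audio/voices", timeout=5) as resp:
        return json.load(resp)
```

A successful response confirms the base URL is reachable before you finish the wizard.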

Example Configs

Pre-built config files for common setups:

| File | Description |
| --- | --- |
| examples/config.toml | Full reference with all options |
| examples/config-sherpa-cpu.toml | Sherpa on CPU |
| examples/config-sherpa-cuda.toml | Sherpa on GPU |
| examples/config-sherpa-parakeet-offline.toml | Parakeet instant mode |
| examples/config-nemo-cuda.toml | NeMo on GPU |
| examples/config-nemo-cpu.toml | NeMo on CPU |
| examples/config-moonshine-cpu.toml | Moonshine on CPU |

Waybar Integration

Waybar tooltip

ShuVoice ships a Waybar helper (shuvoice-waybar) for a tray-style status icon. When running on Hyprland, the tooltip also shows the configured TTS provider / voice, default synthesis speed, plus the detected push-to-talk and TTS keybinds.

Add to your Waybar config:

"custom/shuvoice": {
  "return-type": "json",
  "exec": "shuvoice-waybar status",
  "interval": 1,
  "on-click": "shuvoice-waybar toggle-record",
  "on-click-middle": "shuvoice-waybar service-toggle",
  "on-click-right": "shuvoice-waybar menu",
  "tooltip": true
}

State-based CSS styling:

#custom-shuvoice.recording  { color: #f38ba8; }
#custom-shuvoice.processing { color: #fab387; }
#custom-shuvoice.idle       { color: #a6e3a1; }
#custom-shuvoice.starting   { color: #f9e2af; }
#custom-shuvoice.stopped    { color: #7f849c; }
#custom-shuvoice.error      { color: #f38ba8; }
Waybar setup details

Available commands:

shuvoice-waybar status          # JSON output for Waybar
shuvoice-waybar menu            # Right-click menu
shuvoice-waybar toggle-record   # Toggle recording
shuvoice-waybar service-toggle  # Start/stop systemd service
shuvoice-waybar launch-wizard   # Open setup wizard

Right-click menu uses one of: omarchy-launch-walker, walker, wofi, rofi, bemenu, or dmenu.

If Waybar can't find shuvoice-waybar in PATH, use the wrapper script:

# Install a PATH symlink (default: ~/.local/bin/shuvoice-waybar)
./scripts/install-waybar-wrapper.sh

# Or point Waybar to the full path
# exec: "/home/you/.venv/bin/shuvoice-waybar status"

See examples/waybar-custom-shuvoice.jsonc and examples/waybar-shuvoice.css for complete examples.


Troubleshooting

Missing Python modules

| Error | Fix |
| --- | --- |
| No module named 'torch' or 'nemo' | uv sync --extra asr-nemo, or install python-pytorch-cuda (Arch) |
| No module named 'sherpa_onnx' | AUR: yay -S python-sherpa-onnx-bin · venv: uv sync --extra asr-sherpa |
| No module named 'moonshine_onnx' | uv sync --extra asr-moonshine |
| No module named 'gi' | sudo pacman -S python-gobject gtk4 gtk4-layer-shell |

Sherpa / Parakeet errors

| Error | Fix |
| --- | --- |
| Missing encoder/decoder/joiner artifacts | Point sherpa_model_dir to a valid model directory, or unset it to auto-download |
| Parakeet requires offline instant mode | Use sherpa_decode_mode = "offline_instant" with instant_mode = true |
| window_size does not exist in the metadata | Use sherpa_decode_mode = "offline_instant" for that model, or switch to Zipformer |
| CUDAExecutionProvider not found | Install CUDA-enabled sherpa-onnx, or use sherpa_provider = "cpu" |

Audio and recognition issues

| Problem | Fix |
| --- | --- |
| Wrong microphone | Run shuvoice audio list-devices, then set audio_device by name (not index) in config |
| Mic too quiet | Increase input_gain (try 1.3 to 1.8) |
| Phantom text on silent presses | Raise silence_rms_threshold (try 0.010–0.015) or silence_rms_multiplier (try 2.0) |
| Long phrases cut out | Keep streaming_stall_guard = true (default); tune streaming_stall_chunks (3–6) |
| Clipboard pollution | Set typing_final_injection_mode = "auto" (detects clipboard managers automatically) |
| Need casual/lowercase chat output | Set typing_text_case = "lowercase" |

System/runtime errors

| Error | Fix |
| --- | --- |
| libgtk4-layer-shell.so not found | sudo pacman -S gtk4-layer-shell |
| wtype not found in PATH | sudo pacman -S wtype |
| Control socket not found | Start ShuVoice before sending control commands |
| espeak-ng not found | sudo pacman -S espeak-ng (only needed for roundtrip test scripts) |
| tts_speak says no selected text | Highlight text first; verify wl-paste works |
| ElevenLabs/OpenAI 401 errors | Export the provider API key in the env var named by tts_api_key_env; run shuvoice preflight |
| Failed to build kaldialign (Python 3.14) | Use uv sync --extra asr-nemo --override packaging/constraints/py314-overrides.txt |

Development

# Clone and install with dev tooling
git clone https://github.com/shuv1337/shuvoice.git
cd shuvoice
uv sync --dev

# Run checks
uv run ruff check shuvoice tests
uv run ruff format --check shuvoice tests
uv run pytest -m "not gui" -v

See CONTRIBUTING.md for full guidelines.

Advanced test suites
# IPC end-to-end smoke tests
pytest -m e2e -k ipc_smoke -v

# Manual phrase regression (opt-in)
SHUVOICE_RUN_ROUNDTRIP=1 \
SHUVOICE_ROUNDTRIP_BACKEND=nemo \
SHUVOICE_ROUNDTRIP_DEVICE=cuda \
pytest -m integration -k roundtrip_regression -v

# Long-phrase round-trip harness (TTS → STT)
python scripts/tts_roundtrip.py --asr-backend nemo --device cuda

# Smoke test script
./scripts/smoke-test.sh

Project Links

  • Repository: github.com/shuv1337/shuvoice
  • AUR Package: shuvoice-git
  • Contributing: CONTRIBUTING.md
  • Code of Conduct: CODE_OF_CONDUCT.md
  • Security: SECURITY.md
  • Brand Assets: docs/BRANDING.md

License

ShuVoice is released under the MIT License.
