Our Voice

A macOS menu bar dictation tool that runs entirely on-device. Hold the Fn key to record; release it to transcribe and paste the result into any app.

No cloud services, no subscriptions, no data leaves your Mac.

Features

  • Push-to-talk dictation via the Fn key (or a configurable hotkey)
  • Two speech-to-text backends:
    • Moonshine -- lightweight, CPU-based, 5 model sizes
    • MLX Whisper -- Apple Silicon GPU-accelerated, 6 model sizes
  • Live model switching from the menu bar (no restart needed)
  • Post-processing pipeline:
    • Hallucination loop detection and removal
    • Trailing silence trimming
    • Filler word removal (um, uh, you know, etc.)
    • Vocabulary corrections via user-editable files
  • Transcription history viewer with copy support
  • Training data collection -- saves audio/transcript pairs for future fine-tuning
  • Paste Last Transcription -- re-inject the last result into any app
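
As a rough illustration of two of the post-processing steps listed above, here is a minimal Python sketch of filler word removal and hallucination loop collapsing. It is not the project's implementation: the filler list, the function names (remove_fillers, collapse_loops), and the thresholds are assumptions chosen for the example.

```python
import re

# Hypothetical filler list; the project's real list is not shown in this README.
FILLERS = ("um", "uh", "you know")

_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(f) for f in FILLERS) + r")\b[,.]?\s*",
    flags=re.IGNORECASE,
)


def remove_fillers(text: str) -> str:
    """Drop filler words along with one trailing comma or period."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def collapse_loops(text: str, max_phrase_len: int = 6, min_repeats: int = 3) -> str:
    """Collapse a phrase repeated back-to-back min_repeats or more times."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        collapsed = False
        # Try longer phrases first so a whole repeated phrase is caught at once.
        for n in range(max_phrase_len, 0, -1):
            phrase = words[i:i + n]
            if len(phrase) < n:
                continue
            repeats = 1
            while words[i + repeats * n:i + (repeats + 1) * n] == phrase:
                repeats += 1
            if repeats >= min_repeats:
                out.extend(phrase)   # keep a single copy of the phrase
                i += repeats * n     # skip every repeated copy
                collapsed = True
                break
        if not collapsed:
            out.append(words[i])
            i += 1
    return " ".join(out)
```

With these assumed thresholds, a phrase said twice is left alone and only three or more consecutive repeats are treated as a loop, which avoids rewriting ordinary emphasis such as "no no".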

Requirements

  • macOS 13+ (Ventura or later)
  • Python 3.10+
  • For MLX Whisper: Apple Silicon Mac (M1/M2/M3/M4), ffmpeg (brew install ffmpeg)
  • For Moonshine: Any Mac (CPU-only, also works on Apple Silicon)

Installation

git clone https://github.com/JohnAllsopp/our-voice.git
cd our-voice
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# For MLX Whisper backend (recommended on Apple Silicon):
pip install mlx-whisper
brew install ffmpeg

Usage

# Run with defaults (MLX Whisper large-v3-turbo)
python run.py

# Run with Moonshine backend
python run.py --backend moonshine

# Run with a specific MLX Whisper model
python run.py --backend mlx-whisper --model large-v3-turbo

# Disable post-processing
python run.py --no-post-processing
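
The flags above could be parsed along the following lines. This is a sketch under stated assumptions, not the actual contents of run.py: the DEFAULT_MODELS table and the parse_args helper are hypothetical, and the per-backend defaults are taken from the Available models section of this README.

```python
import argparse

# Assumed mapping of backend name to its default model.
DEFAULT_MODELS = {
    "mlx-whisper": "large-v3-turbo",
    "moonshine": "medium-streaming",
}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("--backend", choices=sorted(DEFAULT_MODELS), default="mlx-whisper")
    parser.add_argument("--model", default=None, help="model name; defaults depend on the backend")
    parser.add_argument("--no-post-processing", action="store_true")
    args = parser.parse_args(argv)
    if args.model is None:
        # Resolve the default only after the backend is known.
        args.model = DEFAULT_MODELS[args.backend]
    return args
```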

Once running, look for OV in your menu bar:

  1. Hold Fn to start recording (red circle appears)
  2. Release Fn to stop and transcribe
  3. Text is automatically pasted into the focused app
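
The three steps above amount to a small state machine. The sketch below shows that flow in plain Python with the key listener, audio capture, transcription, and pasting all passed in as callbacks. The class name PushToTalk and its methods are hypothetical; the real app listens for Fn via a Quartz event tap, which is not reproduced here.

```python
from typing import Callable, List


class PushToTalk:
    """Hypothetical hold-to-record controller; not the project's actual class."""

    def __init__(
        self,
        transcribe: Callable[[List[bytes]], str],
        paste: Callable[[str], None],
    ) -> None:
        self._transcribe = transcribe
        self._paste = paste
        self._chunks: List[bytes] = []
        self.recording = False

    def on_key_down(self) -> None:
        if self.recording:
            return  # ignore key-repeat events while already recording
        self._chunks = []
        self.recording = True

    def on_audio(self, chunk: bytes) -> None:
        # Audio arriving outside a recording window is discarded.
        if self.recording:
            self._chunks.append(chunk)

    def on_key_up(self) -> None:
        if not self.recording:
            return
        self.recording = False
        text = self._transcribe(self._chunks)
        if text:
            self._paste(text)
```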

Menu bar options

  • Model -- switch between Moonshine and MLX Whisper models on the fly
  • Vocabulary > Edit Corrections -- add word corrections (e.g., whisper eye -> WhisperAI)
  • Vocabulary > Edit Prompt Terms -- add domain-specific terms to improve recognition
  • View Transcriptions -- browse all past transcriptions
  • Paste Last Transcription -- re-paste the most recent result
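
To make the corrections idea concrete, here is a minimal sketch that parses lines in the "wrong -> right" form shown in the example above and applies them to a transcript. The exact file format is an assumption (one correction per line, with # comment lines ignored), and parse_corrections and apply_corrections are hypothetical names.

```python
import re
from typing import Dict, Iterable


def parse_corrections(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'wrong -> right' lines, skipping blanks and # comments (assumed format)."""
    corrections: Dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        wrong, sep, right = line.partition("->")
        if not sep:
            continue  # not a correction line
        wrong, right = wrong.strip(), right.strip()
        if wrong:
            corrections[wrong] = right
    return corrections


def apply_corrections(text: str, corrections: Dict[str, str]) -> str:
    """Replace whole-word, case-insensitive matches of each 'wrong' phrase."""
    for wrong, right in corrections.items():
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        # A function replacement keeps backslashes in 'right' literal.
        text = pattern.sub(lambda _match, r=right: r, text)
    return text
```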

Available models

Moonshine

Model             Type            Notes
tiny              Non-streaming   Smallest, fastest
base              Non-streaming   Larger, better accuracy
tiny-streaming    Streaming arch  Can also run in batch mode
small-streaming   Streaming arch  Mid-size
medium-streaming  Streaming arch  Best accuracy (default)

MLX Whisper

Model           Size     Notes
tiny            ~75 MB   Fastest, lower accuracy
base            ~140 MB  Good for quick tasks
small           ~460 MB  Good accuracy
medium          ~1.5 GB  High accuracy
large-v3        ~3 GB    Highest accuracy, slower
large-v3-turbo  ~1.6 GB  Near-large accuracy, much faster (default)

Models are downloaded automatically on first use.

Permissions

Our Voice needs two macOS permissions:

  • Accessibility -- to paste text into the focused application
  • Microphone -- to record audio

The app prompts for both on first run. You can also verify them at any time by choosing Check Permissions from the menu bar.

Architecture

run.py                  -- Entry point and CLI
our_voice/
  app.py                -- Menu bar app (rumps), UI, queue consumer
  config.py             -- All configuration constants
  transcription.py      -- Recording, transcription pipeline, post-processing
  hotkey.py             -- Fn key listener (Quartz CGEventTap)
  text_injector.py      -- Paste text via clipboard + Cmd+V
  permissions.py        -- macOS permission checks
  vocabulary.py         -- Initial prompt and user vocabulary
  post_processing.py    -- Regex-based corrections
  training_data.py      -- Audio/transcript pair saving
  backends/
    base.py             -- Abstract TranscriptionBackend interface
    mlx_whisper.py      -- MLX Whisper backend
    moonshine.py        -- Moonshine Voice backend
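
The backends/ layout suggests a common interface that both speech-to-text engines implement, which is what makes live model switching possible. Below is a minimal sketch of that pattern. Only the class name TranscriptionBackend comes from the listing above; the method names, the FakeBackend stand-in, and the BACKENDS registry are assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from typing import Dict, Sequence, Type


class TranscriptionBackend(ABC):
    """Interface named in base.py; the methods below are assumed, not confirmed."""

    @abstractmethod
    def load_model(self, model_name: str) -> None:
        """Load (or download) the named model."""

    @abstractmethod
    def transcribe(self, samples: Sequence[float], sample_rate: int) -> str:
        """Return the transcript for one recording."""


class FakeBackend(TranscriptionBackend):
    """Stand-in backend so the sketch runs without any speech model."""

    def __init__(self) -> None:
        self.model_name = ""

    def load_model(self, model_name: str) -> None:
        self.model_name = model_name

    def transcribe(self, samples: Sequence[float], sample_rate: int) -> str:
        seconds = len(samples) / sample_rate
        return f"[{self.model_name}] {seconds:.1f}s of audio"


# Hypothetical registry; a real one would map names to the MLX Whisper
# and Moonshine backend classes.
BACKENDS: Dict[str, Type[TranscriptionBackend]] = {"fake": FakeBackend}


def create_backend(name: str, model_name: str) -> TranscriptionBackend:
    backend = BACKENDS[name]()
    backend.load_model(model_name)
    return backend
```

Switching models at runtime then reduces to calling create_backend again and swapping the reference the app holds.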

User data

All user data is stored in ~/.our-voice/:

  • transcriptions.jsonl -- transcription log
  • corrections.txt -- user word corrections
  • prompt_terms.txt -- custom vocabulary terms
  • training_data/ -- saved audio/transcript pairs
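
Because the transcription log is a JSONL file, it can be read with a few lines of Python. The sketch below assumes one JSON object per line. The record fields are not documented here, so the code treats each record as an opaque dict, and the helper names (parse_jsonl, load_history) are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator, List

HISTORY_PATH = Path.home() / ".our-voice" / "transcriptions.jsonl"


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per valid JSON object line, skipping anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a partially written final line
        if isinstance(record, dict):
            yield record


def load_history(path: Path = HISTORY_PATH) -> List[dict]:
    """Return all records, or an empty list if the log does not exist yet."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return list(parse_jsonl(handle))
```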

License

MIT -- see LICENSE.
