Vext is a local-first desktop speech-to-text utility built with Tauri, React, TypeScript, and Rust. It records microphone input, transcribes locally with whisper.cpp, applies user-defined dictionary replacements, and inserts the final text back into the previously targeted app using a Windows clipboard-preserving paste fallback.
- Frontend
- React + TypeScript shell with Overview, History, Dictionary, Settings, and Setup pages
- Browser-side WAV microphone capture
- Local persistence in localStorage
- Native backend
- Tauri command layer in Rust
- whisper.cpp process execution
- Foreground window capture and clipboard-based text insertion on Windows
- Core modules
- src/lib/audio.ts: browser WAV capture
- src/lib/dictionary.ts: replacement engine and punctuation cleanup
- src/lib/stats.ts: aggregate stats
- src/lib/storage.ts: local storage management
- src-tauri/src/services/whisper.rs: local transcription via whisper-cli
- src-tauri/src/services/insertion.rs: foreground target capture and insertion
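
As an illustration of what browser-side WAV capture involves, the sketch below packs mono float samples into a 16-bit PCM WAV byte array. It is not the actual contents of src/lib/audio.ts: the function name `encodeWavPcm16` and the mono, 16-bit layout are assumptions chosen for the example (whisper.cpp has traditionally been fed 16 kHz, 16-bit WAV input).

```typescript
// Hypothetical helper for illustration; not taken from src/lib/audio.ts.
// Encodes mono float samples in [-1, 1] as a 16-bit PCM WAV file.
function encodeWavPcm16(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize); // 44-byte canonical header
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) {
      view.setUint8(offset + i, text.charCodeAt(i));
    }
  };

  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true); // remaining file size
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channels: mono (assumption)
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * bytesPerSample, Math.round(scaled), true);
  }
  return new Uint8Array(buffer);
}
```

A caller would typically collect samples from the Web Audio API, resample them if needed, and pass the resulting bytes to the native side for transcription.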
- Hold-to-talk and toggle recording modes
- Sidebar-driven polished desktop UI
- Floating recording bar with status updates
- Local dictionary replacements with JSON export
- Searchable local history
- Privacy controls and delete/export local data actions
- Onboarding and setup checklist
- Basic tests for dictionary replacement, stats, and storage
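
To make the dictionary feature concrete, here is a minimal sketch of whole-word replacement, punctuation cleanup, and JSON export. The names `DictionaryEntry`, `applyDictionary`, `cleanupPunctuation`, and `exportDictionary` are hypothetical, and the matching rules (case-insensitive, longest phrase first) are assumptions rather than the behavior of src/lib/dictionary.ts.

```typescript
// Hypothetical shapes and names; the real module may differ.
interface DictionaryEntry {
  from: string; // phrase as whisper tends to transcribe it
  to: string; // text the user wants instead
}

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Replaces whole-word, case-insensitive matches. Longer phrases are applied
// first so "open ai" is handled before a shorter entry such as "open".
function applyDictionary(text: string, entries: DictionaryEntry[]): string {
  const ordered = [...entries]
    .filter((entry) => entry.from.trim().length > 0)
    .sort((a, b) => b.from.length - a.from.length);

  let result = text;
  for (const entry of ordered) {
    const pattern = new RegExp(
      `(?<![\\p{L}\\p{N}])${escapeRegExp(entry.from)}(?![\\p{L}\\p{N}])`,
      "giu",
    );
    // A function replacement avoids "$" in entry.to being treated specially.
    result = result.replace(pattern, () => entry.to);
  }
  return result;
}

// Removes whitespace before punctuation and collapses repeated whitespace.
function cleanupPunctuation(text: string): string {
  return text
    .replace(/\s+([,.!?;:])/g, "$1")
    .replace(/\s{2,}/g, " ")
    .trim();
}

// Serializes the dictionary for a JSON export action.
function exportDictionary(entries: DictionaryEntry[]): string {
  return JSON.stringify(entries, null, 2);
}
```

Because entries are applied one after another, the output of one replacement can be matched by a later entry. A single-pass implementation would avoid that at the cost of a more involved pattern.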
- Node.js 20+
- Rust stable
- Microsoft Visual Studio Build Tools for Tauri on Windows
- WebView2 Runtime on Windows
npm.cmd install
npm.cmd run tauri:dev
npm.cmd test
npm.cmd run tauri:build

Expected build outputs:
- src-tauri\target\release\Vext.exe
- src-tauri\target\release\bundle\nsis\Vext_0.1.0_x64-setup.exe
- Clone the repository:
git clone https://github.com/ggerganov/whisper.cpp.git
cd whisper.cpp
- Build the CLI:
cmake -B build
cmake --build build --config Release
- Use the built CLI path in Vext settings, typically:
<whisper.cpp>\build\bin\Release\whisper-cli.exe
- Download a model from the whisper.cpp model scripts or release assets
- Typical useful models:
- ggml-tiny.en.bin
- ggml-base.en.bin
- ggml-small.en.bin
- ggml-medium.en.bin
- ggml-large-v3.bin
- Store the model on local disk and point Vext settings to the exact file path
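
The sketch below shows how the two configured paths could be checked and turned into a whisper-cli argument list. The `WhisperSettings` shape and both function names are hypothetical, and the flags Vext actually passes are not documented here. `-m`, `-f`, `-nt`, and `-l` are whisper.cpp CLI options for model, input file, no timestamps, and language; the validation rules are illustrative assumptions only.

```typescript
// Hypothetical settings shape; Vext's real settings model may differ.
interface WhisperSettings {
  cliPath: string; // e.g. <whisper.cpp>\build\bin\Release\whisper-cli.exe
  modelPath: string; // exact path to a ggml model file
  language?: string; // optional language code such as "en"
}

// Returns human-readable problems; an empty array means the settings look usable.
function validateWhisperSettings(settings: WhisperSettings): string[] {
  const problems: string[] = [];
  if (settings.cliPath.trim() === "") {
    problems.push("CLI path is empty");
  }
  if (settings.modelPath.trim() === "") {
    problems.push("Model path is empty");
  } else if (!/\.bin$/i.test(settings.modelPath)) {
    problems.push("Model path should point to a .bin model file");
  }
  return problems;
}

// Builds the argument list for one transcription run of a WAV file.
function buildWhisperArgs(settings: WhisperSettings, wavPath: string): string[] {
  const args = ["-m", settings.modelPath, "-f", wavPath, "-nt"];
  if (settings.language) {
    args.push("-l", settings.language);
  }
  return args;
}
```

Passing the arguments as an array rather than a single command string avoids quoting problems when paths contain spaces.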
Example model download from the whisper.cpp repo:
.\models\download-ggml-model.cmd base.en
- Microphone permission must be enabled in Windows Privacy settings
- Clipboard-based insertion works best when Vext and the target app are running at the same privilege level
- Some elevated apps, secure input fields, and sandboxed apps may reject pasted text or foreground restoration
The current insertion implementation is Windows-focused. A future port will need microphone and Accessibility permissions. Browser-side audio capture still works in preview mode.
- Dictionary replacement behavior
- Stats calculation
- History/settings local storage
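
For a sense of what the stats tests exercise, here is a minimal sketch of an aggregate calculation over history entries. The `HistoryEntry` and `AggregateStats` shapes and the words-per-minute metric are assumptions for illustration, not the fields used by src/lib/stats.ts.

```typescript
// Hypothetical data shapes; the real history model may carry other fields.
interface HistoryEntry {
  text: string;
  durationMs: number; // recording length in milliseconds
}

interface AggregateStats {
  sessions: number;
  totalWords: number;
  totalDurationMs: number;
  wordsPerMinute: number;
}

// Counts whitespace-separated words; blank text counts as zero.
function countWords(text: string): number {
  const trimmed = text.trim();
  return trimmed === "" ? 0 : trimmed.split(/\s+/).length;
}

function computeStats(entries: HistoryEntry[]): AggregateStats {
  let totalWords = 0;
  let totalDurationMs = 0;
  for (const entry of entries) {
    totalWords += countWords(entry.text);
    totalDurationMs += Math.max(0, entry.durationMs);
  }
  // Guard against division by zero when nothing has been recorded yet.
  const minutes = totalDurationMs / 60000;
  const wordsPerMinute = minutes > 0 ? totalWords / minutes : 0;
  return { sessions: entries.length, totalWords, totalDurationMs, wordsPerMinute };
}
```

Edge cases such as an empty history or whitespace-only transcripts are the kind of input a basic test suite would pin down.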
- Global hotkey, a separate always-on-top recording bar window, and start-on-login are represented in the UI/settings model but are not fully wired to native OS hooks yet.
- The Windows insertion path currently relies on clipboard-preserving paste as its primary, most reliable insertion method.
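
The idea behind clipboard-preserving paste can be sketched as: save the clipboard, write the transcript, trigger a paste, then restore the saved content. The real insertion code lives in Rust (src-tauri/src/services/insertion.rs) and talks to Windows APIs; the TypeScript below only illustrates the ordering. `ClipboardPort`, `sendPaste`, and `pasteWithClipboardRestore` are hypothetical names, and the sketch handles plain text only and runs synchronously for simplicity.

```typescript
// Hypothetical abstraction over the system clipboard (text only).
interface ClipboardPort {
  read(): string | null; // null when the clipboard holds no text
  write(text: string): void;
  clear(): void;
}

// Pastes `text` into the focused app and then restores the previous clipboard.
// `sendPaste` stands in for whatever triggers Ctrl+V in the target window.
function pasteWithClipboardRestore(
  clipboard: ClipboardPort,
  sendPaste: () => void,
  text: string,
): void {
  const previous = clipboard.read();
  try {
    clipboard.write(text);
    sendPaste();
  } finally {
    // Restore even if the paste step throws, so user data is not lost.
    if (previous === null) {
      clipboard.clear();
    } else {
      clipboard.write(previous);
    }
  }
}
```

A native implementation would likely need a short delay before restoring, since the target app reads the clipboard asynchronously after receiving the paste keystroke, and it would also have to consider non-text clipboard formats.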




