Push-to-talk voice-to-text for Windows on ARM (Surface Pro / Snapdragon X Elite). Local transcription via NVIDIA Parakeet TDT 0.6B v3 running on the Hexagon NPU. 25 languages, ~67 ms per 8-second window on the NPU. Optional LLM cleanup via GitHub Copilot (Claude Haiku 4.5, GPT-5 Mini, GPT-4.1) or any OpenAI-compatible endpoint.
Microsoft Store · Website · Releases · NPU model on HF · macOS sibling
Easiest: get it from the Microsoft Store. One click, it installs the right build for your CPU automatically, keeps itself updated, and there's no SmartScreen warning.
Prefer a direct download? Pick the build for your CPU. Both are per-user installs (no admin / UAC required):
| Your machine | Build | Engine |
|---|---|---|
| Snapdragon X (Surface Pro 11, etc.) | …-arm64-… | Hexagon NPU + CPU fallback |
| Intel / AMD laptop | …-x64-… | CPU INT8 |
Not sure? Snapdragon laptops report "ARM-based processor" in Settings → System → About. Everything else is x64.
Installer. Download
openwritr-windows-<arch>-vX.Y.Z-setup.exe from
Releases
and run it. Sets up the Start Menu shortcut, an optional autostart-at-logon
entry, and a proper uninstaller you'll find under Settings → Apps.
Portable zip. Download openwritr-windows-<arch>-vX.Y.Z.zip and unzip
it into any folder you like (e.g. C:\Tools\OpenWritr\), then run
openwritr.exe. The app finds its DLLs next to the exe — the install
location doesn't matter. Same binaries as the installer, just no shortcuts
and no autostart.
Note: user data (settings, downloaded models, logs) always lives under
%LOCALAPPDATA%\OpenWritr\. The app creates that folder automatically on first run, so you never need to create it yourself. AppData is a hidden folder; if you want to look inside, paste %LOCALAPPDATA%\OpenWritr into the Explorer address bar.
The x64 build runs Parakeet on the CPU (no Hexagon NPU on Intel/AMD); the arm64 build adds the NPU engine. Both share the same multilingual model and UX.
Windows SmartScreen warning. The binaries are not code-signed (yet), so the first launch shows "Windows protected your PC". Click More info → Run anyway. To verify your download is authentic, run Get-FileHash .\openwritr-windows-<arch>-vX.Y.Z-setup.exe in PowerShell and compare the SHA-256 it prints against SHA256SUMS.txt attached to the release.
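The comparison step can also be scripted. The following is a minimal sketch, assuming SHA256SUMS.txt uses the common `sha256sum` layout (`<hex digest>  <file name>` per line), which this README does not confirm. The function names are hypothetical. The comparison ignores case because `Get-FileHash` prints upper-case hex while checksum files are usually lower-case.

```rust
/// Look up the digest recorded for `file_name` in a checksum listing.
/// Assumption: one `<hex digest>  <file name>` entry per line.
fn expected_digest<'a>(sums: &'a str, file_name: &str) -> Option<&'a str> {
    sums.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        let digest = parts.next()?;
        // sha256sum marks binary-mode entries with a leading '*'.
        let name = parts.next()?.trim_start_matches('*');
        (name == file_name).then_some(digest)
    })
}

/// True when `actual` (e.g. the Hash column from Get-FileHash) equals the
/// digest listed for `file_name`, compared case-insensitively.
fn digest_matches(sums: &str, file_name: &str, actual: &str) -> bool {
    expected_digest(sums, file_name)
        .map(|digest| digest.eq_ignore_ascii_case(actual.trim()))
        .unwrap_or(false)
}
```

A file that is not listed at all is reported as a mismatch rather than silently accepted.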
On first launch the Parakeet model is fetched from Hugging Face into
%LOCALAPPDATA%\OpenWritr\models\ — one-time, ~1.2 GB on the NPU engine
(600 MB CPU INT8 + 632 MB QNN HTP context binary), ~2 minutes on a fast
link. A microphone icon appears in your system tray when the engine is
ready.
| Combo | Action |
|---|---|
| Hold Ctrl + Win | Record. Release to transcribe and paste at the caret. |
| Hold Ctrl + Shift + Win | Record + LLM cleanup (Claude Haiku 4.5 by default). |
| Tray right-click → Settings | Change hotkey, engine, LLM provider. |
A small dark pill appears at the bottom-center of the primary monitor while recording, with white bars that breathe with your voice. Settings changes take effect immediately — no restart required.
Tray icon → right-click → Settings. All fields:
- Hotkey: any combination of Ctrl / Shift / Alt / Win modifiers, plus an
  optional trigger key (Space, Tab, Caps Lock, F13-F20, or None for
  modifiers-only). Default: Ctrl+Win, no trigger.
- Transcription engine: Parakeet CPU INT8, Parakeet NPU (default for v0.3+), Whisper Large v3 Turbo NPU. The NPU engine runs the encoder on the Snapdragon X Elite Hexagon HTP via a pre-compiled QAIRT context binary; preprocessor and TDT decoder remain on the CPU. Falls back to CPU INT8 automatically if the NPU model fails to load.
- Behaviour: auto-paste at cursor, show overlay while recording, play start/stop sounds.
- Enhance: provider (Off / GitHub Copilot / OpenAI-compatible API), model dropdown (Claude Haiku 4.5, GPT-5 Mini, GPT-4.1) or free-form custom model name, base URL + API key (OpenAI-compatible only).
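As an illustration of the Hotkey field, a combo can be modelled as a modifier bitmask plus an optional trigger key. This is a sketch only: `Hotkey`, `Trigger`, and the modifier constants are hypothetical names, not the schema in settings.rs. Matching the modifiers exactly, rather than as a subset, is an assumption chosen here so that Ctrl + Win and Ctrl + Shift + Win stay distinguishable.

```rust
// Hypothetical modifier encoding, one bit per modifier.
const CTRL: u8 = 1 << 0;
const SHIFT: u8 = 1 << 1;
const ALT: u8 = 1 << 2;
const WIN: u8 = 1 << 3;

/// Trigger keys offered in the settings dialog; `F(n)` covers F13-F20.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Trigger {
    Space,
    Tab,
    CapsLock,
    F(u8),
}

struct Hotkey {
    modifiers: u8,
    /// `None` means a modifiers-only combo.
    trigger: Option<Trigger>,
}

impl Hotkey {
    /// True while exactly the configured modifiers are held and the
    /// configured trigger (if any) is down.
    fn is_held(&self, held_modifiers: u8, held_trigger: Option<Trigger>) -> bool {
        held_modifiers == self.modifiers
            && match self.trigger {
                None => true,
                Some(t) => held_trigger == Some(t),
            }
    }
}
```

With the default `Hotkey { modifiers: CTRL | WIN, trigger: None }`, adding Shift to the held keys makes `is_held` return false, so a separate combo can claim that chord.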
Settings are stored at %LOCALAPPDATA%\OpenWritr\settings.json. The app polls
the file's mtime so external edits also take effect live.
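The mtime polling idea can be sketched with the standard library alone. This is not the app's implementation; `MtimeWatcher` and its methods are hypothetical names, and treating an unreadable file as "no change" is an assumption made here.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::time::SystemTime;

/// Remembers the last modification time seen for one file.
struct MtimeWatcher {
    path: PathBuf,
    last_seen: Option<SystemTime>,
}

impl MtimeWatcher {
    fn new(path: impl AsRef<Path>) -> Self {
        let path = path.as_ref().to_path_buf();
        let last_seen = read_mtime(&path);
        Self { path, last_seen }
    }

    /// Call periodically; returns true when the file's mtime differs
    /// from the last one observed.
    fn poll(&mut self) -> bool {
        let current = read_mtime(&self.path);
        self.observe(current)
    }

    /// Pure comparison step, split out so it can run without touching disk.
    fn observe(&mut self, current: Option<SystemTime>) -> bool {
        match current {
            Some(t) if self.last_seen != Some(t) => {
                self.last_seen = Some(t);
                true
            }
            // Unchanged, or the file is missing/unreadable right now
            // (assumption: do not treat that as a change).
            _ => false,
        }
    }
}

/// Modification time of `path`, or None if it cannot be read.
fn read_mtime(path: &Path) -> Option<SystemTime> {
    fs::metadata(path).and_then(|m| m.modified()).ok()
}
```

A caller would invoke `poll()` on a timer and reload the JSON whenever it returns true.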
```
openwritr-windows/
├── src/
│   ├── main.rs           entry; dispatches `--settings` subprocess
│   ├── app.rs            winit event loop, tray, hotkey thread, ASR dispatch
│   ├── audio.rs          cpal WASAPI capture, multi-channel downmix
│   ├── hotkey.rs         push-to-talk combo polling against key_hook state
│   ├── key_hook.rs       global WH_KEYBOARD_LL hook → atomic key bitmap
│   ├── overlay.rs        custom Win32 layered top-most window, GDI bars
│   ├── settings.rs       serde struct + JSON load/save
│   ├── settings_ui.rs    eframe/egui dialog (subprocess)
│   ├── asr/              ONNX Runtime pipeline (mel → encoder → TDT decoder)
│   │   ├── parakeet.rs   Encoder enum {Cpu, Npu} + chunked long-audio pipeline
│   │   └── qnn_ffi.rs    direct C-API FFI for the NPU encoder session
│   ├── enhance.rs        Copilot / OpenAI cleanup pass
│   ├── sounds.rs         G3/E3 tone synth (start/stop pings)
│   └── bin/package.rs    distributable-zip builder
└── Cargo.toml
```
Key design decisions:
- Global low-level keyboard hook. Push-to-talk detection reads physical
  key state from WH_KEYBOARD_LL instead of GetAsyncKeyState. The OS synthesises key-ups during focus changes (PowerShell launched mid-recording, UAC prompt, system shortcut handler), which the polling API faithfully reports and which would abort the recording. The LL hook sees only physical events.
- Settings UI as subprocess. The settings dialog is the same exe re-launched
  with --settings. Spawning happens from a worker thread because CreateProcessW on Windows ARM64 with Defender real-time scanning can block for several seconds; doing it inline would freeze the tray pump.
- Overlay on its own message loop. Layered top-most window with color-key
  transparency, painted with double-buffered GDI. Shares only two atomics
  (recording + last_rms_x10000) with the recorder, so it cannot deadlock the main app.
- Multi-channel downmix in the audio callback. The Qualcomm Aqstic mic array on Surface Pro exposes 4-8 interleaved channels at 48 kHz; we average to mono before resampling to 16 kHz.
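The downmix and the lock-free level hand-off to the overlay can be sketched as follows. Only the two field names (`recording`, `last_rms_x10000`) come from the text above; the struct and function names are hypothetical. The sketch also allocates a `Vec` per call for readability, which a real audio callback would likely avoid.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// State shared between the recorder and the overlay thread.
struct SharedMeter {
    recording: AtomicBool,
    last_rms_x10000: AtomicU32,
}

/// Average each interleaved frame down to one mono sample.
/// Trailing samples that do not fill a whole frame are dropped.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Root-mean-square level of a mono buffer (0.0 for an empty buffer).
fn rms(mono: &[f32]) -> f32 {
    if mono.is_empty() {
        return 0.0;
    }
    let mean_sq = mono.iter().map(|s| s * s).sum::<f32>() / mono.len() as f32;
    mean_sq.sqrt()
}

/// Publish the level as a fixed-point integer (RMS x 10000) so the
/// overlay can read it with a plain atomic load, no lock involved.
fn publish_level(meter: &SharedMeter, mono: &[f32]) {
    let scaled = (rms(mono) * 10_000.0).round() as u32;
    meter.last_rms_x10000.store(scaled, Ordering::Relaxed);
}
```

Because the overlay only ever loads these two atomics, a stalled paint loop cannot hold anything the recorder needs.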
NPU model preparation lives in scripts/. build_npu_encoder.py,
aihub_compile_encoder.py, wrap_qnn_context_binary.py, and
test_npu_encoder.py are build-time tools used to produce the .bin
hosted on HF. They are not invoked by the shipped openwritr.exe at
runtime: the native build pulls the pre-compiled binary directly and
never calls into Python.
The Rust package script (cargo run --release --bin package) stages Qualcomm
QNN runtime DLLs from pip install onnxruntime-qnn into target/release/
before zipping. The DLLs are required for the NPU engine option to work at
all (even the Python path needs them). This is the only Python touch point at
build time; the resulting zip is fully self-contained and Python-free.
```powershell
py -3.11-arm64 -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install onnxruntime-qnn
```

After that, build + package:

```powershell
.\scripts\envup.ps1   # primes vcvars arm64 + LLVM in PATH
cargo build --release --bin openwritr
cargo run --release --bin package
```

The zip lands in target/dist/.
The encoder runs as a pre-compiled QAIRT context binary on the Hexagon HTP of the Snapdragon X Elite. Preprocessor (mel features) and TDT decoder remain on the CPU EP — they are dynamic-shape, lightweight, and not a bottleneck.
The compiled binary expects a fixed 8-second audio window. For push-to-talk utterances ≤ 8 s the encoder is run once on a padded window. For longer audio the encoder is run in chunks (8 s window with 1 s overlap), feature streams are stitched at the seam, and the TDT decoder runs once over the concatenated features. Tested up to ~23 s without boundary doubling.
| Audio length | Decode (preproc + encode + TDT) | × Realtime | Chunks |
|---|---|---|---|
| 3 s | 128 ms | 23× | 1 |
| 5.8 s | 221 ms | 26× | 1 |
| 16.4 s | 375 ms | 44× | 3 |
| 23.0 s | 626 ms | 37× | 4 |
The encoder itself is ~67 ms ± 0 ms per 8-second window, independent of the actual audio length within the window.
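The chunking rule above (8 s window, 1 s overlap, last window padded) can be sketched as a small planner. It works on 16 kHz sample offsets, which is an assumption; the app may chunk at the mel-feature level instead, and `plan_chunks` is a hypothetical name.

```rust
const SAMPLE_RATE: usize = 16_000;
const WINDOW: usize = 8 * SAMPLE_RATE; // fixed encoder window
const OVERLAP: usize = SAMPLE_RATE; // 1 s shared by neighbouring windows
const STRIDE: usize = WINDOW - OVERLAP; // 7 s advance per chunk

/// Start/end sample offsets of each encoder window. The last window may
/// extend past `total_samples`; the caller pads that tail.
fn plan_chunks(total_samples: usize) -> Vec<(usize, usize)> {
    let mut chunks = Vec::new();
    let mut start = 0;
    loop {
        chunks.push((start, start + WINDOW));
        if start + WINDOW >= total_samples {
            break;
        }
        start += STRIDE;
    }
    chunks
}
```

Under these assumptions the planner yields 1 chunk for 3 s and 5.8 s, 3 chunks for 16.4 s, and 4 chunks for 23.0 s, matching the Chunks column in the table.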
- scripts/build_npu_encoder.py constant-folds the encoder's dynamic-shape attention mask (Shape → Gather → Range → Expand) against a frozen [1, 128, 801] input, because the HTP backend cannot evaluate that subgraph as-is.
- scripts/aihub_compile_encoder.py submits the static-shape FP32 ONNX plus FLEURS calibration samples to Qualcomm AI Hub. The quantize job uses INT8 weights / INT16 activations (the standard HTP recipe for transformer encoders); the compile job targets snapdragon_x_elite_crd with --target_runtime qnn_context_binary --truncate_64bit_io.
- scripts/wrap_qnn_context_binary.py wraps the resulting .bin in a 408-byte EPContext-node ONNX so ORT's QNN EP can consume it.
- src/asr/qnn_ffi.rs loads the wrapper via direct ort_sys C-API calls, bypassing ort 2.0-rc.12's session builders (which crash inside QnnHtp when consuming EPContext-wrapper ONNX).
The QNN backend loads several sibling DLLs by name at session-create time.
src/bin/package.rs stages all of them into the release zip, but if you
hand-assemble a distribution: alongside onnxruntime_providers_qnn.dll
you need QnnHtp.dll, QnnHtpPrepare.dll, QnnSystem.dll, the V73/V81
stubs (QnnHtpV73Stub.dll, QnnHtpV81Stub.dll), the per-arch skeletons
(libQnnHtpV73Skel.so, libQnnHtpV81Skel.so), and the catalog files
(libqnnhtpv73.cat, libqnnhtpv81.cat). Without the SKEL + .cat pair, the
stub fails LoadLibrary with ERROR_MOD_NOT_FOUND (126) and QnnHtp later
aborts session creation with STATUS_STACK_BUFFER_OVERRUN (0xC0000409)
without a useful error.
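For a hand-assembled distribution, a pre-flight check can catch a missing sibling before QnnHtp fails opaquely. The sketch below only tests that the files named above exist in one directory; it does not validate their contents or versions, and `missing_qnn_files` is a hypothetical helper, not part of src/bin/package.rs.

```rust
use std::path::Path;

/// Files the paragraph above lists as required next to the QNN provider DLL.
const QNN_SIBLINGS: &[&str] = &[
    "onnxruntime_providers_qnn.dll",
    "QnnHtp.dll",
    "QnnHtpPrepare.dll",
    "QnnSystem.dll",
    "QnnHtpV73Stub.dll",
    "QnnHtpV81Stub.dll",
    "libQnnHtpV73Skel.so",
    "libQnnHtpV81Skel.so",
    "libqnnhtpv73.cat",
    "libqnnhtpv81.cat",
];

/// Return the names from `QNN_SIBLINGS` that are absent from `dir`.
fn missing_qnn_files(dir: &Path) -> Vec<&'static str> {
    QNN_SIBLINGS
        .iter()
        .copied()
        .filter(|name| !dir.join(name).is_file())
        .collect()
}
```

Running it against the folder that holds openwritr.exe and reporting any returned names gives a readable error in place of the 0xC0000409 abort.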
| Model | Provider | Size | License | Auto-downloaded? |
|---|---|---|---|---|
| Parakeet TDT 0.6B v3 (CPU INT8 ONNX + companion files) | istupakov/parakeet-tdt-0.6b-v3-onnx | ~670 MB | CC-BY-4.0 | Yes, on first run |
| Parakeet TDT 0.6B v3 NPU encoder (QAIRT context binary + wrapper) | trsdn/parakeet-tdt-0.6b-v3-htp-int8-8s | ~632 MB | CC-BY-4.0 | Yes, on first NPU launch |
| Whisper Large v3 Turbo (QNN context binary) | qualcomm/Whisper-Large-V3-Turbo | ~1.6 GB | Apache 2.0 + BSD-3 | Python build only |
The NPU encoder model is device-gated to Snapdragon X Elite (Hexagon V73). It will not run on X Plus or any other Qualcomm chipset without recompilation via AI Hub.
- OpenWritr code: MIT; see LICENSE.
- Parakeet model: CC-BY-4.0 (NVIDIA). Attribution preserved when the model is downloaded.
- Qualcomm QNN runtime DLLs (QnnHtp.dll, QnnCpu.dll, Genie.dll, etc., bundled in the release zip): Qualcomm AI Engine Direct redistributable license. The full text ships inside every release zip under third-party-licenses/Qualcomm_LICENSE.pdf, alongside Microsoft's ThirdPartyNotices.txt for the onnxruntime-qnn PyPI package the DLLs come from. These DLLs are redistributable as part of applications targeting Qualcomm Snapdragon hardware, which is what OpenWritr does.
- ONNX Runtime DLLs (onnxruntime.dll, onnxruntime_providers_qnn.dll): MIT (Microsoft), bundled under their respective LICENSE files in the release zip.
```
openwritr-windows/
├── src/       Rust native app (what users run)
├── python/    Legacy v0.1; current NPU fallback
├── .venv/     gitignored; pip install onnxruntime-qnn happens here
└── target/    gitignored; build output
```
.venv is build-time only. The shipped openwritr.exe does not call into
Python at runtime.
```powershell
git clone https://github.com/trsdn/openwritr-windows.git
cd openwritr-windows
py -3.11-arm64 -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install onnxruntime-qnn
.\scripts\envup.ps1
cargo run --release --bin openwritr
```

For the Python NPU build, additionally:

```powershell
pip install -r python\requirements.txt
python python\openwritr.py
```

Tagged releases live at
github.com/trsdn/openwritr-windows/releases.
Each release ships a single zip containing openwritr.exe, the ONNX Runtime
DLLs, the QNN runtime DLLs, this README, the MIT LICENSE, and a
third-party-licenses/ folder with the Qualcomm and Microsoft licence files
for the bundled DLLs.