Desktop app for recording, transcribing, and searching meetings — 100% local, GPU-accelerated, with speaker diarization. Built with Tauri + React + faster-whisper + SpeechBrain.
No cloud. No API keys. Your audio and transcripts never leave your machine.
- Record meetings with mic + system audio (loopback) captured together
- Call detection — suggests "Record now" when Teams / Zoom / Meet are active
- GPU-accelerated transcription (faster-whisper + CTranslate2, int8 on NVIDIA) — typical RTF ~0.03x (30s of audio → 1s on GTX 1080 Ti)
- Speaker diarization (SpeechBrain ECAPA-TDNN + agglomerative clustering) — fully local, no HuggingFace token required
- Full-text search across all transcriptions (SQLite FTS5)
- Multiple Whisper models — tiny / base / small / medium / large-v2
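The full-text search maps naturally onto an FTS5 virtual table. A minimal stdlib-only sketch of the idea (table and column names here are illustrative, not the app's actual schema):

```python
import sqlite3

# In-memory DB for illustration; the app persists to data/ainotes.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE transcripts USING fts5(recording, text)")
conn.executemany(
    "INSERT INTO transcripts VALUES (?, ?)",
    [
        ("standup-01", "Alice walked through the quarterly roadmap"),
        ("standup-02", "Bob demoed the new search index"),
    ],
)

# MATCH uses FTS5 query syntax; bm25() ranks hits by relevance.
rows = conn.execute(
    "SELECT recording FROM transcripts WHERE transcripts MATCH ? "
    "ORDER BY bm25(transcripts)",
    ("roadmap",),
).fetchall()
print(rows)  # [('standup-01',)]
```

FTS5 handles tokenization and ranking itself, so searching every transcript stays a single indexed query rather than a scan.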
| Layer | Tech |
|---|---|
| UI | React 19 + TypeScript + Vite |
| Shell | Tauri 2 (Rust) |
| Backend | Python (Click CLI, JSON IPC) |
| Storage | SQLite (FTS5 for search) |
| ASR | faster-whisper / CTranslate2 |
| Diarization | SpeechBrain ECAPA-TDNN + scikit-learn |
```
┌─────────────┐      ┌─────────────┐      ┌─────────────┐
│  React UI   │─────▶│ Tauri (Rust)│─────▶│ Python CLI  │
│ (Vite dev)  │      │  commands   │      │   (Click)   │
└─────────────┘      └─────────────┘      └──────┬──────┘
                                                 ▼
                                          ┌───────────┐
                                          │  SQLite   │
                                          │ recordings│
                                          │  models   │
                                          └───────────┘
```
Every Tauri command invokes the Python CLI as a subprocess and parses its JSON stdout. The recording process spawns detached and is controlled via a lock file.
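The contract is simple: one subprocess invocation per command, exactly one JSON document on stdout. A dependency-free sketch of that round trip (the real backend uses Click; the command name and response fields here are made up):

```python
import json
import subprocess
import sys

# Child side: a stand-in for the Click CLI. Every command prints a
# single JSON object to stdout and nothing else.
CHILD = """
import json, sys
print(json.dumps({"ok": True, "command": sys.argv[1]}))
"""

# Parent side: what a Tauri command handler does, minus the Rust.
result = subprocess.run(
    [sys.executable, "-c", CHILD, "list-recordings"],
    capture_output=True, text=True, check=True,
)
reply = json.loads(result.stdout)
print(reply)  # {'ok': True, 'command': 'list-recordings'}
```

Keeping stdout JSON-only means the Rust side never has to scrape logs; diagnostics can go to stderr without breaking the parse.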
- Windows 10 / 11 (primary target; Linux/macOS untested)
- Python 3.10+
- Node 18+
- Rust (for Tauri) — install via rustup
- NVIDIA GPU with CUDA 12 support (optional but strongly recommended)
```bat
:: 1. Install Python dependencies
cd python
pip install -r requirements.txt

:: Install PyTorch with CUDA support separately
:: (the CUDA builds live on PyTorch's own index, not PyPI)
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu124
cd ..

:: 2. Install Node dependencies
npm install

:: 3. Run in development
npm run tauri dev
```

On first launch, Whisper models (~500 MB – 3 GB depending on size) and the SpeechBrain ECAPA-TDNN model (~80 MB) are downloaded into `data/models/`.
```bat
build_app.bat
```

This runs PyInstaller to bundle the Python CLI into a single `.exe` sidecar, then builds the Tauri installer (`.msi` + `.exe`) in `src-tauri/target/release/bundle/`.
```
AiNotes/
├── src/                      React + TypeScript UI
├── src-tauri/                Rust / Tauri shell
│   ├── src/lib.rs            Command handlers
│   └── binaries/             PyInstaller sidecar (generated)
├── python/
│   └── ainotes/              Python backend
│       ├── cli.py            Click CLI (all JSON output)
│       ├── db.py             SQLite + FTS5
│       ├── recorder.py       WASAPI loopback + mic capture
│       ├── transcriber.py    faster-whisper pipeline
│       ├── diarizer.py       SpeechBrain ECAPA-TDNN
│       └── call_detector.py  pycaw-based call detection
└── data/                     Runtime data (gitignored)
    ├── recordings/
    ├── models/
    └── ainotes.db
```
On an i7-7700K + GTX 1080 Ti (Pascal, CC 6.1):
| Audio length | Model | Device | Time | RTF |
|---|---|---|---|---|
| 71s | small | cuda int8 | ~2s | 0.025x |
| 71s | small | cpu int8 | ~30s | 0.4x |
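RTF here is processing time divided by audio duration, so lower is faster and anything under 1.0x beats real time. Checking the table's numbers:

```python
def rtf(processing_s: float, audio_s: float) -> float:
    """Real-time factor: processing time per second of audio."""
    return processing_s / audio_s

print(round(rtf(2, 71), 3))   # 0.028 -- the GPU row (~0.025x from the raw timing)
print(round(rtf(30, 71), 2))  # 0.42  -- the CPU row (~0.4x)
```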
Diarization runs on CPU (to avoid a cuDNN conflict with CTranslate2) and adds ~1-3 minutes for a one-hour meeting.
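The clustering step is cheap compared to embedding extraction, which is why CPU is tolerable. A toy, stdlib-only sketch of average-linkage agglomerative clustering over speaker embeddings — the real pipeline uses scikit-learn's `AgglomerativeClustering` on 192-dim ECAPA-TDNN vectors, and the 2-D "embeddings" and threshold below are purely illustrative:

```python
import math

def cosine_dist(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))

def agglomerate(embeddings, threshold=0.5):
    """Average-linkage agglomerative clustering with a distance cutoff.

    Returns one cluster label per embedding; each cluster is one speaker.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        # Find the closest pair of clusters by average pairwise distance.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(
                    cosine_dist(embeddings[a], embeddings[b])
                    for a in clusters[i] for b in clusters[j]
                ) / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:  # no pair close enough: stop merging
            break
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    labels = [0] * len(embeddings)
    for lab, members in enumerate(clusters):
        for m in members:
            labels[m] = lab
    return labels

# Toy 2-D "embeddings": two clearly separated speakers.
embs = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (0.2, 0.9)]
print(agglomerate(embs))  # [0, 0, 1, 1]
```

Because the cutoff, not a fixed cluster count, decides when merging stops, the number of speakers falls out of the data rather than being specified up front.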
faster-whisper (via CTranslate2) loads cuDNN 9 DLLs from the nvidia-* pip
packages. PyTorch 2.6 ships its own cuDNN, and when both sit in the same
process they compete for symbols, crashing with exit code 127. Since
ECAPA-TDNN is small and fast on CPU, running diarization there is the
simplest robust fix.
MIT — see LICENSE file.