GitHub - Arsh-Pathan/ContextFlow: Eliminate the keyboard bottleneck with ContextFlow: a premium, offline-first Windows dictation orchestrator. Powered by local AI, it lets you hold a hotkey to instantly transcribe and type clean text directly into any browser or IDE

Windows-native AI voice dictation — press a hotkey, speak naturally, and have polished text appear in any application. Fully offline by default, sub-300ms latency, everywhere on Windows.

Why ContextFlow?

Typing is a bottleneck. Voice is faster, but existing solutions fall short:

Windows built-in dictation is slow, only works in select text fields, can't clean filler words, and doesn't understand natural corrections.
Cloud dictation services send your audio to remote servers, add network latency, and stop working offline.

ContextFlow is built differently:

Universal injection — types into every Windows text field: browsers, VS Code, Slack, terminals, Win32 dialogs, Office. If a caret blinks, ContextFlow can fill it.
Offline-first — the default pipeline (whisper.cpp) runs entirely on your machine. No audio ever leaves your computer unless you explicitly opt into a cloud provider.
Low latency — speech recognition is streamed with <300ms visible lag. The pipeline is tuned for speed from capture to text.
AI-native — the AI cleanup layer understands spoken edits ("meet at 2, actually 3" → "meet at 3"), removes filler words, and adds punctuation automatically.
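
To make the cleanup step concrete, here is a minimal rule-based sketch of filler-word removal and basic punctuation. It is a stand-in for illustration only: ContextFlow's cleanup layer is AI-driven, and the `FILLERS` list and `clean_transcript` function are hypothetical names, not part of the project. Resolving spoken corrections such as "2, actually 3" needs real language understanding and is not attempted here.

```rust
/// Hypothetical filler list, for illustration only.
const FILLERS: [&str; 4] = ["um", "uh", "er", "hmm"];

/// Drops standalone filler words, capitalises the first letter, and
/// appends a full stop when the text has no terminal punctuation.
pub fn clean_transcript(raw: &str) -> String {
    let kept: Vec<&str> = raw
        .split_whitespace()
        .filter(|word| {
            // Compare without surrounding punctuation, so "Uh," matches "uh".
            let bare = word
                .trim_matches(|c: char| !c.is_alphanumeric())
                .to_lowercase();
            !FILLERS.contains(&bare.as_str())
        })
        .collect();

    let mut text = kept.join(" ");

    // Capitalise the first character, if there is one.
    let first = text.chars().next();
    if let Some(first) = first {
        let upper: String = first.to_uppercase().collect();
        text.replace_range(..first.len_utf8(), &upper);
    }

    // Add a full stop unless the text already ends a sentence.
    let ends_sentence = text.ends_with(|c: char| matches!(c, '.' | '!' | '?'));
    if !text.is_empty() && !ends_sentence {
        text.push('.');
    }
    text
}
```

Under these assumptions, `clean_transcript("um so the meeting is uh at three")` yields `So the meeting is at three.`, while a spoken correction passes through unchanged for a smarter layer to resolve.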

Features

Area	Capability
Capture	WASAPI loopback via cpal, 16 kHz resampling via rubato, voice activity detection via webrtc-vad, lock-free ring buffer
Speech	Pluggable providers via `SpeechProvider` trait — whisper.cpp (CUDA-accelerated), Windows SR, faster-whisper, Deepgram, OpenAI Realtime
Injection	Layered strategy: UI Automation → SendInput → clipboard. Falls through automatically. Per-app routing planned.
AI	Cleanup pipeline for punctuation, filler-word removal, spoken-correction resolution. Pluggable provider: Built-in (on-device, default), OpenAI, Anthropic, Gemini, or Ollama — opt-in.
Hotkey	Global hotkey (Ctrl+Space) with cross-app lifecycle, keyboard hook for reliable release detection
UI	Floating bubble with audio-reactive visualizer, state-driven animations, transparent always-on-top overlay
Settings	WhisperFlow-style preferences window (tray → Settings…): General, Appearance, AI Provider, Features, About. Live, cross-window synced.
Themes	28 colour + motion themes (Tokyo Night, Dracula, Nord, Solarized, Cyberpunk, White Flames, Black & White, …) — visuals only, layout preserved.
Privacy	All processing on-device by default. No telemetry without explicit opt-in. API keys in Windows Credential Manager. Every added feature is off by default.
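
The layered injection strategy in the table (UI Automation → SendInput → clipboard) can be sketched as an ordered list of strategies that is walked until one succeeds. The trait, error type, and `Stub` strategy below are hypothetical names chosen for this example; the real `text-injection` crate may be shaped differently, and no Windows API is called here.

```rust
/// Why a single injection attempt did not succeed (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum InjectError {
    /// The focused control does not support this strategy.
    Unsupported,
    /// The strategy applied but the underlying call failed.
    Failed(String),
}

/// One way of getting text into the focused control.
pub trait InjectionStrategy {
    fn name(&self) -> &'static str;
    fn inject(&self, text: &str) -> Result<(), InjectError>;
}

/// Tries each strategy in order and returns the name of the first one
/// that succeeds, or every collected error if all of them fail.
pub fn inject_with_fallback(
    strategies: &[Box<dyn InjectionStrategy>],
    text: &str,
) -> Result<&'static str, Vec<(&'static str, InjectError)>> {
    let mut errors = Vec::new();
    for strategy in strategies {
        match strategy.inject(text) {
            Ok(()) => return Ok(strategy.name()),
            Err(err) => errors.push((strategy.name(), err)),
        }
    }
    Err(errors)
}

/// Stand-in strategy that succeeds or fails on demand, for demonstration.
pub struct Stub {
    pub name: &'static str,
    pub works: bool,
}

impl InjectionStrategy for Stub {
    fn name(&self) -> &'static str {
        self.name
    }

    fn inject(&self, _text: &str) -> Result<(), InjectError> {
        if self.works {
            Ok(())
        } else {
            Err(InjectError::Unsupported)
        }
    }
}
```

With a chain of three stubs where the first reports `Unsupported`, the call falls through to the second and returns its name. Collecting the errors rather than discarding them keeps the failure path debuggable when every strategy is rejected.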

Architecture

ContextFlow is a Rust workspace inside a Tauri 2 desktop shell. The frontend is a minimal React overlay that shows dictation state; the entire speech pipeline runs in Rust.

apps/
  desktop/                Tauri 2 shell + React UI (bubble, settings)
core/
  audio-engine/           cpal capture, rubato resampling, VAD, rtrb ring buffer
  speech-engine/          SpeechProvider trait + provider implementations
  text-injection/         UIA / SendInput / clipboard strategies
  dictation-engine/       Session orchestrator: hotkey → capture → speech → inject
  context-engine/         Focused-window detection, per-app routing profiles
  ai-engine/              AI cleanup + voice-command abstraction
  hotkey/                 Global hotkey registration + low-level keyboard hook
  settings/               Persisted config (SQLite + serde)
  telemetry/              Opt-in metrics, structured logging, crash reporting
crates/
  ipc-contracts/          Typed Tauri command/event contracts (Rust ↔ TypeScript)
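
The session flow handled by `dictation-engine` (hotkey → capture → speech → inject) can be pictured as a small state machine. The sketch below is an assumption about shape, not the project's code: `SessionState`, `SessionEvent`, and the transition table are hypothetical, and the real pipeline streams recognition while the key is held rather than starting transcription on release.

```rust
/// Phases of one dictation session (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionState {
    Idle,
    Capturing,
    Transcribing,
    Injecting,
}

/// Events that move a session between phases (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SessionEvent {
    HotkeyPressed,
    HotkeyReleased,
    TranscriptReady,
    InjectionDone,
    Failed,
}

impl SessionState {
    /// Returns the state that follows `event`. Events that make no sense
    /// in the current state are ignored; any failure resets to `Idle`.
    pub fn next(self, event: SessionEvent) -> SessionState {
        match (self, event) {
            (Self::Idle, SessionEvent::HotkeyPressed) => Self::Capturing,
            (Self::Capturing, SessionEvent::HotkeyReleased) => Self::Transcribing,
            (Self::Transcribing, SessionEvent::TranscriptReady) => Self::Injecting,
            (Self::Injecting, SessionEvent::InjectionDone) => Self::Idle,
            (_, SessionEvent::Failed) => Self::Idle,
            (state, _) => state,
        }
    }
}
```

A pure transition function like this is easy to test without audio hardware, and the same state value could be what the overlay reads to choose its animation.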

The central design decision is the SpeechProvider trait in core/speech-engine. Every speech engine implements the same interface — the dictation orchestrator never touches a concrete provider. Swapping from local whisper.cpp to cloud Deepgram requires zero changes in the capture, VAD, or injection layers.
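
As a rough illustration of that decoupling, the sketch below shows what a provider-agnostic interface could look like. Only the trait name `SpeechProvider` comes from the project; the method names, the `Transcript` type, `EchoProvider`, and `run_session` are assumptions made for this example. See ARCHITECTURE.md for the actual signatures.

```rust
/// Result of a recognition pass (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct Transcript {
    pub text: String,
    pub is_final: bool,
}

/// Hypothetical shape of the provider interface: audio goes in as mono
/// 16 kHz samples, text comes out. Method names are assumptions.
pub trait SpeechProvider {
    fn name(&self) -> &str;
    fn push_audio(&mut self, samples: &[f32]);
    fn finish(&mut self) -> Transcript;
}

/// Toy provider that only counts samples. It stands in for whisper.cpp,
/// Deepgram, and the other engines so the example has no dependencies.
#[derive(Default)]
pub struct EchoProvider {
    samples_seen: usize,
}

impl SpeechProvider for EchoProvider {
    fn name(&self) -> &str {
        "echo"
    }

    fn push_audio(&mut self, samples: &[f32]) {
        self.samples_seen += samples.len();
    }

    fn finish(&mut self) -> Transcript {
        let text = format!("{} samples", self.samples_seen);
        self.samples_seen = 0;
        Transcript { text, is_final: true }
    }
}

/// Orchestrator-side code sees only the trait object, so swapping
/// providers does not change this function.
pub fn run_session(provider: &mut dyn SpeechProvider, chunks: &[Vec<f32>]) -> Transcript {
    for chunk in chunks {
        provider.push_audio(chunk);
    }
    provider.finish()
}
```

Because `run_session` takes `&mut dyn SpeechProvider`, a local or cloud engine can be passed in without touching the caller, which is the property the paragraph above describes.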

See ARCHITECTURE.md for the full technical design.

Quickstart

Prerequisites

Tool	Version	Notes
Windows 10/11	22H2+	Required for WinRT speech APIs
Rust	stable	`rustup toolchain install stable`
Node.js	20+	LTS recommended
pnpm	9+	`npm install -g pnpm`
VS Build Tools	2022+	C++ workload + Windows 11 SDK
CMake	3.x	For whisper.cpp (`winget install cmake`)

Setup

# Clone
git clone https://github.com/Arsh-Pathan/ContextFlow.git
cd ContextFlow

# Install JavaScript dependencies
pnpm install

# Verify the workspace compiles
cargo check --workspace

# Run in dev mode (hot-reload UI + Rust backend)
pnpm tauri dev

The first launch automatically downloads the speech model (~142 MB) from Hugging Face.

Dev Workflow

Our scripts set up all required environment variables automatically:

# Development (with CUDA GPU acceleration)
.\run-dev.ps1

# Release build
.\build.ps1

See docs/acceptance/slice-1.md for the acceptance test procedure.

Roadmap

We ship in vertical slices — each is independently runnable and acceptance-tested on Windows.

Slice	Goal	Status
1	End-to-end thin vertical: hotkey → Notepad	Done
2	Local speech pipeline (whisper.cpp, auto-download, streaming)	Done
3	Robust text injection (UIA, per-app strategies)	In progress
4	AI cleanup + voice commands	Planned
5	Context engine, snippets, personal dictionary, settings UI	Planned
6	Reliability, watchdog, installer, auto-update, telemetry	Planned

Detailed plans in ROADMAP.md.

Contributing

We follow Conventional Commits and enforce code quality with Lefthook pre-commit hooks. Every PR runs the full CI suite: `cargo clippy -- -D warnings`, `cargo fmt`, `cargo check`, `pnpm lint`, and `pnpm typecheck`.

Start with CONTRIBUTING.md for branch conventions, PR workflow, and coding standards.

Security & Privacy

All speech audio stays on-device by default. No data leaves your machine unless you explicitly enable a cloud provider.
API keys are stored in the Windows Credential Manager, never in plain text.
Telemetry is opt-in, anonymized, and limited to performance metrics.
See docs/security.md for the full threat model.

License

Apache License 2.0 — see LICENSE.

