iOS keyboard that transcribes speech to text using Whisper

omachala/diction

Diction

The free, open-source alternative to Wispr Flow.
Self-hosted speech-to-text keyboard for iOS.

Website | Self-Hosting Guide | Privacy Policy


[Screenshots: Diction settings screen, Diction recording screen]

Why Diction?

Voice-to-text keyboards like Wispr Flow cost $15/month and send your audio to their cloud. Apple's built-in dictation is free but unreliable.

Diction is different:

  • Self-hosted is free - no subscription, no word limits, no trial that expires. Bring your own server.
  • Your server, your data - audio goes to a Whisper server you run. Not our cloud. Not anyone's cloud. Your network.
  • Open source infrastructure - the server setup is right here. Inspect it, modify it, contribute to it.
  • Model agnostic - point it at any OpenAI-compatible endpoint. Whisper tiny, large-v3, distil, fine-tuned models, future models. You choose.
  • Zero-dependency iOS app - pure Swift, no third-party SDKs, no analytics, no tracking. Fully auditable.

Don't want to self-host? Diction Cloud provides the same experience with zero setup.

Think of it like Bitwarden - free and self-hosted for those who want control, with a hosted cloud option for convenience.

How It Works

  1. Run the gateway + a Whisper model on any machine (home server, NAS, cloud VM, Raspberry Pi)
  2. Make it reachable from your phone (local IP, reverse proxy, or Cloudflare Tunnel)
  3. Paste the URL into the Diction app
  4. Switch to the Diction keyboard in any app → tap mic → speak → text appears

That's the entire setup. Three commands to start the server:

git clone https://github.com/omachala/diction.git
cd diction
docker compose up -d gateway whisper-small

Gateway is now running at http://<your-server-ip>:9000. Done.

How is this different from...

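|                             | Diction                 | Wispr Flow   | Apple Dictation |
|-----------------------------|-------------------------|--------------|-----------------|
| Price                       | Free (self-hosted)      | $15/month    | Free            |
| Audio stays on your network | ✅                      | ❌ Cloud     |                 |
| Open source server          | ✅                      |              |                 |
| iOS keyboard                | ✅                      |              | ✅ Built-in     |
| Model agnostic              | ✅ Any model, any URL   | ❌ Locked in | ❌ Locked in    |
| Zero third-party SDKs       | ✅                      |              | N/A             |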

Diction is pure transcription: what you say is what you get. No AI rewriting, no filler word removal. If you want that, paid alternatives exist. Diction's trade-off is freedom, privacy, and cost.

Gateway

The gateway sits in front of your Whisper models and provides:

  • Model routing — switch models from the app without changing your server URL
  • WebSocket streaming — audio streams to the server during recording, so transcription starts instantly when you stop (no upload wait)
  • Health checks - GET /health and GET /v1/models report which backends are up

docker compose up -d gateway whisper-small

Point the Diction app to http://<your-server-ip>:9000. The gateway routes requests to the right model backend automatically.

You can also skip the gateway and connect directly to a model (e.g. http://<ip>:9002 for small). The gateway is optional but recommended.
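To confirm the gateway is reachable before pointing the app at it, you can probe the two health endpoints from any machine on the same network. The following is a minimal sketch, assuming curl is installed. The function name check_gateway is hypothetical and not part of this repo, and the response bodies are discarded because their format is not documented here.

```shell
#!/bin/sh
# Sketch: probe the gateway's health endpoints (GET /health and GET /v1/models).
# check_gateway is a hypothetical helper name, not something shipped in this repo.
# Usage: check_gateway http://<your-server-ip>:9000

check_gateway() {
  base_url="${1%/}"   # strip one trailing slash, if present
  for path in /health /v1/models; do
    # -f: treat HTTP errors as failures, -sS: no progress bar but keep error messages,
    # --max-time 5: give up after five seconds
    if curl -fsS --max-time 5 "${base_url}${path}" >/dev/null; then
      echo "OK   ${path}"
    else
      echo "FAIL ${path}"
      return 1
    fi
  done
}
```

If both lines print OK, the URL you passed is the one to paste into the Diction app. The same check can be pointed at a direct model URL such as http://<ip>:9002, though whether a backend exposes both paths without the gateway is an assumption.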

Models

Diction is model agnostic. It works with any OpenAI-compatible speech-to-text endpoint - public models, private models, fine-tuned models, future models. You're not locked into anything.

This repo includes a Docker Compose setup with popular faster-whisper models to get you started:

docker compose up -d whisper-tiny          # ~350 MB RAM, ~1-2s
docker compose up -d whisper-small         # ~800 MB RAM, ~3-4s  ← recommended
docker compose up -d whisper-medium        # ~1.8 GB RAM, ~8-12s
docker compose up -d whisper-large         # ~3.5 GB RAM, ~20-30s
docker compose up -d whisper-distil-large  # ~2 GB RAM, ~4-6s

Run multiple models at once and switch between them in the app:

docker compose up -d gateway whisper-small whisper-medium whisper-large

But you can point Diction at anything: whisper.cpp, OpenAI's API, a custom fine-tuned model for your language or domain, or any future model that speaks the same protocol. If it has a /v1/audio/transcriptions endpoint, Diction works with it.
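To try an endpoint from the command line before configuring the app, a request along these lines should work against an OpenAI-compatible server. This is a sketch, not the app's actual client code: the helper name transcribe is hypothetical, the file and model form fields follow OpenAI's transcription API, and the default model name whisper-1 is only a placeholder (your server may expect a different identifier or ignore it). Servers that require authentication, such as OpenAI's own API, also need an Authorization header, which is omitted here.

```shell
#!/bin/sh
# Sketch: send one audio file to an OpenAI-compatible transcription endpoint.
# transcribe is a hypothetical helper; "whisper-1" is a placeholder model name.
# Usage: transcribe BASE_URL AUDIO_FILE [MODEL]

transcribe() {
  base_url="${1%/}"          # strip one trailing slash, if present
  audio_file="$2"
  model="${3:-whisper-1}"    # assumption: replace with a name your server accepts

  if [ ! -f "$audio_file" ]; then
    echo "transcribe: no such file: $audio_file" >&2
    return 2
  fi

  # Multipart POST: -F uploads the file and sets the model form field.
  curl -fsS "${base_url}/v1/audio/transcriptions" \
    -F "file=@${audio_file}" \
    -F "model=${model}"
}
```

For example, `transcribe http://<your-server-ip>:9000 recording.wav` should print the server's response, which for OpenAI-compatible servers is typically JSON containing the transcribed text.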

No Public IP?

No problem. You don't need to open ports on your router:

  • Cloudflare Tunnel - free, outbound-only connection to Cloudflare's edge. No port forwarding needed.
  • Tailscale - free WireGuard mesh VPN. Install on server + phone, connect from anywhere.
  • ngrok - instant public URL, great for testing.

See the Self-Hosting Guide for detailed instructions.
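As a rough illustration of the three options, the sketch below wraps the usual one-line command for each tool. It assumes the gateway listens on port 9000 on the same machine and that cloudflared, ngrok, or tailscale is already installed and signed in where required. The function name expose_gateway is hypothetical, and the Self-Hosting Guide remains the reference for a permanent setup.

```shell
#!/bin/sh
# Sketch: expose a local gateway without opening router ports.
# expose_gateway is a hypothetical helper name; port 9000 is the gateway default used in this README.
# Usage: expose_gateway cloudflare|ngrok|tailscale [PORT]

expose_gateway() {
  method="$1"
  port="${2:-9000}"
  case "$method" in
    cloudflare)
      # Quick tunnel: prints a temporary public URL, suited to testing
      cloudflared tunnel --url "http://localhost:${port}"
      ;;
    ngrok)
      # Prints a public forwarding URL for the local port
      ngrok http "$port"
      ;;
    tailscale)
      # No tunnel needed: print the URL to use from a phone on the same tailnet
      echo "http://$(tailscale ip -4):${port}"
      ;;
    *)
      echo "usage: expose_gateway cloudflare|ngrok|tailscale [PORT]" >&2
      return 2
      ;;
  esac
}
```

Whichever option you pick, the URL it produces is what goes into the Diction app in place of the local IP.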

Privacy

This is a keyboard extension. We take privacy seriously:

  • Self-hosted: Audio goes only to your server. Full stop.
  • Cloud mode: Audio is processed and immediately discarded. Not stored, not used for training.
  • No analytics, no tracking, no telemetry. The app contains zero third-party SDKs.
  • Full Access is required by iOS for network access - the keyboard needs to reach the Whisper endpoint. No keylogging, no clipboard access.

Read the full Privacy Policy.

Requirements

  • iOS 16.0+ (iPhone)
  • For self-hosting: any machine that can run Docker (the gateway itself uses ~15 MB RAM)

Diction Cloud

Don't want to self-host? Diction Cloud is a hosted alternative - same accuracy, zero setup, no server to maintain. Priced to be cheaper than running your own VPS. See diction.one for details.

Contributing

We welcome contributions to the self-hosting infrastructure, documentation, and Docker setup. See CONTRIBUTING.md.

License

MIT - see LICENSE.

The iOS app is distributed via the App Store. This repository contains the self-hosting infrastructure and documentation.
