Streaming HTTP + WebSocket server wrapping NVIDIA Parakeet Realtime EOU (120M) via parakeet-rs. Cache-aware streaming ASR with end-of-utterance detection, designed to drop into any app that needs low-latency transcription.
## Features

- ~160 ms chunked streaming over WebSocket, partial text emitted per chunk
- End-of-utterance (EOU) detection: `final` events and automatic decoder cache resets
- Plain HTTP endpoint for batch WAV transcription
- Single ~5 MB Rust binary, dynamically loads onnxruntime
- CUDA 12.x acceleration on Windows x64
## Quick start (prebuilt binary)

- Download `parakeet-realtime-server-v0.1.0-win-x64.zip` from the latest release.
- Unzip into a folder of your choice (call it `<install>`).
- Clone this repo anywhere and copy `scripts/` into `<install>/scripts/` (or just download the four PowerShell scripts individually).
- In `<install>`:

```powershell
.\scripts\fetch-cuda-deps.ps1   # one-time, ~2 GB of NVIDIA DLLs
.\scripts\download-models.ps1   # one-time, ~480 MB of ONNX weights
.\parakeet-realtime-server.exe --model-dir models --port 9005
```

Health check: `curl http://127.0.0.1:9005/health` should return `{"ready":true}` after warm-up.
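Scripted setups can wait for warm-up instead of polling by hand. A minimal stdlib-only Python sketch (the helper names, timeout, and poll interval are mine; only the URL and the `{"ready": ...}` response shape come from this README):

```python
import json
import time
import urllib.request

HEALTH_URL = "http://127.0.0.1:9005/health"  # assumed default host/port

def is_ready(body: bytes) -> bool:
    """True once /health reports {"ready": true}."""
    return json.loads(body).get("ready") is True

def wait_for_server(url: str = HEALTH_URL, timeout_s: float = 120.0) -> bool:
    """Poll /health once a second until the model is ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if is_ready(resp.read()):
                    return True
        except OSError:
            pass  # connection refused: server not up yet
        time.sleep(1.0)
    return False
```

Calling `wait_for_server()` before opening the WebSocket avoids racing the model warm-up.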
## Building from source

Prerequisites:

- Rust stable, installed via rustup (`rustup default stable-x86_64-pc-windows-msvc`)
- Visual Studio 2022 Build Tools with the "Desktop development with C++" workload
- CUDA 12.x toolkit (set `CUDA_PATH`)
- cuDNN 9.x DLLs (the `fetch-cuda-deps.ps1` script grabs these)
- 7-Zip on PATH (needed by `fetch-cuda-deps.ps1` to extract one archive)
```powershell
git clone https://github.com/pauldaywork/parakeet-realtime-server
cd parakeet-realtime-server
.\scripts\build.ps1
```

Output lands in `dist/`. Then:

```powershell
.\scripts\fetch-cuda-deps.ps1
.\scripts\download-models.ps1
.\dist\parakeet-realtime-server.exe --model-dir dist\models --port 9005
```

## HTTP API

### GET /health

Readiness check. Returns `{"ready": false}` while the model warms up and `{"ready": true}` once a test transcription completes.
### POST /v1/audio/transcriptions

Multipart WAV upload. Field name: `file`. Returns `{"text": "..."}`.
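The endpoint can also be called from Python with only the standard library. A sketch (the helper names are mine; only the field name `file`, the route, and the `{"text": ...}` response shape come from the endpoint description above):

```python
import json
import urllib.request
import uuid

def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "audio/wav"):
    """Build a multipart/form-data body with a single file field."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode() + data + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def transcribe(path: str,
               url: str = "http://127.0.0.1:9005/v1/audio/transcriptions") -> str:
    """Upload a WAV file and return the transcribed text."""
    with open(path, "rb") as f:
        body, ctype = build_multipart("file", path, f.read())
    req = urllib.request.Request(url, data=body, headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```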
```shell
curl -F file=@audio.wav http://127.0.0.1:9005/v1/audio/transcriptions
```

## WebSocket streaming

Connect to `ws://127.0.0.1:9005/stream` and send binary frames containing raw int16 LE PCM at 16 kHz. The server accumulates the audio into 160 ms chunks and emits JSON text frames:

```json
{"type": "partial", "text": "hello wor"}
{"type": "partial", "text": "hello world"}
{"type": "final", "text": "hello world"}
```

Send `{"type":"end"}` to flush any remaining audio. The server resets its decoder cache after each `final` event, so multiple utterances can be streamed over one connection.
## Flags

| Flag | Default | Description |
|---|---|---|
| `--model-dir` | (required) | Folder containing `encoder.onnx`, `decoder_joint.onnx`, `tokenizer.json` |
| `--host` | `127.0.0.1` | Bind address |
| `--port` | `9005` | TCP port |
| `--chunk-ms` | `160` | Chunk size in ms (affects latency) |
| `--device` | `cuda` | `cuda` or `cpu` |
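Putting the flags together, a CPU-only launch with a larger chunk size might look like this (the values are illustrative, not recommendations):

```powershell
.\parakeet-realtime-server.exe --model-dir models --device cpu --chunk-ms 320 --port 9005
```

A larger `--chunk-ms` means fewer inference calls but later partials; the default of 160 ms favors latency.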
## Examples

- `examples/mic-streaming.html`: zero-dependency single-file demo; open it in any browser after starting the server.
- `examples/web-app/`: a full Vite/React app with mic picker, live transcript, raw event log, and health badge.
## Troubleshooting

| Symptom | Fix |
|---|---|
| Exit code `0xc0000135` at startup | Missing CUDA/cuDNN DLLs. Run `scripts\fetch-cuda-deps.ps1`. |
| `Could not locate cudnn_graph64_9.dll` | cuDNN 9.x sub-DLL missing. Same fix. |
| `/health` returns `{"ready": false}` forever | Model files not found. Check that `--model-dir` points at a folder containing `encoder.onnx`, `decoder_joint.onnx`, and `tokenizer.json`. |
| Partial events stop mid-stream | Client stopped sending PCM but didn't send `{"type":"end"}`. Send it before closing. |
## License

MIT. See LICENSE.

## Credits
- altunenes/parakeet-rs — the underlying Rust parakeet inference crate this server wraps.
- NVIDIA Parakeet Realtime EOU 120M — the model.
- whisper.cpp — repo-structure inspiration.