Live Japanese captions and translation for browser audio, video playback, livestream watching, and OBS.
The target setup is a Windows PC with a Vulkan-capable GPU and about 6 GB of VRAM. With the default Japanese ASR + Hy-MT2 translation profile, the intended live delay is roughly within 5 seconds on suitable hardware.
crisp-caption captures tab or microphone audio in the browser, streams it to CrispASR, translates finalized utterances with a local llama.cpp server, and displays subtitles in the browser, a transparent desktop overlay, or an OBS Browser Source.
browser tab/mic audio
-> WebRTC
-> Python bridge
-> CrispASR Vulkan streaming ASR
-> llama.cpp Vulkan translation server
-> browser transcript / transparent overlay / OBS overlay
This repository does not vendor runtime binaries or model files. The setup scripts download Vulkan builds into tools/ and model files into models/.
Feature demos are stored in demo/:
The demo page includes GitHub-hosted video previews and local MP4 fallbacks.
Run these commands from the project folder:
scripts\setup-windows.bat
scripts\download-crispasr-windows.bat
scripts\download-llama-cpp-windows.bat
scripts\models-download.bat
scripts\check-deps.bat
scripts\run-windows.batThen open:
http://127.0.0.1:8765/
In the browser UI, choose:
Tab audiofor video playback or livestream watching in a browser tab.Microphonefor microphone capture.Overlayfor a transparent always-on-top subtitle window.
On Chromium-based browsers, enable tab audio in the browser capture picker.
scripts\setup-windows.bat
- Checks Python and pip.
- Creates
.venv. - Installs Python dependencies.
- Installs transparent overlay dependencies.
- Creates
profiles\profile.ja.jsonfromprofiles\profile.ja.example.jsonif missing. - Installs frontend dependencies.
- Builds
frontend\dist.
scripts\download-crispasr-windows.bat
- Downloads a fixed CrispASR Windows Vulkan runtime.
- Extracts it to
tools\crispasr\. - Deletes the downloaded archive.
- Checks that
tools\crispasr\crispasr.exestarts.
scripts\download-llama-cpp-windows.bat
- Downloads a fixed llama.cpp Windows Vulkan runtime.
- Extracts it to
tools\llama.cpp\. - Deletes the downloaded archive.
- Checks that
tools\llama.cpp\llama-server.exeexists.
scripts\models-download.bat
- Downloads the ASR model, VAD model, and Hy-MT2 translation model listed in
models\manifest.json. - Stores model files under
models\.
scripts\check-deps.bat
- Checks Python packages, frontend build output, profile, CrispASR, llama.cpp, model files, ports, and translation server reachability.
scripts\run-windows.bat
- Starts the llama.cpp translation server in a separate window.
- Waits for
http://127.0.0.1:8080/health. - Starts the CrispASR bridge.
- Opens
http://127.0.0.1:8765/.
The default path uses Vulkan for both CrispASR and llama.cpp.
Recommended baseline:
- Windows 10 or 11
- Vulkan-capable GPU
- About 6 GB VRAM
- Python 3.11+
- Node.js LTS. The setup script tries Corepack/pnpm first and falls back to npm.
- Chromium-based browser for tab audio capture
If the translation server exits immediately or runs out of memory, try:
scripts\start-translation-server-low-vram-windows.batThe low-VRAM server uses smaller llama.cpp context/batch settings. It may be slower or have less translation context.
The default profile expects:
models\asr\cohere-asr-ja-v0.1-q4_k.gguf
models\vad\firered-vad.gguf
models\translation\Hy-MT2-1.8B-Q4_K_M.gguf
models\manifest.json uses pinned Hugging Face resolve URLs with SHA256 verification. Model payloads are ignored by Git.
Hy-MT2 uses the Tencent HY Community License Agreement, not a permissive open-source license. Read docs\third-party.md and the upstream license before redistribution or commercial use.
Public example profiles live in profiles\.
profiles\profile.ja.example.json
setup-windows.bat copies it to:
profiles\profile.ja.json
Local profile JSON files are ignored by Git. Edit profiles\profile.ja.json for your machine.
Important fields:
"crispasr": "tools/crispasr/crispasr.exe",
"translate_model": "Hy-MT2-1.8B",
"translate_url": "http://127.0.0.1:8080/v1/chat/completions"Model paths in crisp_args, such as ../models/asr/model.gguf, are resolved relative to the profile JSON file.
Click Overlay in the browser UI to start the native transparent subtitle overlay.
Controls:
- Hold
Ctrlto show the control frame. - Hold
Ctrland drag the middle area to move the overlay. - Hold
Ctrland drag the handles to resize it. - Hold
Ctrland clickxto close it. Ctrl+Qalso closes the overlay.
For OBS, use a Browser Source:
http://127.0.0.1:8765/obs-overlay
Set the Browser Source size to your canvas size, for example 1920 x 1080. The page has a transparent background and connects to the same subtitle stream.
The default translation server command is in:
scripts\start-translation-server-windows.batIt uses llama.cpp Vulkan with:
-c 8192 -b 2048 -ub 1024
The profile model name must match the llama.cpp alias:
"translate_model": "Hy-MT2-1.8B"Translation is final-only. Partial ASR text is shown as live preview but is not sent to the translation model.
Run:
scripts\check-deps.batCommon fixes:
- Missing Python packages: run
scripts\setup-windows.bat. - Missing CrispASR: run
scripts\download-crispasr-windows.bat. - Missing llama.cpp: run
scripts\download-llama-cpp-windows.bat. - Missing models: run
scripts\models-download.bat. - Translation server out of memory: use
scripts\start-translation-server-low-vram-windows.bat. - Browser page not found: rerun
scripts\setup-windows.batto rebuildfrontend\dist.
Run the frontend dev server:
cd frontend
corepack pnpm install
corepack pnpm devKeep the Python bridge running on 127.0.0.1:8765; Vite proxies backend calls.
Build the production UI:
cd frontend
corepack pnpm buildUse the virtual environment Python after setup:
.venv\Scripts\python.exe bridge_server.py --config profiles\profile.ja.json --print-raw-crisp-events
.venv\Scripts\python.exe bridge_server.py --config profiles\profile.ja.json --no-translate
.venv\Scripts\python.exe bridge_server.py --config profiles\profile.ja.json --no-translate --debug-timestamps
.venv\Scripts\python.exe bridge_server.py --config profiles\profile.ja.json -vdocs\PARAMETERS.md: profile and CrispASR flag reference.docs\changelog.md: public release notes.docs\third-party.md: third-party runtime and model license notes.profiles\profile.ja.example.json: public Japanese live-subtitle example profile.
crisp-caption source code is licensed under the Apache License 2.0. Runtime binaries and model files downloaded by the helper scripts are third-party artifacts under their own licenses. See docs\third-party.md.
