Linux only (X11, GTK3). No Windows or macOS support.
Desktop OCR lens — a transparent floating window that follows the mouse cursor, captures the region under it on demand, and sends it to a local or remote LLM for OCR and translation. Results appear in a separate transparent overlay HUD (lenzu_server).
The key difference from browser extensions like Yomitan/Rikaichan: this operates on images (GPU-rendered video, game windows, PDFs, anything on screen), not UTF-8 text.
Architecture note: The Windows/winit/GTK4 experiments are archived in `prototypes/`. The active implementation uses GTK3 (gtk-rs 0.18) on Linux/X11. GTK4 was evaluated and abandoned due to integration complexity; GTK3 provides everything needed and is simpler to build against. See Technical Design for current architecture.

```
lenzu (GTK3 client)                    lenzu_server (Electron)
  floating lens window        UDP        transparent overlay HUD
  X11 root capture          ──────►      renders translated text
  multi-tier OCR backend                 ArrowUp/Down moves position
  manages server lifecycle
```

- Capture: `x11rb` captures the X11 root window directly, which captures GPU-accelerated and hardware-rendered windows correctly.
- OCR/Translation: Confidence-gated local-first pipeline, then multi-tier LLM fallback:
  - Local OCR (`jp_detect` + `manga-ocr-rs`): if detection confidence >= 71% AND OCR confidence >= 71%, returns immediately. No LLM, no network. Both Shift+Click and Ctrl+Shift+Click paths try this first.
  - Local LLM primary (e.g. `gemma4:e2b` via ollama, 3 s): fully on-device, no API key needed.
  - Local LLM fallbacks (e.g. `glm-ocr`, `qwen2.5vl`, 3 s each): smaller OCR-specialist models.
  - Remote fallback (OpenRouter/Gemini 2.0 Flash, 15 s): cloud fallback when all local paths fail.

  All LLM backends use the same production code path (single source of truth in `client.rs`). Streaming (`"stream": true`) keeps each request's TCP connection alive, preventing ollama's server-side write timeout from firing during slow CPU/partial-GPU inference.
- Overlay: Formatted text is sent via UDP loopback to `lenzu_server`, an Electron transparent window pinned to the screen edge.
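
The overlay hand-off described above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the function name `send_to_overlay`, the plain UTF-8 payload, and the port parameter are all assumptions, since the source does not state the wire format or the port `lenzu_server` listens on.

```rust
use std::net::UdpSocket;

/// Sends one UTF-8 text payload to an overlay process listening on loopback.
///
/// Assumptions (not taken from the project): the payload is plain UTF-8 text
/// and `port` is whatever port the overlay process has bound.
fn send_to_overlay(text: &str, port: u16) -> std::io::Result<usize> {
    // Bind to an ephemeral loopback port; the OS picks a free one.
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    // A single datagram, fire-and-forget: UDP gives no delivery guarantee,
    // which is usually acceptable on loopback for a display-only HUD.
    socket.send_to(text.as_bytes(), ("127.0.0.1", port))
}
```

Because the datagram never leaves `127.0.0.1`, the client does not need to wait for the overlay to acknowledge anything; if the overlay is not running, the text is simply dropped.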
| Mode | Config | API key needed? | Images leave device? |
|---|---|---|---|
| Fully local | `OPENROUTER_API_KEY` unset | No | No |
| Local-first | default | No (local) / Yes (remote) | Only on fallback |
| Remote-only (Ctrl+Shift+Click) | any | Yes | Yes |

| Backend | VRAM | Typical latency | Confidence scoring |
|---|---|---|---|
| jp_detect + manga-ocr-rs (local, no LLM) | ~150 MB models | ~0.8–2 s per high-confidence crop (CPU) | Det 0-100%, OCR 0-100%; >= 71% both = pass |
| gemma4:e2b — full GPU (8 GB+) | ~7.4 GB | ~15–30 s | N/A (LLM fallback) |
| gemma4:e2b — partial GPU | ~2 GB GPU + CPU | 60–120 s | N/A (LLM fallback) |
| glm-ocr — full GPU (4 GB) | ~2.2 GB | ~5–15 s | N/A (LLM fallback) |
| Gemini 2.0 Flash (remote) | — | ~3–5 s | N/A (LLM fallback) |
The local OCR path (jp_detect + manga-ocr-rs) is tried first for all capture modes. When both confidence scores pass the 71% gate, no LLM or network call is needed. For 4 GB VRAM cards, this means most clean text regions are handled in under 2 seconds without touching Ollama.
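
The 71% gate can be illustrated with a small routing function. This is a sketch under stated assumptions: the `LocalOcr` struct, the `Route` enum, and the 0.0 to 1.0 score scale are hypothetical names and conventions chosen for the example, not the types used in `client.rs` or the companion crates.

```rust
/// Gate threshold from the text above: both scores must reach 71%.
const CONFIDENCE_GATE: f32 = 0.71;

/// Hypothetical result of the local detect + recognize pass.
/// Scores are assumed to be normalized to 0.0..=1.0.
#[derive(Debug, Clone, PartialEq)]
struct LocalOcr {
    text: String,
    det_confidence: f32,
    ocr_confidence: f32,
}

/// What the pipeline does next with a capture.
#[derive(Debug, PartialEq)]
enum Route {
    /// Both scores passed: return the text, no LLM or network call.
    AcceptLocal(String),
    /// Either score failed, or local OCR produced nothing.
    FallBackToLlm,
}

fn route(result: Option<LocalOcr>) -> Route {
    match result {
        Some(r)
            if r.det_confidence >= CONFIDENCE_GATE
                && r.ocr_confidence >= CONFIDENCE_GATE =>
        {
            Route::AcceptLocal(r.text)
        }
        _ => Route::FallBackToLlm,
    }
}
```

Requiring both scores to pass means a confidently detected box with a shaky transcription still goes to the LLM tiers, as does a confident transcription of a doubtful box.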
- Local-first by default: `ollama` runs on the same machine; no data leaves the device unless the local models fail and you have `OPENROUTER_API_KEY` set.
- Cloud OCR: Automatically falls back to OpenRouter (Gemini 2.0 Flash) when local inference times out. Disable by leaving `OPENROUTER_API_KEY` unset.
- Local-first OCR: `jp_detect` (DBNet) detects text regions with per-box confidence scores; `manga-ocr-rs` recognizes text with per-result confidence. When both scores pass the 71% gate, no LLM or network is needed.
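
One way to picture how the API key controls whether images can leave the device is an ordered tier list that only gains a remote entry when a key is present. The sketch below is illustrative only: the `Tier` struct and `llm_tiers` function are hypothetical, the 3 s and 15 s figures from the architecture list are interpreted here as per-tier timeouts, the remote tier's `name` is a display label rather than a real model identifier, and treating a blank key as unset is an assumption.

```rust
use std::time::Duration;

/// Hypothetical description of one LLM backend in the fallback chain.
#[derive(Debug, Clone, PartialEq)]
struct Tier {
    name: &'static str,
    timeout: Duration,
    remote: bool,
}

/// Builds the ordered LLM fallback list.
/// The remote tier is appended only when a non-blank key is supplied,
/// so with no key every tier stays on the local machine.
fn llm_tiers(openrouter_key: Option<&str>) -> Vec<Tier> {
    let local = |name| Tier { name, timeout: Duration::from_secs(3), remote: false };
    let mut tiers = vec![local("gemma4:e2b"), local("glm-ocr"), local("qwen2.5vl")];

    // Assumption: a key consisting only of whitespace counts as unset.
    let has_key = openrouter_key.map_or(false, |k| !k.trim().is_empty());
    if has_key {
        tiers.push(Tier {
            name: "OpenRouter / Gemini 2.0 Flash", // label only, not a model ID
            timeout: Duration::from_secs(15),
            remote: true,
        });
    }
    tiers
}
```

A caller would typically pass `std::env::var("OPENROUTER_API_KEY").ok().as_deref()`, so the "Fully local" mode falls out of the environment rather than a separate flag.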
These companion crates were developed as part of this project and are available on crates.io:
- `jp_detect`: real-time scene text detection using DBNet (ONNX). Locates text bounding boxes in manga panels and screenshots.
- `manga-ocr-rs`: Japanese manga OCR via ViT encoder + BERT decoder (ONNX). Converts image crops to Japanese text.
- `mecab-furigana-rs`: MeCab-based furigana and romaji annotation. Dictionary-accurate readings at ~5 ms per call, with word segmentation and morpheme data.

See OCR Accuracy Scores for unified benchmark results across all engines and prototypes.
- `gtk` 0.18: GTK3 bindings (gtk-rs). GTK3, not GTK4.
- `x11rb`: X11 protocol (screen capture)
- `cairo-rs`: 2D drawing
- `pango` / `pangocairo`: text layout and CJK rendering
- `reqwest`: HTTP client (OpenRouter API)
- `isolang`: ISO 639-3 language codes
- Electron (`lenzu_server`): transparent overlay window

```bash
# 1. Install system dependencies
./scripts/setup.sh

# 2. Set API key
export OPENROUTER_API_KEY=sk-your-key-here

# 3. Build and run (builds lenzu_server on first run)
./scripts/run.sh
```

See lenzu/README.md for full configuration reference and controls.

- GPU acceleration for jp_detect + manga-ocr-rs (CUDA EP) — would reduce per-crop latency from seconds to milliseconds
- Wayland support via xdg-desktop-portal
- Multi-monitor capture at non-zero offsets



