HidekiAI/lenzu

lenzu 「レンズ」 (LINUX ONLY)

Linux only (X11, GTK3). No Windows or macOS support.

Desktop OCR lens — a transparent floating window that follows the mouse cursor, captures the region under it on demand, and sends it to a local or remote LLM for OCR and translation. Results appear in a separate transparent overlay HUD (lenzu_server).

The key difference from browser extensions like Yomitan/Rikaichan: this operates on images (GPU-rendered video, game windows, PDFs, anything on screen), not on UTF-8 text.

beta demo

Japanese OCR result

English translation result

Architecture note: The Windows/winit/GTK4 experiments are archived in prototypes/. The active implementation uses GTK3 (gtk-rs 0.18) on Linux/X11. GTK4 was evaluated and abandoned due to integration complexity — GTK3 provides everything needed and is simpler to build against. See Technical Design for current architecture.

Architecture (Current)

lenzu (GTK3 client)               lenzu_server (Electron)
  floating lens window     UDP     transparent overlay HUD
  X11 root capture       ──────►  renders translated text
  multi-tier OCR backend           ArrowUp/Down moves position
  manages server lifecycle
  1. Capture: x11rb captures the X11 root window directly, so GPU-accelerated and hardware-rendered windows are captured correctly, without per-window capture APIs that often miss them.

  2. OCR/Translation: Confidence-gated local-first pipeline, then multi-tier LLM fallback:

    • Local OCR (jp_detect + manga-ocr-rs) — if detection confidence >= 71% AND OCR confidence >= 71%, returns immediately. No LLM, no network. Both Shift+Click and Ctrl+Shift+Click paths try this first.
    • Local LLM primary (e.g. gemma4:e2b via ollama, 3 s) — fully on-device, no API key needed
    • Local LLM fallbacks (e.g. glm-ocr, qwen2.5vl, 3 s each) — smaller OCR-specialist models
    • Remote fallback (OpenRouter/Gemini 2.0 Flash, 15 s) — cloud fallback when all local paths fail

    All LLM backends use the same production code path (single source of truth in client.rs).
    Streaming ("stream": true) keeps each request's TCP connection alive, preventing ollama's
    server-side write timeout from firing during slow CPU/partial-GPU inference.

  3. Overlay: Formatted text sent via UDP loopback to lenzu_server, an Electron transparent window pinned to screen edge.
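The UDP hand-off in step 3 can be sketched with the standard library. This is a minimal sketch only: the in-process "server" socket stands in for the Electron overlay, and the ephemeral port and plain-text payload are assumptions, not lenzu_server's actual wire format.

```rust
use std::net::UdpSocket;

/// Sketch of the UDP loopback hand-off. The "server" socket below
/// stands in for the lenzu_server overlay; the real port and payload
/// format are defined by lenzu_server and are not shown here.
fn send_to_overlay(payload: &str) -> std::io::Result<String> {
    // Stand-in overlay listener on an ephemeral loopback port.
    let server = UdpSocket::bind("127.0.0.1:0")?;
    let addr = server.local_addr()?;

    // Client side: one fire-and-forget datagram with the formatted text.
    let client = UdpSocket::bind("127.0.0.1:0")?;
    client.send_to(payload.as_bytes(), addr)?;

    let mut buf = [0u8; 1500];
    let (n, _from) = server.recv_from(&mut buf)?;
    Ok(String::from_utf8_lossy(&buf[..n]).into_owned())
}

fn main() {
    let echoed = send_to_overlay("世界 (sekai) = world").unwrap();
    assert_eq!(echoed, "世界 (sekai) = world");
    println!("ok");
}
```

A datagram socket fits here: the overlay is display-only, so a lost packet costs at most one stale HUD update and no connection state has to be managed across server restarts.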

Privacy modes

| Mode | Config | API key needed? | Images leave device? |
| --- | --- | --- | --- |
| Fully local | `OPENROUTER_API_KEY` unset | No | No |
| Local-first | default | No (local) / Yes (remote) | Only on fallback |
| Remote-only (Ctrl+Shift+Click) | any | Yes | Yes |
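Read as code, the privacy modes above reduce to a small decision function. This is an illustrative sketch, not lenzu's actual types: `remote_click` models Ctrl+Shift+Click, and `api_key_set` models whether `OPENROUTER_API_KEY` is present.

```rust
/// Illustrative privacy-mode selection following the table above.
/// Names are hypothetical, not taken from the lenzu source.
#[derive(Debug, PartialEq)]
enum Mode {
    FullyLocal, // no network, ever
    LocalFirst, // images leave the device only on fallback
    RemoteOnly, // images always leave the device (key required)
}

fn privacy_mode(api_key_set: bool, remote_click: bool) -> Mode {
    if remote_click {
        Mode::RemoteOnly
    } else if api_key_set {
        Mode::LocalFirst
    } else {
        Mode::FullyLocal
    }
}

fn main() {
    assert_eq!(privacy_mode(false, false), Mode::FullyLocal);
    assert_eq!(privacy_mode(true, false), Mode::LocalFirst);
    assert_eq!(privacy_mode(true, true), Mode::RemoteOnly);
    println!("ok");
}
```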

Inference speed on typical hardware

| Backend | VRAM | Typical latency | Confidence scoring |
| --- | --- | --- | --- |
| jp_detect + manga-ocr-rs (local, no LLM) | ~150 MB models | ~0.8–2 s per high-confidence crop (CPU) | Det 0–100%, OCR 0–100%; >= 71% both = pass |
| gemma4:e2b — full GPU (8 GB+) | ~7.4 GB | ~15–30 s | N/A (LLM fallback) |
| gemma4:e2b — partial GPU | ~2 GB GPU + CPU | 60–120 s | N/A (LLM fallback) |
| glm-ocr — full GPU (4 GB) | ~2.2 GB | ~5–15 s | N/A (LLM fallback) |
| Gemini 2.0 Flash (remote) | n/a | ~3–5 s | N/A (LLM fallback) |

The local OCR path (jp_detect + manga-ocr-rs) is tried first for all capture modes. When both confidence scores pass the 71% gate, no LLM or network call is needed. For 4 GB VRAM cards, this means most clean text regions are handled in under 2 seconds without touching Ollama.
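The 71% gate and fallback order described above can be sketched as follows. The threshold and tier order come from this README; the type names, function names, and the exact model list are illustrative assumptions, not the actual dispatch code in client.rs.

```rust
/// Hypothetical sketch of the confidence-gated, local-first dispatch.
/// The 71% threshold and tier order follow the README; everything
/// else (names, model strings) is illustrative only.
#[derive(Debug, PartialEq)]
enum Backend {
    LocalOcr,               // jp_detect + manga-ocr-rs: no LLM, no network
    LocalLlm(&'static str), // ollama model, tried in order with timeouts
    Remote,                 // OpenRouter / Gemini 2.0 Flash, last resort
}

const GATE: f32 = 0.71;

/// Ordered list of backends to try for one captured crop.
fn pipeline(det_conf: f32, ocr_conf: f32) -> Vec<Backend> {
    if det_conf >= GATE && ocr_conf >= GATE {
        // Both scores pass the gate: the local OCR result is final.
        return vec![Backend::LocalOcr];
    }
    // Otherwise fall through the LLM tiers, local first, remote last.
    vec![
        Backend::LocalLlm("gemma4:e2b"),
        Backend::LocalLlm("glm-ocr"),
        Backend::LocalLlm("qwen2.5vl"),
        Backend::Remote,
    ]
}

fn main() {
    // Clean crop: handled locally, no LLM call at all.
    assert_eq!(pipeline(0.95, 0.88), vec![Backend::LocalOcr]);
    // Low OCR confidence: fall through the tiers, ending at remote.
    assert_eq!(pipeline(0.95, 0.40).last(), Some(&Backend::Remote));
    println!("ok");
}
```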

Hardware and Privacy

  • Local-first by default: ollama runs on the same machine; no data leaves the device unless the local models fail and you have OPENROUTER_API_KEY set.
  • Cloud OCR: Automatically falls back to OpenRouter (Gemini 2.0 Flash) when local inference times out. Disable by leaving OPENROUTER_API_KEY unset.
  • Local-first OCR: jp_detect (DBNet) detects text regions with per-box confidence scores; manga-ocr-rs recognizes text with per-result confidence. When both scores pass the 71% gate, no LLM or network is needed.

Related Crates

These companion crates were developed as part of this project and are available on crates.io:

  • jp_detect — real-time scene text detection using DBNet (ONNX). Locates text bounding boxes in manga panels and screenshots.
  • manga-ocr-rs — Japanese manga OCR via ViT encoder + BERT decoder (ONNX). Converts image crops to Japanese text.
  • mecab-furigana-rs — MeCab-based furigana and romaji annotation. Dictionary-accurate readings at ~5 ms per call, with word segmentation and morpheme data.

See OCR Accuracy Scores for unified benchmark results across all engines and prototypes.

Libraries & Dependencies

  • gtk 0.18 — GTK3 bindings (gtk-rs). GTK3, not GTK4.
  • x11rb — X11 protocol (screen capture)
  • cairo-rs — 2D drawing
  • pango / pangocairo — text layout and CJK rendering
  • reqwest — HTTP client (OpenRouter API)
  • isolang — ISO 639-3 language codes
  • Electron (lenzu_server) — transparent overlay window

Build & Run

# 1. Install system dependencies
./scripts/setup.sh

# 2. Set API key
export OPENROUTER_API_KEY=sk-your-key-here

# 3. Build and run (builds lenzu_server on first run)
./scripts/run.sh

Lenzu help screen (Shift+H)

See lenzu/README.md for full configuration reference and controls.

TODO

  • GPU acceleration for jp_detect + manga-ocr-rs (CUDA EP) — would reduce per-crop latency from seconds to milliseconds
  • Wayland support via xdg-desktop-portal
  • Multi-monitor capture at non-zero offsets
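The multi-monitor TODO largely amounts to clamping the cursor-centered lens rectangle against a monitor whose origin is non-zero in root-window coordinates. A minimal sketch, with hypothetical names rather than lenzu's actual geometry code:

```rust
/// Hypothetical sketch for the multi-monitor TODO: center the lens
/// rectangle on the cursor, then clamp it to a monitor whose origin
/// may be non-zero in the X11 root coordinate space.
fn lens_rect(
    cursor: (i32, i32),
    lens: (i32, i32),              // lens width/height in pixels
    monitor: (i32, i32, i32, i32), // x, y, width, height in root coords
) -> (i32, i32, i32, i32) {
    let (cx, cy) = cursor;
    let (lw, lh) = lens;
    let (mx, my, mw, mh) = monitor;
    // Center on the cursor, then clamp to this monitor's edges.
    let x = (cx - lw / 2).clamp(mx, mx + (mw - lw).max(0));
    let y = (cy - lh / 2).clamp(my, my + (mh - lh).max(0));
    (x, y, lw.min(mw), lh.min(mh))
}

fn main() {
    // Secondary monitor at root offset (1920, 0): the rectangle must
    // clamp to that monitor's edge, not back to the primary's origin.
    assert_eq!(
        lens_rect((1930, 10), (400, 300), (1920, 0, 1920, 1080)),
        (1920, 0, 400, 300)
    );
    println!("ok");
}
```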

About

A desktop lens (similar to a desktop magnifier) that analyzes, in real time (via OpenCV), the small region under the mouse cursor. It started as an OCR tool; for example, it can add furigana to all kanji in the captured region.
