✨ feat(ml): built-in local OCR via the stella-ml sidecar by vaayne · Pull Request #600 · CherryHQ/stella

vaayne · 2026-06-26T11:16:31Z

What

Built-in local OCR on the document-extraction seam, served by the stella-ml
sidecar. Promotes the paddleocr-onnx POC (PP-OCRv5 mobile det+rec, ONNX) into
the sidecar and wires a real POST /v1/extract, then plugs it into
internal/document.Extractor as an image fallback.

Stacked on #598 (the embedding half). Base will retarget to main once #598 merges.

Why

stellad is built with CGO_ENABLED=0, so it can't host onnxruntime, but the sidecar
already exists for embeddings. OCR rides the same process: offline, no API key, no
network egress. The read tool can now turn an image the model can't read into text
for non-vision models without a remote OCR dependency.

How

Sidecar (cmd/stella-ml): long-lived ocrEngine (det+rec sessions reused),
real /v1/extract — octet-stream image body + X-Stella-Mime → JSON
{content, mime_type}. Image decode covers JPEG/PNG/GIF/WebP/BMP/TIFF. OCR is
optional and all-or-nothing across det/rec/keys, so embedding-only bundles still
boot (extract → 503).
Resolver (internal/mlruntime): resolves det.onnx/rec.onnx/rec_keys.txt
independently from the embed model; Resolved.HasOCR().
Seam (internal/document): single NewExtractor() wraps the build-tagged
base extractor in a composite that falls back to sidecar OCR for image inputs the
text layer can't read. Backend injected process-wide via SetLocalOCR (one
factory, many construction sites). Adapter + STELLA_LOCAL_OCR toggle live in
setupMLSidecar.
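
The fallback order in the Seam item can be sketched roughly as below. This is a minimal illustration, not the code in internal/document: the `Extractor` interface shape, the `extractFunc` adapter, the `composite` type, and `errUnreadable` are all hypothetical names chosen for the example, and treating "empty text or error" as "the text layer can't read it" is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Extractor is a stand-in for the document extractor interface (assumed shape).
type Extractor interface {
	Extract(data []byte, mime string) (string, error)
}

// extractFunc adapts a plain function to Extractor (hypothetical helper).
type extractFunc func(data []byte, mime string) (string, error)

func (f extractFunc) Extract(data []byte, mime string) (string, error) { return f(data, mime) }

// composite wraps a base extractor and falls back to OCR for unreadable images.
type composite struct {
	base Extractor
	ocr  Extractor // nil when local OCR is disabled or not installed
}

func (c composite) Extract(data []byte, mime string) (string, error) {
	text, err := c.base.Extract(data, mime)
	if err == nil && strings.TrimSpace(text) != "" {
		return text, nil // skip-when-text: the base extractor already produced text
	}
	if c.ocr == nil || !strings.HasPrefix(mime, "image/") {
		return text, err // no-OCR passthrough and non-image skip
	}
	return c.ocr.Extract(data, mime) // image fallback
}

// errUnreadable is a hypothetical sentinel for "no text layer".
var errUnreadable = errors.New("no text layer")

func main() {
	base := extractFunc(func(data []byte, mime string) (string, error) {
		if mime == "text/plain" {
			return string(data), nil
		}
		return "", errUnreadable
	})
	ocr := extractFunc(func([]byte, string) (string, error) { return "text from OCR", nil })

	ex := composite{base: base, ocr: ocr}
	for _, mime := range []string{"text/plain", "image/png", "application/zip"} {
		text, err := ex.Extract([]byte("hello"), mime)
		fmt.Printf("%-16s text=%q err=%v\n", mime, text, err)
	}
}
```

The four branches line up with the unit-test cases named in this description (skip-when-text, image-fallback, non-image-skip, no-OCR passthrough).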

Verified end-to-end (darwin/arm64): composition root → SetLocalOCR →
NewExtractor() → sidecar OCR extracts a zh/en/digit page (124 chars). Unit tests
cover the composite (skip-when-text, image-fallback, non-image-skip, no-OCR
passthrough) and optional/partial OCR resolution. format && build && test green.

Known MVP simplifications (→ Phase 4b)

Axis-aligned bounding-rect detection; rotated/skewed crop (minAreaRect +
perspective warp) deferred.
No angle classifier, so 180°-flipped lines are not corrected.
Image-only; PDF rasterization is a separate adapter.
Toggle is an env var; moves to deployment config + settings UI in a later phase.

Refs

✨ feat(ml): built-in local OCR + embedding via a native stella-ml sidecar #597 (native stella-ml sidecar: local OCR + embedding)
Stacked on ✨ feat(ml): built-in local OCR + embedding via a native stella-ml sidecar #598

Promote the paddleocr-onnx POC engine into the sidecar as a long-lived det+rec ocrEngine and wire the real POST /v1/extract: octet-stream image body + X-Stella-Mime -> JSON {content, mime_type}. OCR is optional and all-or-nothing across its three assets, so an embedding-only bundle still boots and the endpoint reports 503 until OCR models are installed. Axis-aligned bounding-rect detection (MVP); rotated/skewed crop via minAreaRect + perspective warp is deferred to Phase 4b.

Wrap the platform base extractor in a composite that falls back to local sidecar OCR for image inputs the text layer can't read. Inject the backend process-wide via document.SetLocalOCR (single extractor factory, many construction sites). mlruntime resolves det/rec/keys independently from the embed model; setupMLSidecar passes the OCR flags and installs the adapter, gated by STELLA_LOCAL_OCR. Verified end-to-end: composition root -> SetLocalOCR -> NewExtractor -> sidecar OCR extracts a zh/en/digit page.

vaayne added 3 commits June 26, 2026 19:10

📝 docs(stella-ml): document the OCR extract endpoint

727adfb

vaayne mentioned this pull request Jun 26, 2026

✨ feat(settings): local embedding provider + OCR toggle in the admin UI #601

Open

vaayne wants to merge 3 commits into poc/onnx-runtime from poc/onnx-runtime-ocr
