✨ feat(ml): built-in local OCR via the stella-ml sidecar #600
Draft
vaayne wants to merge 3 commits into
Promote the paddleocr-onnx POC engine into the sidecar as a long-lived
det+rec ocrEngine and wire the real POST /v1/extract: octet-stream image
body + X-Stella-Mime -> JSON {content, mime_type}. OCR is optional and
all-or-nothing across its three assets, so an embedding-only bundle still
boots and the endpoint reports 503 until OCR models are installed.
Axis-aligned bounding-rect detection (MVP); rotated/skewed crop via
minAreaRect + perspective warp is deferred to Phase 4b.
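
As a rough illustration of the contract in this commit message, the sketch below serves `POST /v1/extract` in-process and shows the 503 path. It is not the sidecar's code: `ocrFunc`, `extractHandler`, `post`, the `text/plain` value for `mime_type`, and the non-503 error statuses are all assumptions made for the example. A nil engine stands in for "OCR models not installed".

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// extractResponse mirrors the JSON shape named above: {content, mime_type}.
type extractResponse struct {
	Content  string `json:"content"`
	MimeType string `json:"mime_type"`
}

// ocrFunc is a hypothetical stand-in for the long-lived det+rec engine.
type ocrFunc func(image []byte, mime string) (string, error)

// extractHandler builds a handler for POST /v1/extract.
// A nil ocr models an embedding-only bundle and answers 503.
func extractHandler(ocr ocrFunc) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		if ocr == nil {
			http.Error(w, "ocr models not installed", http.StatusServiceUnavailable)
			return
		}
		body, err := io.ReadAll(r.Body)
		if err != nil {
			http.Error(w, "cannot read body", http.StatusBadRequest)
			return
		}
		text, err := ocr(body, r.Header.Get("X-Stella-Mime"))
		if err != nil {
			// Status choice is an assumption; the PR does not specify it.
			http.Error(w, err.Error(), http.StatusUnprocessableEntity)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		// "text/plain" is an assumed mime_type value for OCR output.
		json.NewEncoder(w).Encode(extractResponse{Content: text, MimeType: "text/plain"})
	}
}

// post sends one octet-stream body to h in-process and decodes the reply.
func post(h http.Handler, img, mime string) (int, extractResponse) {
	req := httptest.NewRequest(http.MethodPost, "/v1/extract", strings.NewReader(img))
	req.Header.Set("Content-Type", "application/octet-stream")
	req.Header.Set("X-Stella-Mime", mime)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	var out extractResponse
	_ = json.NewDecoder(rec.Body).Decode(&out) // non-JSON error bodies leave out zeroed
	return rec.Code, out
}

func main() {
	fake := func(_ []byte, _ string) (string, error) { return "hello 123", nil }

	code, out := post(extractHandler(fake), "fake-image-bytes", "image/png")
	fmt.Println(code, out.Content) // 200 hello 123

	code, _ = post(extractHandler(nil), "fake-image-bytes", "image/png")
	fmt.Println(code) // 503
}
```

Modelling "not installed" as a nil engine keeps the availability check in one place, so the embedding routes never need to know whether OCR assets exist.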
Wrap the platform base extractor in a composite that falls back to local sidecar OCR for image inputs the text layer can't read. Inject the backend process-wide via document.SetLocalOCR (single extractor factory, many construction sites). mlruntime resolves det/rec/keys independently from the embed model; setupMLSidecar passes the OCR flags and installs the adapter, gated by STELLA_LOCAL_OCR. Verified end-to-end: composition root -> SetLocalOCR -> NewExtractor -> sidecar OCR extracts a zh/en/digit page.
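
The fallback rule in this commit can be sketched roughly as follows. The `Extractor` and `LocalOCR` interfaces, the `composite` type, and the fakes are simplified, hypothetical shapes chosen for the example; the real `internal/document` signatures may differ, and the `image/` prefix check is an assumed way of detecting image inputs.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Extractor is a simplified, hypothetical form of the extraction interface.
type Extractor interface {
	Extract(data []byte, mime string) (string, error)
}

// LocalOCR is a hypothetical OCR backend, such as an adapter over the sidecar.
type LocalOCR interface {
	OCR(data []byte, mime string) (string, error)
}

// composite tries the base extractor first and uses OCR only as a fallback.
type composite struct {
	base Extractor
	ocr  LocalOCR // nil means no OCR backend was injected
}

func (c composite) Extract(data []byte, mime string) (string, error) {
	text, err := c.base.Extract(data, mime)
	if err == nil && strings.TrimSpace(text) != "" {
		return text, nil // skip-when-text: the text layer was readable
	}
	if c.ocr == nil || !strings.HasPrefix(mime, "image/") {
		return text, err // no-OCR passthrough and non-image skip
	}
	ocrText, ocrErr := c.ocr.OCR(data, mime)
	if ocrErr != nil {
		// Keep the base result and report both failures.
		return text, errors.Join(err, ocrErr)
	}
	return ocrText, nil // image fallback
}

// Fakes used only to exercise the four cases.
type fakeBase struct{ text string }

func (f fakeBase) Extract([]byte, string) (string, error) { return f.text, nil }

type fakeOCR struct{ text string }

func (f fakeOCR) OCR([]byte, string) (string, error) { return f.text, nil }

func main() {
	withOCR := composite{base: fakeBase{""}, ocr: fakeOCR{"scanned text"}}
	got, _ := withOCR.Extract(nil, "image/png")
	fmt.Printf("%q\n", got) // "scanned text"

	got, _ = withOCR.Extract(nil, "application/pdf")
	fmt.Printf("%q\n", got) // ""

	native := composite{base: fakeBase{"native text"}, ocr: fakeOCR{"scanned text"}}
	got, _ = native.Extract(nil, "image/png")
	fmt.Printf("%q\n", got) // "native text"
}
```

Because the composite satisfies the same interface as the base, call sites keep constructing one extractor and never branch on whether OCR is available.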
What

Built-in local OCR on the document-extraction seam, served by the `stella-ml` sidecar. Promotes the `paddleocr-onnx` POC (PP-OCRv5 mobile det+rec, ONNX) into the sidecar and wires a real `POST /v1/extract`, then plugs it into `internal/document.Extractor` as an image fallback.

Stacked on #598 (the embedding half). Base will retarget to `main` once #598 merges.

Why

`stellad` is `CGO_ENABLED=0` and can't host onnxruntime, but the sidecar already exists for embeddings. OCR rides the same process: offline, no API key, no network egress. The read tool can now turn an unreadable image into text for non-vision models without a remote OCR dependency.

How

- `cmd/stella-ml`: long-lived `ocrEngine` (det+rec sessions reused), real `/v1/extract`: octet-stream image body + `X-Stella-Mime` → JSON `{content, mime_type}`. Image decode covers JPEG/PNG/GIF/WebP/BMP/TIFF. OCR is optional and all-or-nothing across det/rec/keys, so embedding-only bundles still boot (extract → 503).
- `internal/mlruntime`: resolves `det.onnx` / `rec.onnx` / `rec_keys.txt` independently from the embed model; `Resolved.HasOCR()`.
- `internal/document`: single `NewExtractor()` wraps the build-tagged base extractor in a composite that falls back to sidecar OCR for image inputs the text layer can't read. Backend injected process-wide via `SetLocalOCR` (one factory, many construction sites). Adapter + `STELLA_LOCAL_OCR` toggle live in `setupMLSidecar`.

Verified end-to-end (darwin/arm64): composition root → `SetLocalOCR` → `NewExtractor()` → sidecar OCR extracts a zh/en/digit page (124 chars). Unit tests cover the composite (skip-when-text, image-fallback, non-image-skip, no-OCR passthrough) and optional/partial OCR resolution.
`format && build && test` green.

Known MVP simplifications (→ Phase 4b)

- Axis-aligned bounding-rect detection only; rotated/skewed crop (`minAreaRect` + perspective warp) deferred.
Refs