feat(embedding): EmbeddingGemma on-device RAG embedder (ONNX, 256-dim) with USE-Lite fallback by sagar-develop · Pull Request #24 · sagar-develop/litertlm-kmp

sagar-develop · 2026-06-05T07:25:25Z

Replaces the 2018-era Universal Sentence Encoder (USE-Lite, 100-dim) with EmbeddingGemma 300M as the default RAG embedder, lifting retrieval quality for both chat answers and every Studio artifact. USE-Lite stays as the friction-free, low-end fallback. Implements the design in docs/EMBEDDING_GEMMA_PLAN.md (included in this PR).

Why the earlier attempt failed (and how this fixes it)

EmbeddingGemma is a 300M transformer, not a TFLite Task model. Three landmines, all addressed here:

- Wrong loader: MediaPipe TextEmbedder only accepts TFLite Task models. → New, separate OnnxEmbeddingEngine (ONNX Runtime); the MediaPipe path is untouched.
- Dimension lock: @HnswIndex(dimensions = 100L) is an annotation literal. → New GemmaChunkEntity at 256-dim parallel to the 100-dim store; no in-place dim change.
- Missing task prompts: EmbeddingGemma needs query: / text: instruction prefixes; the old embed(text) was symmetric. → Interface is now task-aware (EmbeddingTask.QUERY/DOCUMENT).
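To make the task-prompt point concrete, here is a minimal sketch of asymmetric prompt building. The exact prefix strings are an assumption based on the prompt format described on the EmbeddingGemma model card and should be checked against the export in use; `buildPrompt` is a hypothetical helper, not a function from this PR.

```kotlin
// Mirrors the EmbeddingTask.QUERY/DOCUMENT split named in the PR.
enum class EmbeddingTask { QUERY, DOCUMENT }

// Hypothetical helper: wraps raw text in the instruction prefix for its task.
// Prefix strings are assumed from the model card; verify before relying on them.
fun buildPrompt(text: String, task: EmbeddingTask, title: String? = null): String =
    when (task) {
        // Queries carry a retrieval task description.
        EmbeddingTask.QUERY -> "task: search result | query: $text"
        // Documents carry an optional title; "none" stands in when it is absent or blank.
        EmbeddingTask.DOCUMENT -> {
            val safeTitle = title?.takeIf { it.isNotBlank() } ?: "none"
            "title: $safeTitle | text: $text"
        }
    }
```

A symmetric `embed(text)` would send the same string for both sides, which is why the interface has to carry the task.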

What's implemented

Engine (`lib/`)

- EmbeddingEngine: task-aware, exposing dimensions + embed(text, task, title). MediaPipeEmbeddingEngine adapted (dim 100, symmetric).
- OnnxEmbeddingEngine: EmbeddingGemma via ONNX Runtime: instruction prompts → tokenize → mean-pool over the attention mask (or a pre-pooled sentence_embedding) → Matryoshka-truncate to 256 → L2-normalize.
- HfGemmaTokenizer: HuggingFace tokenizer reading the model's tokenizer.json.
- ModelFormat.ONNX_EMBEDDER + ModelDescriptor.companions so the tokenizer downloads alongside the model.
- Deps: onnxruntime-android, ai.djl.huggingface:tokenizers (arm64); consumer ProGuard keeps.
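The post-processing steps in the OnnxEmbeddingEngine pipeline (masked mean-pool, Matryoshka truncation, L2 normalization) can be sketched in plain Kotlin. This is an illustration under assumptions, not the PR's `pool()` implementation: the function names are hypothetical, and the ONNX Runtime call that produces the per-token hidden states is omitted.

```kotlin
import kotlin.math.sqrt

// Average the token vectors whose attention-mask entry is non-zero (padding is skipped).
fun meanPool(hidden: Array<FloatArray>, mask: LongArray): FloatArray {
    require(hidden.size == mask.size) { "one mask entry per token expected" }
    val dim = hidden.firstOrNull()?.size ?: 0
    val sum = FloatArray(dim)
    var count = 0
    for (t in hidden.indices) {
        if (mask[t] == 0L) continue
        for (d in 0 until dim) sum[d] += hidden[t][d]
        count++
    }
    if (count > 0) for (d in 0 until dim) sum[d] /= count
    return sum
}

// Matryoshka truncation keeps the leading targetDim components, then
// re-normalizes so the shortened vector has unit L2 length again.
fun truncateAndNormalize(v: FloatArray, targetDim: Int = 256): FloatArray {
    val out = v.copyOf(minOf(targetDim, v.size))
    var squares = 0.0
    for (x in out) squares += x.toDouble() * x
    val norm = sqrt(squares).toFloat()
    if (norm > 0f) for (i in out.indices) out[i] /= norm
    return out
}
```

Normalizing after truncation matters: the leading 256 components of a unit vector are generally shorter than unit length, so normalizing only the full vector would skew distances in the 256-dim index.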

App (`sample-app/`)

- GemmaChunkEntity (256-dim HNSW) beside DocumentChunkEntity (100-dim). The repository routes vector ops by dimension and returns neutral Chunk/ScoredChunk DTOs (decoupling retriever/Studio from the active store).
- RagHolder picks the active embedder (Gemma when its files are present and RAM ≥ 4 GB, else USE-Lite), downloads the model + companion tokenizer, and migrateToGemma() re-indexes legacy chunks from their stored text (document-scoped, idempotent, runs in the background).
- Ingest/retrieve are task-aware; per-embedder distance gate (USE 0.75 vs Gemma 0.55, provisional).
- Backups round-trip both stores (schema v2, each ChunkDto carries its dim); non-active-embedder chunks re-index on first use.
- Catalog descriptor for EmbeddingGemma (ungated ONNX mirror; surface Gemma terms in the onboarding gate).
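The selection rule and the per-embedder distance gate described above could look roughly like this. All names and field shapes here are hypothetical (including the fields on `ScoredChunk`); only the 4 GB threshold, the dimensions, and the 0.75 / 0.55 cut-offs come from the PR text.

```kotlin
// Hypothetical pairing of each embedder with its dimension and provisional distance cut-off.
enum class Embedder(val dimensions: Int, val maxDistance: Float) {
    USE_LITE(dimensions = 100, maxDistance = 0.75f),
    GEMMA(dimensions = 256, maxDistance = 0.55f),
}

// 4 GB interpreted as GiB here; that interpretation is an assumption.
const val MIN_GEMMA_RAM_BYTES = 4L * 1024 * 1024 * 1024

// Gemma only when its files are on disk and the device has enough RAM; otherwise USE-Lite.
fun selectEmbedder(gemmaFilesPresent: Boolean, totalRamBytes: Long): Embedder =
    if (gemmaFilesPresent && totalRamBytes >= MIN_GEMMA_RAM_BYTES) Embedder.GEMMA
    else Embedder.USE_LITE

// Assumed DTO shape: the PR names ScoredChunk but does not show its fields.
data class ScoredChunk(val text: String, val distance: Float)

// Drop hits whose distance exceeds the active embedder's gate.
fun applyGate(hits: List<ScoredChunk>, embedder: Embedder): List<ScoredChunk> =
    hits.filter { it.distance <= embedder.maxDistance }
```

One thing to check on device: a phone marketed as 4 GB typically reports somewhat less total memory to the OS, so a strict comparison against 4 GiB may route such devices to USE-Lite.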

⚠️ Build & verify checklist (not done in this sandbox)

No Android SDK here, so this was not compiled (same precedent as #22) and these steps must run on a real build/device:

- Build regenerates the ObjectBox model: the new GemmaChunkEntity makes the ObjectBox plugin update sample-app/objectbox-models/default.json and generate GemmaChunkEntity_ / MyObjectBox. Commit the regenerated default.json.
- Tokenizer runtime: confirm ai.djl.huggingface:tokenizers ships an arm64 .so for Android and loads tokenizer.json; if not, swap to onnxruntime-extensions (the documented fallback). Highest-risk item.
- ONNX I/O names: verify the chosen EmbeddingGemma ONNX export's input names (input_ids/attention_mask) and output (sentence_embedding vs last_hidden_state); OnnxEmbeddingEngine.pool() handles both but the export should be confirmed.
- Pin model artifacts: the catalog URL/sizeBytes/sha256 for the model + tokenizer are placeholders against the onnx-community mirror; pin to a verified revision (prefer a QAT/INT8 build).
- Re-tune RELEVANCE_MAX_DISTANCE_GEMMA against real corpora.
- On-device (CPH2723): USE-vs-Gemma retrieval A/B, embed latency/memory, migration on an upgraded install, low-end USE fallback, and a backup round-trip.
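For the artifact-pinning item, a download can be checked against the catalog's sizeBytes/sha256 before it is handed to the runtime. The sketch below is JVM/Android-only (it uses `java.security.MessageDigest`), and `sha256Hex` and `verifyArtifact` are hypothetical helper names, not functions from this PR.

```kotlin
import java.io.File
import java.security.MessageDigest

// Stream the file through SHA-256 so large model files are never loaded into memory at once.
fun sha256Hex(file: File): String {
    val digest = MessageDigest.getInstance("SHA-256")
    file.inputStream().use { input ->
        val buffer = ByteArray(8192)
        while (true) {
            val read = input.read(buffer)
            if (read < 0) break
            digest.update(buffer, 0, read)
        }
    }
    return digest.digest().joinToString("") { "%02x".format(it) }
}

// Cheap size check first, then the hash; both must match the pinned catalog values.
fun verifyArtifact(file: File, expectedSize: Long, expectedSha256: String): Boolean =
    file.length() == expectedSize &&
        sha256Hex(file).equals(expectedSha256, ignoreCase = true)
```

With placeholder values in the catalog this check would reject every download, which is one way to make sure the placeholders are replaced before release.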

Out of scope (follow-ups)

Reranker second stage; ORT-format/mobile size optimization; iOS embedder (the engine is KMP-portable); token-aware chunking.

https://claude.ai/code/session_01GY7vyycq3iQTQxiooMnxJW

Generated by Claude Code

…) with USE-Lite fallback

Engine (lib):
- Task-aware EmbeddingEngine (QUERY/DOCUMENT + dimensions); OnnxEmbeddingEngine (EmbeddingGemma 300M via ONNX Runtime: instruction prompts, mean-pool, Matryoshka-256, L2-norm) + HfGemmaTokenizer (tokenizer.json).
- ModelFormat.ONNX_EMBEDDER and ModelDescriptor.companions (tokenizer download).

App (sample-app):
- GemmaChunkEntity (256-dim HNSW) alongside the 100-dim USE store; repository routes vector ops by dimension and returns neutral Chunk/ScoredChunk DTOs.
- RagHolder selects the active embedder (Gemma on capable devices, USE fallback), downloads model+companion, and re-indexes legacy chunks (migrateToGemma).
- Ingest/retrieve are task-aware; per-embedder distance gate.
- Backups round-trip both stores (schema v2, per-chunk dim); catalog descriptor.

Note: not compiled in this sandbox (no Android SDK); first build regenerates the ObjectBox model. Tokenizer runtime + ONNX output names need on-device verification.

sagar-develop · 2026-06-05T16:33:04Z

Closing without merging. On-device test found the EmbeddingGemma ONNX embedder won't load as configured: the catalog descriptor lists only tokenizer.json as a companion and omits the model_quantized.onnx_data external-weights file (~295 MB), so the in-app download fetches a weightless 567 KB graph and ORT session creation would fail. Also: branch is 31 commits behind main and the regenerated objectbox default.json (new GemmaChunkEntity) wasn't committed. Revisit as a fresh branch off current main with the weights companion wired in.
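A rough sketch of the failure and its fix, assuming a simplified descriptor shape. `DescriptorSketch` is a hypothetical stand-in for the library's ModelDescriptor, and the graph file name `model_quantized.onnx` is inferred from the weights file name rather than stated in this thread.

```kotlin
// Hypothetical, simplified stand-in for ModelDescriptor with its companions list.
data class DescriptorSketch(val modelFile: String, val companions: List<String>) {
    // Everything that must be on disk before a session can be created.
    val requiredFiles: List<String> get() = listOf(modelFile) + companions
}

// Files the descriptor requires that are not present in the model directory.
fun missingFiles(descriptor: DescriptorSketch, present: Set<String>): List<String> =
    descriptor.requiredFiles.filterNot { it in present }

// As configured in this PR: the external-weights file is not listed.
val asConfigured = DescriptorSketch(
    modelFile = "model_quantized.onnx", // assumed name
    companions = listOf("tokenizer.json"),
)

// With the weights companion wired in.
val withWeights = asConfigured.copy(
    companions = listOf("tokenizer.json", "model_quantized.onnx_data"),
)
```

Running `missingFiles(withWeights, asConfigured.requiredFiles.toSet())` returns `[model_quantized.onnx_data]`, which is the gap the on-device test hit. A pre-flight check like this only catches incomplete downloads; it cannot detect a descriptor that omits a file, so the weights companion still has to be added to the catalog entry.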

sagar-develop closed this Jun 5, 2026

sagar-develop wants to merge 1 commit into main from claude/embeddinggemma
