feat(rag): EmbeddingGemma-300M via ONNX + hybrid retrieval, doc-relevance gate, reranker by sagar-develop · Pull Request #32 · sagar-develop/litertlm-kmp

sagar-develop · 2026-06-08T19:47:26Z

Summary

Replaces the USE-Lite (100-dim TFLite) document embedder with EmbeddingGemma-300M on ONNX Runtime — telemetry-free, no MediaPipe/Play Services — and rebuilds document retrieval as a hybrid pipeline with a set of retrieval-quality fixes verified on-device.

Embedder upgrade

EmbeddingGemma-300M via ONNX Runtime (OnnxEmbeddingEngine), with an optional ms-marco-MiniLM-L6 cross-encoder reranker (OnnxReranker).
Device-tiered: USE-Lite (<6 GB) · Gemma@256 (6–8 GB) · Gemma@256 + reranker (8–10 GB) · Gemma@512 + reranker (≥10 GB), driven by EmbedderRecommendation and surfaced as a "Recommended" badge. Models download on-device through the catalogue; nothing is bundled.
Pure-Kotlin tokenizers (GemmaBpeTokenizer, BertWordPieceTokenizer) that read the HF tokenizer.json, written because onnxruntime-extensions has no GemmaTokenizer, and validated against the reference transformers tokenizer.
Matryoshka truncation (128/256/512) with per-dim ObjectBox HNSW entities (GemmaChunk128/256/512) + dim routing in the repository.
Task-aware QUERY/DOCUMENT embedding prompts.
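
For reviewers who want the tiering and Matryoshka steps in concrete terms, here is a minimal sketch, not the code in this PR. `EmbedderTier`, `recommendTier`, and `truncateMatryoshka` are hypothetical names. The 8,000 MB reranker threshold comes from the commit message; the other MB boundaries and the inclusive lower edges are assumptions. Re-normalizing after truncation is the usual Matryoshka practice and is assumed here.

```kotlin
import kotlin.math.sqrt

// Hypothetical tier model; names are illustrative, not the PR's API.
enum class EmbedderTier(val dim: Int?, val useReranker: Boolean) {
    USE_LITE(null, false),          // keeps the legacy 100-dim embedder
    GEMMA_256(256, false),
    GEMMA_256_RERANK(256, true),
    GEMMA_512_RERANK(512, true),
}

// Assumption: each boundary is inclusive at its lower edge, so 8,000 MB
// lands in the reranker tier. The 6,000 and 10,000 MB cut-offs are guesses
// at how "6 GB" and "10 GB" map to megabytes.
fun recommendTier(totalRamMb: Long): EmbedderTier = when {
    totalRamMb >= 10_000 -> EmbedderTier.GEMMA_512_RERANK
    totalRamMb >= 8_000 -> EmbedderTier.GEMMA_256_RERANK
    totalRamMb >= 6_000 -> EmbedderTier.GEMMA_256
    else -> EmbedderTier.USE_LITE
}

// Matryoshka truncation: keep the first `dim` components of the full
// embedding, then L2-normalize so cosine/dot-product search stays valid.
fun truncateMatryoshka(full: FloatArray, dim: Int): FloatArray {
    require(dim in 1..full.size) { "dim must be within 1..${full.size}" }
    val head = full.copyOf(dim)
    var sumSq = 0.0
    for (v in head) sumSq += v.toDouble() * v
    val norm = sqrt(sumSq)
    if (norm == 0.0) return head    // all-zero prefix: nothing to normalize
    for (i in head.indices) head[i] = (head[i] / norm).toFloat()
    return head
}
```

In a setup like this, the tier's `dim` would also decide which per-dimension index (128/256/512) a chunk is written to and queried from.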

Hybrid retrieval

Dense vector search + BM25 lexical scoring fused via Reciprocal Rank Fusion, with a per-document cap, wider candidate pools, and a larger grounding budget.
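
The fusion step can be sketched as follows. This is an illustrative sketch under stated assumptions, not the retriever shipped in this PR: `Chunk` and `fuseWithRrf` are hypothetical names, `k = 60` is the conventional RRF constant (the PR does not state its value), and the cap and limit defaults are placeholders.

```kotlin
// Hypothetical chunk handle: a chunk id plus the document it belongs to.
data class Chunk(val id: String, val docId: String)

// Reciprocal Rank Fusion: score(c) = sum over rankings of 1 / (k + rank),
// with rank starting at 1. Only ranks are used, so dense similarities and
// BM25 scores never need to share a scale.
fun fuseWithRrf(
    dense: List<Chunk>,      // best-first results from vector search
    lexical: List<Chunk>,    // best-first results from BM25
    k: Int = 60,             // assumption: conventional RRF constant
    perDocCap: Int = 3,      // assumption: max chunks kept per document
    limit: Int = 8,          // assumption: size of the fused candidate list
): List<Pair<Chunk, Double>> {
    val scores = LinkedHashMap<Chunk, Double>()
    for (ranking in listOf(dense, lexical)) {
        ranking.forEachIndexed { index, chunk ->
            scores[chunk] = (scores[chunk] ?: 0.0) + 1.0 / (k + index + 1)
        }
    }
    val usedPerDoc = HashMap<String, Int>()
    val fused = ArrayList<Pair<Chunk, Double>>()
    for ((chunk, score) in scores.entries.sortedByDescending { it.value }) {
        val used = usedPerDoc[chunk.docId] ?: 0
        if (used >= perDocCap) continue   // per-document cap
        usedPerDoc[chunk.docId] = used + 1
        fused += chunk to score
        if (fused.size == limit) break
    }
    return fused
}
```

A chunk that appears in both rankings accumulates two reciprocal-rank terms, which is why agreement between the dense and lexical retrievers pushes it up the fused list.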

Retrieval-quality fixes (the war story)

A real personal-finance project held three policies: a car policy (TATA AIG, ₹8,504), a life policy (Future Generali, ₹41,799), and a health policy (ICICI Lombard).

Wrong-document grounding — "car insurance premium" answered ₹41,799 from the life policy because BM25 lexical overlap let the life PDF out-score the car PDF. ₹8,504 is the correct car premium. Fixed by a document-level dominance gate that keeps grounding on the source that genuinely dominates the candidate set.
Title-match override: "who is the insurer of my car policy" grounded on the health policy, whose formal "…insurer" phrasing out-scored the car doc. A distinctive query term that names a doc by its title now grounds on that doc, which changed the answer from ICICI Lombard (wrong) to TATA AIG (correct).
Truncated grounded answers — grounded replies collapsed to 1–2 tokens after a few turns because the stateful LiteRT-LM KV cache accumulated each turn's grounding block. Fixed by a per-grounded-turn session reset (reopenSessionAndAwait) that re-prefills only bounded visible history (MAX_PREFILL_TURNS=16).
Self-healing migration — document-level re-index into the active embedder's index on next open; no re-import/OCR.
Reranker recall win — enabling the ungated cross-encoder reranker on the 8 GB tier recovered a health-insurer chunk the first-stage fusion ranked too low.
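
One way to picture the document-level dominance gate is the sketch below. The PR does not say how dominance is measured, so this assumes a simple share-of-score rule: `Scored`, `applyDominanceGate`, and the `minShare = 0.6` threshold are all hypothetical.

```kotlin
// Hypothetical fused candidate: a chunk, its document, and its fused score.
data class Scored(val docId: String, val chunkId: String, val score: Double)

// Assumed rule: if one document holds at least `minShare` of the total
// fused score, ground only on that document's chunks; otherwise leave the
// candidate set untouched.
fun applyDominanceGate(
    candidates: List<Scored>,
    minShare: Double = 0.6,   // assumption: not a value taken from the PR
): List<Scored> {
    if (candidates.isEmpty()) return candidates
    val totals = candidates
        .groupBy { it.docId }
        .mapValues { (_, chunks) -> chunks.sumOf { it.score } }
    val grandTotal = totals.values.sum()
    if (grandTotal <= 0.0) return candidates
    val top = totals.maxByOrNull { it.value } ?: return candidates
    return if (top.value / grandTotal >= minShare) {
        candidates.filter { it.docId == top.key }
    } else {
        candidates
    }
}
```

Under this assumed rule, a stray chunk from a lexically similar document is dropped once another document clearly carries most of the score mass, while a mixed candidate set passes through unchanged.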

Testing

11/11 DefaultDocumentRetrieverTest retriever unit tests pass.
:sample-app:assembleRelease (R8 + signed) builds green.
Verified on-device (Realme CPH2723): all four grounding scenarios above behave correctly post-fix.

Known residual

The life policy's sum assured sits inside a garbled extracted table (PDF text-extraction artifact), so it can be missed at the first-stage recall step. This is a PDF-extraction / first-stage recall limitation, not a ranking bug — documented for follow-up.

🤖 Generated with Claude Code

…ance gate, reranker

Replace the USE-Lite (100-dim TFLite) embedder with EmbeddingGemma-300M running on ONNX Runtime (telemetry-free, no MediaPipe/Play Services). The embedder is device-tiered: USE-Lite (<6GB), Gemma@256 (6-8GB), Gemma@256+reranker (8-10GB), Gemma@512+reranker (>=10GB), with a recommendation engine and on-device gated download.

Embedder upgrade:
- OnnxEmbeddingEngine + OnnxReranker (ms-marco MiniLM-L6 cross-encoder).
- Pure-Kotlin tokenizers (GemmaBpeTokenizer, BertWordPieceTokenizer) because onnxruntime-extensions has no GemmaTokenizer; validated vs HF.
- Matryoshka truncation (128/256/512) + per-dim ObjectBox HNSW entities (GemmaChunk128/256/512) with dim routing in the repository.
- Task-aware QUERY/DOCUMENT prompts in EmbeddingEngine.
- Device-tiered EmbedderRecommendation (reranker >=8000MB).

Retrieval-quality fixes (the headline story):
- Document-level DOMINANCE gate: car-insurance queries were grounding on the wrong document (a life policy answering a wrong premium) due to BM25 lexical pollution; the gate keeps grounding on the dominant document.
- TITLE-MATCH override: a distinctive query term naming a document by title now grounds on that document ("insurer of my car policy" went from the health policy to the correct car policy).
- Per-grounded-turn KV reset (reopenSessionAndAwait, MAX_PREFILL_TURNS=16): grounded answers were truncating to 1-2 tokens after a few turns because the stateful LiteRT-LM KV cache accumulated each turn's grounding block.
- Document-level self-healing embedding migration in RagHolder.
- Cross-encoder reranker enabled (ungated) on the 8GB tier, fixing a health-insurer recall miss.

Hybrid retrieval = vector + BM25 + Reciprocal Rank Fusion. 11/11 retriever unit tests pass; release build green; verified on-device (Realme CPH2723).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

App versionName 0.9.0 -> 0.10.0 (versionCode 7 -> 8); engine lib version bumped in lockstep (0.9.0 -> 0.10.0). Promote the EmbeddingGemma RAG work from [Unreleased] to a dated [0.10.0] section and document the retrieval-quality fixes (hybrid retrieval/RRF, dominance gate, title-match override, per-grounded-turn KV reset, self-healing migration, reranker).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

sagar-develop and others added 2 commits June 9, 2026 01:15

sagar-develop merged commit b779db3 into main Jun 8, 2026
1 check passed

sagar-develop mentioned this pull request Jun 8, 2026

RAG embedding quality: replace USE-Lite (100-dim) with a device-tiered embedding + reranking stack #30

Closed

sagar-develop merged 2 commits into main from feat/embedding-gemma
