From f213f24fcbf8becc5a86422ad7a0cac34ae12720 Mon Sep 17 00:00:00 2001 From: sagar-develop Date: Tue, 9 Jun 2026 01:37:24 +0530 Subject: [PATCH] docs: architecture diagrams (Mermaid) + Gource growth recipe MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add docs/ARCHITECTURE.md — engine/product split, engine internals, the RAG ingestion+retrieval pipeline (hybrid vector+BM25+RRF, document relevance gate + title-match override, stateful-KV grounding), and the device-tiered embedder recommendation — all as GitHub-rendered Mermaid. Add docs/gource.md — NativeLM-branded recipe for an animated repo-growth clip. Co-Authored-By: Claude Opus 4.8 --- docs/ARCHITECTURE.md | 206 +++++++++++++++++++++++++++++++++++++++++++ docs/gource.md | 65 ++++++++++++++ 2 files changed, 271 insertions(+) create mode 100644 docs/ARCHITECTURE.md create mode 100644 docs/gource.md diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md new file mode 100644 index 0000000..5a1367c --- /dev/null +++ b/docs/ARCHITECTURE.md @@ -0,0 +1,206 @@ +# Architecture + +NativeLM is an on-device document-chat app built on **litertlm-kmp**, a Kotlin +Multiplatform engine that wraps Google's LiteRT-LM. Everything — the language model, +the embedder, the vector index, OCR, speech-to-text — runs locally. No account, no +upload, no telemetry. This document explains how the pieces fit together and how the +codebase is organised so the boundary between the reusable **engine** and the +**product** stays clean as it grows. + +Two Gradle modules: + +- **`:lib`** — the engine (`com.sagar.aicore`). Dual-licensed (AGPL-3.0 / commercial). + Pure Kotlin Multiplatform: `commonMain` holds platform-neutral contracts and + orchestration; `androidMain` holds the Android-backed inference implementations; + `iosMain` carries the iOS roadmap surface. +- **`:sample-app`** — the NativeLM product (`com.nativelm.app`). Android + Compose. 
It + supplies the platform-backed stores (ObjectBox, DataStore, SAF, ML Kit OCR) and the + user-facing experience, and depends on `:lib` — never the other way around. + +```mermaid +flowchart TB + subgraph product["sample-app · NativeLM (com.nativelm.app)"] + ui["Compose UI
chat · documents · models · settings · studio · sync · lock"] + vm["NativeLmViewModel"] + holders["EngineHolder · RagHolder
NativeLmModelCatalog · EmbedderRecommendation"] + platform["Android platform glue
ObjectBoxDocumentRepository (HNSW)
AndroidTextExtractor + MlKitOcrEngine
AppPreferences (DataStore) · SecureStore"] + end + + subgraph engine[":lib · litertlm-kmp engine (com.sagar.aicore)"] + contracts["Contracts (commonMain)
LocalAiEngine · EmbeddingEngine · Reranker
DocumentIngestor · DocumentRetriever · DocumentStore
ModelCatalog · ModelManager"] + impls["Android impls (androidMain)
LiteRtLmLocalAiEngine (Gemma)
OnnxEmbeddingEngine · OnnxReranker
GemmaBpeTokenizer · BertWordPieceTokenizer"] + end + + ui --> vm --> holders --> contracts + holders --> platform + platform -. implements .-> contracts + contracts --- impls + + classDef p fill:#eef6ee,stroke:#7FA980,color:#1C1B1A; + classDef e fill:#f5f3ef,stroke:#9a8f7a,color:#1C1B1A; + class ui,vm,holders,platform p; + class contracts,impls e; +``` + +The key architectural rule: **the product talks to the engine only through contracts** +(`LocalAiEngine`, `EmbeddingEngine`, `DocumentRetriever`, `DocumentStore`, …). The +product *provides* the storage implementations (e.g. `ObjectBoxDocumentRepository` +implements the engine's `DocumentStore`) but never reaches into engine internals. That +inversion is what lets the same engine power a second app (a kids' learning app, Curio) +through a Gradle composite build. + +--- + +## Engine internals (`:lib`) + +The engine is organised around small, swappable contracts in `commonMain`, each with an +Android implementation in `androidMain`. Inference backends are deliberately +**telemetry-free**: the LLM runs on LiteRT-LM (CPU), and the embedder/reranker run on +**ONNX Runtime** (Microsoft, no Google/Play dependency) rather than MediaPipe — a +conscious choice to protect the zero-telemetry promise. + +```mermaid +flowchart LR + subgraph common["commonMain — contracts & orchestration"] + lae["LocalAiEngine
(chat, stateful KV session)"] + ee["EmbeddingEngine
(task-aware: QUERY / DOCUMENT)"] + rr["Reranker
(cross-encoder, optional)"] + ing["DocumentIngestor"] + ret["DocumentRetriever"] + store["DocumentStore"] + cat["ModelCatalog · ModelManager
ModelDescriptor · CompanionFile"] + rag["RAG support
TextChunker · KeywordSearch (BM25+RRF)
RagConfig · RagContextFormatter"] + end + + subgraph android["androidMain — inference backends"] + litert["LiteRtLmLocalAiEngine
Gemma via LiteRT-LM (CPU)"] + onnxE["OnnxEmbeddingEngine
EmbeddingGemma-300M (ONNX)"] + useE["MediaPipeEmbeddingEngine
USE-Lite 100-dim (entry tier)"] + onnxR["OnnxReranker
ms-marco MiniLM-L6 (ONNX)"] + tok["GemmaBpeTokenizer · BertWordPieceTokenizer
(pure-Kotlin, validated vs HF)"] + end + + lae -. impl .-> litert + ee -. impl .-> onnxE + ee -. impl .-> useE + rr -. impl .-> onnxR + onnxE --> tok + onnxR --> tok + ing --> ee + ing --> store + ret --> ee + ret --> rr + ret --> store + ret --> rag + + classDef c fill:#f5f3ef,stroke:#9a8f7a,color:#1C1B1A; + classDef a fill:#eef2f6,stroke:#6a86a8,color:#1C1B1A; + class lae,ee,rr,ing,ret,store,cat,rag c; + class litert,onnxE,useE,onnxR,tok a; +``` + +Beyond core inference, the engine also hosts: **Studio** (`studio/` — generating +artifacts like mind maps, timelines, podcasts from documents), **Sync** (`sync/` — P2P +device-to-device transfer over NSD/mDNS + TCP, GMS-free), **Backup** (`backup/` — +passphrase-encrypted `.nlmbak` export, Argon2id + AES-256-GCM), and **Chart** +(`chart/`). Speech-to-text (`SpeechToText`) is wired to on-device Whisper in the app. + +--- + +## The RAG pipeline + +This is the heart of the product: grounding answers in the user's own documents with +citations. There are two phases — **ingestion** (when a document is imported) and +**retrieval** (when a question is asked). + +```mermaid +flowchart TB + subgraph ingest["Ingestion — on import"] + i1["PDF / image / text"] + i2["AndroidTextExtractor
(+ MlKitOcrEngine for scans)"] + i3["TextChunker
(≈500 chars, 50 overlap)"] + i4["EmbeddingEngine.embed(text, DOCUMENT)
EmbeddingGemma → Matryoshka dim"] + i5["ObjectBox HNSW
(per-dim entity: 100/128/256/512)"] + i1 --> i2 --> i3 --> i4 --> i5 + end + + subgraph retrieve["Retrieval — on each question"] + q0["User question"] + q1["EmbeddingEngine.embed(query, QUERY)"] + qV["Vector arm
HNSW k-NN, distance-gated"] + qK["Keyword arm
BM25 over term-matching chunks"] + gate["Document relevance gate
dominance (best doc + ties)
+ title-match override"] + fuse["Reciprocal Rank Fusion
+ per-document cap"] + rerankStep["Reranker (≥8 GB tiers)
cross-encoder re-score top pool"] + topk["Top-k chunks → grounding block
(RagContextFormatter, size-capped)"] + llm["LocalAiEngine
(stateful KV; grounding re-flushed per turn)"] + ans["Answer + citations"] + + q0 --> q1 --> qV + q0 --> qK + qV --> gate + qK --> gate + gate --> fuse --> rerankStep --> topk --> llm --> ans + end + + i5 -. queried by .-> qV + i5 -. queried by .-> qK + + classDef ing fill:#eef6ee,stroke:#7FA980,color:#1C1B1A; + classDef ret fill:#f5f3ef,stroke:#9a8f7a,color:#1C1B1A; + class i1,i2,i3,i4,i5 ing; + class q0,q1,qV,qK,gate,fuse,rerankStep,topk,llm,ans ret; +``` + +A few design decisions worth calling out, because they came from real failure modes +(see [`_session/material/blog-embedding-enhancements.md`](../_session/material/blog-embedding-enhancements.md)): + +- **Hybrid retrieval.** The vector arm finds semantic matches; the BM25 keyword arm + recovers exact strings (names, IDs, codenames) that a small embedder ranks poorly. The + two rankings merge with Reciprocal Rank Fusion. +- **Document relevance gate.** With several similar documents (e.g. a car, a life, and a + health insurance policy in one project), lexical overlap on words like + "insurance"/"premium" used to let an answer ground on the *wrong* document. The gate + keeps only the document(s) the vector arm clearly favours, and a **title-match + override** lets a query that names a document by its title ("car" → a *CarPolicy* + source) ground on that document over a higher-scoring but wrong one. +- **Stateful KV, flushed grounding.** The chat session keeps a warm KV cache for flat + time-to-first-token. But injecting a fresh grounding block every turn would accumulate + in that cache and eventually overflow the on-device context window — so grounded turns + re-prefill only the bounded visible transcript, flushing stale grounding. + +--- + +## Device-tiered model selection + +On-device inference must fit the phone. 
`EmbedderRecommendation.forDevice(ramMb)` mirrors +the LLM tiering and picks the embedder, the Matryoshka dimension, and whether to run the +reranker — keyed on effective RAM (after the OEM RAM-expansion cap). One downloaded +EmbeddingGemma model is truncated per tier; entry devices stay on the no-download, +ungated USE-Lite. + +```mermaid +flowchart LR + ram{"effective RAM"} + ram -->|"≥ 10 GB"| t4["EmbeddingGemma @512
+ reranker"] + ram -->|"8–10 GB"| t3["EmbeddingGemma @256
+ reranker"] + ram -->|"6–8 GB"| t2["EmbeddingGemma @256"] + ram -->|"< 6 GB"| t1["USE-Lite @100
(no download, ungated)"] + + classDef n fill:#f5f3ef,stroke:#9a8f7a,color:#1C1B1A; + class t1,t2,t3,t4 n; +``` + +The same recommendation surfaces in the Models screen as a *Recommended* badge, and the +download flow pulls the model plus its companions (the ONNX external-data weights blob +and the tokenizer) on-device — gated models reuse the Hugging Face token flow. + +--- + +## Visualising growth + +This file is the intentional, reviewed view of the architecture — kept in `docs/` so it +evolves alongside the code (transparent-dev model). For the *organic* view of how the +codebase grew over time, the repository history can be rendered with +[Gource](https://gource.io/) (an animated, file-by-file visualisation of the git log). +See [`docs/gource.md`](gource.md) for the recipe used to produce the growth clip. diff --git a/docs/gource.md b/docs/gource.md new file mode 100644 index 0000000..0bcbf28 --- /dev/null +++ b/docs/gource.md @@ -0,0 +1,65 @@ +# Growth visualisation with Gource + +[Gource](https://gource.io/) renders an animated, file-by-file visualisation of a git +repository's history — a "watch the codebase grow" clip. It's a nice companion to +[`ARCHITECTURE.md`](ARCHITECTURE.md): that file is the *intentional* structure, this is +the *organic* growth over time. Handy for launch posts and talks. + +## Install (Windows) + +```powershell +winget install acaudwell.Gource +``` + +(ffmpeg is also required for video output — already present in this environment. On a +clean machine: `winget install Gyan.FFmpeg`.) + +Gource needs an OpenGL context, so run it on a desktop session (not a headless shell). + +## Produce the clip (NativeLM-branded) + +Run from the repository root. The colours match the NativeLM palette — warm-dark canvas +`#1C1B1A`, off-white text `#FAF9F6`, sage-green directories `#7FA980`. + +```powershell +gource .
` + --title "NativeLM — on-device document chat" ` + --seconds-per-day 0.5 ` + --auto-skip-seconds 1 ` + --max-file-lag 0.1 ` + --hide mouse,filenames,progress ` + --highlight-users ` + --background-colour 1C1B1A ` + --font-colour FAF9F6 ` + --dir-colour 7FA980 ` + --highlight-colour 7FA980 ` + --key ` + -1280x720 ` + --output-framerate 30 ` + --output-ppm-stream - ` + | ffmpeg -y -r 30 -f image2pipe -vcodec ppm -i - ` + -vcodec libx264 -preset slow -pix_fmt yuv420p -crf 20 ` + _session/material/nativelm-growth.mp4 +``` + +A short, fast-paced clip (low `--seconds-per-day`) reads best on LinkedIn / X. For a +longer narrated walkthrough, raise `--seconds-per-day` to ~3–5. + +## Focus on the source (optional) + +To exclude generated/vendor noise (build outputs, ObjectBox-generated files, session +material) and visualise only hand-written source, drive Gource from a filtered log: + +```powershell +git log --pretty=format:user:%aN%n%ct --reverse --raw --encoding=UTF-8 ` + --no-renames -- lib/src sample-app/src docs ` + > gource.log +gource gource.log --title "NativeLM" ... # same flags as above +``` + +## Notes + +- The output MP4 goes to `_session/material/` (content/marketing material), which is + git-ignored — the clip is an artifact, not part of the repo. +- To put faces on contributors, drop avatar PNGs (named per git author) in a folder and + add `--user-image-dir `.