Query your documents locally with AI — no cloud, no API keys.
Index .txt / .md files, retrieve the best chunks with cosine similarity, and answer questions with a local model — all on your machine.
- Fully local: Embeddings and chat go through Ollama on localhost:11434; your files never leave your disk.
- SQLite storage: Pure Go driver (modernc.org/sqlite), no CGO. Index lives in .rag/data.db.
- RAG pipeline: Chunk → embed → store → embed question → top-k retrieval → streamed answer.
- Streaming answers: Replies stream token-by-token for a responsive CLI experience.
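As a rough illustration of the embedding step, the sketch below posts text to Ollama's /api/embeddings endpoint and decodes the returned vector. It is an assumption that this endpoint is the one in use; the type and function names (embedRequest, embedResponse, parseEmbedding, embed) are hypothetical and not taken from internal/ollama.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// embedRequest and embedResponse model the JSON exchanged with
// /api/embeddings. Hypothetical names; the real client may differ.
type embedRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
}

type embedResponse struct {
	Embedding []float32 `json:"embedding"`
}

// parseEmbedding decodes a response body and rejects empty vectors.
func parseEmbedding(body []byte) ([]float32, error) {
	var out embedResponse
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, err
	}
	if len(out.Embedding) == 0 {
		return nil, fmt.Errorf("response contained no embedding")
	}
	return out.Embedding, nil
}

// embed sends text to the local Ollama server and returns its vector.
func embed(ctx context.Context, baseURL, model, text string) ([]float32, error) {
	payload, err := json.Marshal(embedRequest{Model: model, Prompt: text})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost,
		baseURL+"/api/embeddings", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, fmt.Errorf("request to Ollama failed: %w", err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("ollama returned %s: %s", resp.Status, body)
	}
	return parseEmbedding(body)
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	vec, err := embed(ctx, "http://localhost:11434", "nomic-embed-text", "hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("dimensions:", len(vec))
}
```

The request never leaves localhost, which is the property the "fully local" bullet describes.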
- Go 1.26+ (see go.mod).
- Ollama installed and running (ollama serve is usually started automatically).
- Models pulled (defaults below):
ollama pull nomic-embed-text
ollama pull llama3.2:3b
Install with go install:
go install github.com/srijxnnn/localrag/cmd/rag@latest
Ensure $(go env GOPATH)/bin is on your PATH.
Or clone and run from source:
git clone https://github.com/srijxnnn/localrag.git
cd localrag
go run ./cmd/rag --help
# 1. Create a workspace (creates .rag/ and the database)
rag init
# 2. Index a file (supports .txt and .md)
rag add ./notes.md
# 3. Ask a question grounded in your docs
rag ask "What are the main points in this document?"
Override the chat model when you ask:
rag ask "Summarize the refund section" --model mistral

| Command | Description |
|---|---|
| rag init | Create .rag/ in the current directory and initialize the SQLite database. Fails if already initialized. |
| rag add <path> | Read a single .txt or .md file, chunk it, embed with nomic-embed-text, and store vectors. |
| rag ask <question> | Embed the question, pick the top 3 chunks by cosine similarity, then stream an answer (default model: llama3.2:3b). |
| rag list | Show indexed paths, chunk counts per file, and when each was added (RFC3339). |
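To make the streaming part of rag ask concrete, here is a minimal sketch of reading a token stream. It assumes the format of Ollama's /api/generate with streaming enabled, where the body is a sequence of JSON objects carrying a response fragment and a done flag. The names streamChunk, copyStream, and collect are hypothetical, not the project's own.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
	"strings"
)

// streamChunk holds the two fields of a stream line this sketch uses.
type streamChunk struct {
	Response string `json:"response"`
	Done     bool   `json:"done"`
}

// copyStream decodes JSON objects from r one at a time, writes each
// response fragment to w as soon as it is decoded, and returns the
// concatenated answer.
func copyStream(r io.Reader, w io.Writer) (string, error) {
	var full strings.Builder
	dec := json.NewDecoder(r)
	for {
		var c streamChunk
		if err := dec.Decode(&c); err == io.EOF {
			break
		} else if err != nil {
			return full.String(), err
		}
		fmt.Fprint(w, c.Response)
		full.WriteString(c.Response)
		if c.Done {
			break
		}
	}
	return full.String(), nil
}

// collect runs copyStream over an in-memory string and discards the
// incremental output; handy for trying the decoder without a server.
func collect(s string) (string, error) {
	return copyStream(strings.NewReader(s), io.Discard)
}

func main() {
	// Canned input standing in for an HTTP response body.
	body := strings.NewReader(
		`{"response":"Hel","done":false}` + "\n" +
			`{"response":"lo","done":true}` + "\n")
	if _, err := copyStream(body, os.Stdout); err != nil {
		fmt.Fprintln(os.Stderr, "stream error:", err)
	}
	fmt.Println()
}
```

In a real client the reader would be the HTTP response body, so fragments reach the terminal while the model is still generating.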
Global behavior:
- Run commands from the same directory as your .rag workspace (or ensure the process cwd matches where you ran rag init).
- If Ollama is down, embedding fails with a hint to run ollama serve. If a model is missing, you'll get an ollama pull … suggestion.
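One way such hints could be produced is sketched below. It assumes that a transport-level error means Ollama is not reachable and that an HTTP 404 from the API means the requested model has not been pulled; both are assumptions about Ollama's behavior, and hintFor with its message wording is hypothetical rather than the CLI's actual output.

```go
package main

import (
	"fmt"
	"net/http"
)

// hintFor maps a failed Ollama call to a suggestion for the user.
// transportErr is the error from the HTTP client (nil when a response
// arrived); status is the HTTP status code of that response.
func hintFor(transportErr error, status int, model string) string {
	switch {
	case transportErr != nil:
		// No response at all: the server is probably not running.
		return "cannot reach Ollama; try: ollama serve"
	case status == http.StatusNotFound:
		// Assumed to indicate a model that has not been pulled yet.
		return fmt.Sprintf("model %q not found; try: ollama pull %s", model, model)
	default:
		return ""
	}
}

func main() {
	fmt.Println(hintFor(fmt.Errorf("dial tcp: connection refused"), 0, "nomic-embed-text"))
	fmt.Println(hintFor(nil, http.StatusNotFound, "llama3.2:3b"))
}
```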
flowchart LR
subgraph index["Indexing"]
F[File .txt/.md] --> C[Chunk 500 / overlap 100]
C --> E[Embed nomic-embed-text]
E --> DB[(SQLite .rag/data.db)]
end
subgraph query["Question"]
Q[Question] --> EQ[Embed same model]
EQ --> TOP[Top-K cosine similarity]
DB --> TOP
TOP --> P[RAG prompt + context]
P --> G[Generate stream llama3.2:3b]
end
Defaults (see internal/chunk and internal/cli):
| Setting | Value |
|---|---|
| Embedding model | nomic-embed-text |
| Chat model | llama3.2:3b (rag ask --model to override) |
| Chunk size | 500 characters |
| Overlap | 100 characters |
| Top K | 3 |
| Ollama URL | http://localhost:11434 |
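The chunk size and overlap defaults can be read as a sliding window. The sketch below is one plausible interpretation, assuming "characters" means Unicode code points (runes) and that each window starts size minus overlap characters after the previous one; chunkText is a hypothetical name and the logic in internal/chunk may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// chunkText splits text into windows of at most size runes. Consecutive
// windows share overlap runes. Invalid settings yield no chunks.
func chunkText(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}
	runes := []rune(text)
	step := size - overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break // the last window already reached the end of the text
		}
	}
	return chunks
}

func main() {
	text := strings.Repeat("a", 1200)
	chunks := chunkText(text, 500, 100)
	for i, c := range chunks {
		fmt.Printf("chunk %d: %d characters\n", i, len([]rune(c)))
	}
}
```

With these defaults, a 1200-character file yields windows starting at offsets 0, 400, and 800, so text near a boundary appears in two chunks and is less likely to be cut off from its context.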
cmd/rag/ # CLI entrypoint
internal/cli/ # Cobra commands (init, add, ask, list)
internal/chunk/ # Text chunking for .txt / .md
internal/ollama/ # HTTP client: embeddings + streaming generate
internal/store/ # SQLite schema and persistence
internal/search/ # Cosine similarity + TopK
- Only .txt and .md files are supported for rag add today.
- Retrieval is brute-force TopK over all chunks, which is fine for small corpora; larger indexes may need a dedicated vector DB or ANN later.
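For a sense of what brute-force retrieval involves, the sketch below scores every stored vector against the query with cosine similarity and keeps the best k. The names cosine, scored, and topK are hypothetical and the code is an illustration, not a copy of internal/search.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of a and b, or 0 when the
// vectors differ in length, are empty, or either has zero magnitude.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, na, nb float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		na += x * x
		nb += y * y
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// scored pairs a chunk index with its similarity to the query.
type scored struct {
	Index int
	Score float64
}

// topK scores every vector (a linear scan) and returns the k best,
// highest score first.
func topK(query []float32, vectors [][]float32, k int) []scored {
	results := make([]scored, 0, len(vectors))
	for i, v := range vectors {
		results = append(results, scored{Index: i, Score: cosine(query, v)})
	}
	sort.SliceStable(results, func(i, j int) bool {
		return results[i].Score > results[j].Score
	})
	if k < 0 {
		k = 0
	}
	if k > len(results) {
		k = len(results)
	}
	return results[:k]
}

func main() {
	query := []float32{1, 0}
	vectors := [][]float32{{0, 1}, {1, 0}, {1, 1}}
	for _, r := range topK(query, vectors, 2) {
		fmt.Printf("chunk %d score %.3f\n", r.Index, r.Score)
	}
}
```

Each question costs one pass over all chunks plus a sort, which is why this approach suits small corpora and why an ANN index becomes attractive as the index grows.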
Contributions and issues are welcome.