A local-first Retrieval-Augmented Generation (RAG) chatbot designed to answer onboarding and technical questions using internal documentation and code repositories.
This system ingests engineering documentation and source code, indexes them with semantic embeddings (sentence-transformers) and keyword search, and answers questions using a locally hosted LLM (Ollama) with grounded citations.
Currently implemented as a backend API (FastAPI).
- Documentation ingestion (Markdown / text)
- Code repository ingestion (symbol-aware chunking)
- Hybrid search (Vector + Keyword / BM25)
- Code + Docs retrieval fusion (RRF ranking)
- Local embeddings (SentenceTransformers)
- Local LLM answering via Ollama
- Reranking for improved relevance
- Metadata filtering (repo, service, doc space, etc.)
- Incremental indexing (skips unchanged files)
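One way the "skips unchanged files" behaviour could work is to compare a content hash against the value recorded at the previous ingestion. The sketch below is an illustration under that assumption, not the project's actual implementation; `file_digest`, `files_to_index`, and the `known_digests` mapping are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def files_to_index(paths: list[Path], known_digests: dict[str, str]) -> list[Path]:
    """Return only the files whose content differs from the stored digest.

    `known_digests` maps a path string to the digest recorded at the last
    ingestion (hypothetical storage; the real system may track this differently).
    """
    changed = []
    for path in paths:
        # A missing entry means the file is new, so it is indexed as well.
        if known_digests.get(str(path)) != file_digest(path):
            changed.append(path)
    return changed
```

Hashing content rather than comparing modification times keeps the check stable across fresh checkouts, at the cost of reading every file once.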
Create virtual environment:
python -m venv venv
source venv/bin/activate
Install packages:
pip install fastapi uvicorn psycopg sentence-transformers
Create database.
Run migration:
psql "$DATABASE_URL" -f migrations/001_init.sql
Required extensions:
- vector
- pg_trgm
Example:
DATABASE_URL=postgresql://postgres:password@localhost:5432/rag_db
EMBEDDING_MODEL_NAME=all-MiniLM-L6-v2
SUMMARIZER_MODE=heuristic
RERANK_MODE=cross_encoder
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=llama3.1:8b
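For reference, the variables above could be read into a typed settings object roughly as follows. This is a minimal sketch: the `Settings` class and `load_settings` function are hypothetical, and the fallback defaults simply reuse the example values shown above rather than any defaults the application is known to define.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    embedding_model_name: str
    summarizer_mode: str
    rerank_mode: str
    ollama_base_url: str
    ollama_model: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables.

    DATABASE_URL is treated as required (assumption); the other values fall
    back to the example values from this README.
    """
    return Settings(
        database_url=env["DATABASE_URL"],  # raises KeyError if unset
        embedding_model_name=env.get("EMBEDDING_MODEL_NAME", "all-MiniLM-L6-v2"),
        summarizer_mode=env.get("SUMMARIZER_MODE", "heuristic"),
        rerank_mode=env.get("RERANK_MODE", "cross_encoder"),
        ollama_base_url=env.get("OLLAMA_BASE_URL", "http://localhost:11434"),
        ollama_model=env.get("OLLAMA_MODEL", "llama3.1:8b"),
    )
```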
Install:
Run:
ollama serve
ollama pull llama3.1:8b
Start server:
uvicorn app.main:app --reload
Swagger UI available at:
http://localhost:8000/docs
The system supports both single-file and bulk ingestion.
POST /ingest/docs/folder
Example:
{
  "folder": "data/docs",
  "doc_space": "Engineering Onboarding",
  "meta": {
    "team": "platform"
  }
}
POST /ingest/code/repo
Example:
{
  "repo_root": "repos/backend",
  "repo": "backend-api",
  "branch": "main",
  "meta": {
    "service": "auth"
  }
}
Retrieve relevant chunks without generating an answer.
POST /search
Example:
{
  "query": "Where is JWT verification implemented?"
}
Returns retrieved documentation and code snippets.
Main chatbot endpoint.
POST /answer
Example:
{
  "query": "How do I deploy the backend service?"
}
Response includes:
- grounded answer
- citations
- retrieved sources
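The endpoints above accept JSON over POST, so they can be called from a script using only the standard library. The sketch below assumes the server runs at http://localhost:8000 as shown earlier; `build_request` and `post_json` are hypothetical helper names, and the response is returned as a plain dict because its exact field names are not specified here.

```python
import json
import urllib.request


def build_request(base_url: str, path: str, payload: dict) -> urllib.request.Request:
    """Create a JSON POST request for one of the API endpoints."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(base_url: str, path: str, payload: dict, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_request(base_url, path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, a call might look like `post_json("http://localhost:8000", "/answer", {"query": "How do I deploy the backend service?"})`. The generous timeout is a guess to allow for local LLM generation time.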
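The system combines multiple retrieval methods:
- Vector search: semantic similarity using embeddings.
- Keyword search: Postgres full-text search for identifiers and error messages.
- Fusion: Reciprocal Rank Fusion (RRF) merges the ranked result lists.
- Reranking: a cross-encoder rescores the fused candidates to improve final relevance.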
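To illustrate the fusion step, here is a minimal sketch of Reciprocal Rank Fusion over ranked lists of chunk IDs. Each item scores the sum of 1 / (k + rank) across the lists it appears in. The function name is hypothetical, and `k = 60` is a commonly used constant, not necessarily the value this project uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of IDs into one list sorted by RRF score.

    Ranks start at 1. Ties are broken by ID so the output is deterministic.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
```

Because RRF uses only rank positions, it can combine vector and keyword results without normalizing their raw scores, which live on different scales.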
Supported filters include:
- repository
- branch
- doc space
- service
- component
- owners
Example:
- Only backend repo
- Only runbooks
- Only auth service
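Conceptually, a filter keeps only the chunks whose metadata matches every requested key. The sketch below shows that logic in memory, purely as an illustration: the real system presumably applies filters in the database query, and the `meta` key, the chunk dict shape, and both function names are assumptions.

```python
def matches_filters(meta: dict, filters: dict) -> bool:
    """Return True when every filter key matches the chunk metadata.

    List-like metadata values (for example several owners) match when they
    contain the wanted value.
    """
    for key, wanted in filters.items():
        actual = meta.get(key)
        if isinstance(actual, (list, tuple, set)):
            if wanted not in actual:
                return False
        elif actual != wanted:
            return False
    return True


def filter_chunks(chunks: list[dict], filters: dict) -> list[dict]:
    """Keep chunks whose (hypothetical) `meta` dict satisfies all filters."""
    return [chunk for chunk in chunks if matches_filters(chunk.get("meta", {}), filters)]
```

For instance, `filter_chunks(chunks, {"service": "auth"})` would correspond to the "only auth service" case above.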
Recommended ingestion setup:
SUMMARIZER_MODE=heuristic
After bulk ingestion:
ANALYZE sources;
ANALYZE chunks;