No embeddings. No vector DB. Pure LLM reasoning.
Traditional Vector RAG
PDF → chunks → embeddings → cosine similarity → hope it's relevant → answer
Vectorless RAG (this project)
PDF → document TREE → LLM reasons over tree → picks exact sections → answer
Step 1 — PARSE
PyMuPDF extracts text page by page.
Font-size analysis detects headings → builds a hierarchical Document Tree.
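The heading detection can be pictured with a short sketch. This is not the project's actual `build_document_tree()` implementation, just a minimal illustration of the font-size idea with PyMuPDF, assuming the body font size is the most common span size and noticeably larger spans start new sections:

```python
import fitz  # PyMuPDF

def sketch_document_tree(pdf_path, heading_ratio=1.2):
    """Illustrative only: spans noticeably larger than the body font
    are treated as headings; everything else is body text."""
    doc = fitz.open(pdf_path)

    # First pass: estimate the body font size as the most common span size.
    sizes = []
    for page in doc:
        for block in page.get_text("dict")["blocks"]:
            for line in block.get("lines", []):
                for span in line["spans"]:
                    sizes.append(round(span["size"], 1))
    body_size = max(set(sizes), key=sizes.count) if sizes else 11.0

    # Second pass: start a new section whenever a large span appears.
    sections = []
    for page_num, page in enumerate(doc, start=1):
        for block in page.get_text("dict")["blocks"]:
            for line in block.get("lines", []):
                for span in line["spans"]:
                    text = span["text"].strip()
                    if not text:
                        continue
                    if span["size"] >= body_size * heading_ratio:
                        sections.append({"id": len(sections),
                                         "title": text,
                                         "page": page_num,
                                         "text": ""})
                    elif sections:
                        sections[-1]["text"] += text + " "
    return sections
```

A full implementation would also track heading levels (H1/H2/H3) to build a true hierarchy rather than this flat section list.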
Step 2 — TREE SEARCH (the key insight)
LLM receives the compact tree (like a Table of Contents with section IDs).
LLM reasons: "Which sections most likely contain the answer?"
Returns a JSON array of section_ids.
→ No similarity math. No embeddings. Just reasoning.
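A minimal sketch of what this tree search could look like against Ollama's `/api/generate` endpoint. The exact prompt, outline format, and parsing in `find_relevant_sections()` will differ; this only shows the shape of the idea:

```python
import json
import requests

OLLAMA_BASE = "http://localhost:11434"

def sketch_find_relevant_sections(tree_outline, question,
                                  model="llama3.2", top_k=4):
    """`tree_outline` is a compact TOC string, e.g.:
        [0] 1 Introduction (p.1)
        [1] 2 Methods (p.3)
    Returns a list of section IDs chosen by the model."""
    prompt = (
        "You are given the table of contents of a PDF.\n"
        f"{tree_outline}\n\n"
        f"Question: {question}\n"
        f"Return a JSON array of at most {top_k} section IDs (integers) "
        "most likely to contain the answer. Return ONLY the JSON array."
    )
    resp = requests.post(f"{OLLAMA_BASE}/api/generate",
                         json={"model": model,
                               "prompt": prompt,
                               "stream": False})
    resp.raise_for_status()
    raw = resp.json()["response"]
    try:
        ids = json.loads(raw)
    except json.JSONDecodeError:
        ids = []  # fall back to an empty pick if the model strays from JSON
    return ids[:top_k] if isinstance(ids, list) else []
```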
Step 3 — ANSWER
Full text of the chosen sections is retrieved (with page numbers).
LLM synthesises a cited, markdown-formatted answer.
Response is streamed token-by-token to the UI.
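Sketched below is one way to do this final step: assemble the chosen sections into a prompt and stream tokens from Ollama's `/api/generate` endpoint, which in stream mode returns newline-delimited JSON chunks. This approximates `answer_query_stream()`; it is not the project's actual code:

```python
import json
import requests

OLLAMA_BASE = "http://localhost:11434"

def sketch_answer_stream(question, sections, model="llama3.2"):
    """Yield answer tokens as they arrive from Ollama.
    `sections` is a list of dicts with 'title', 'page', and 'text'."""
    context = "\n\n".join(
        f"### {s['title']} (page {s['page']})\n{s['text'][:3000]}"
        for s in sections
    )
    prompt = (
        "Answer the question using ONLY the excerpts below. "
        "Cite section titles and page numbers. Use markdown.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )
    with requests.post(f"{OLLAMA_BASE}/api/generate",
                       json={"model": model, "prompt": prompt,
                             "stream": True},
                       stream=True) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            chunk = json.loads(line)
            if chunk.get("done"):
                break
            yield chunk.get("response", "")

# Usage:
# for token in sketch_answer_stream("What is the main finding?", picked):
#     print(token, end="", flush=True)
```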
- Python 3.10+
- Ollama installed and running
- At least one model pulled:
ollama pull llama3.2 # recommended (fast, good reasoning)
# or
ollama pull mistral
ollama pull phi3
ollama pull gemma2

git clone https://github.com/Sdinzsh/RAG.git
cd RAG
pip install -r requirements.txt
ollama serve    # if not already running as a service
python app.py

Open http://localhost:5000 in your browser.
| Feature | Detail |
|---|---|
| PDF Upload | Drag-and-drop or click to browse (up to 50 MB) |
| Document Tree | Sidebar shows all sections/pages with page numbers |
| Section Highlight | Sections used for each answer are highlighted in the tree |
| Streaming Chat | Answers stream token-by-token like ChatGPT |
| Source Badges | Each answer shows which sections/pages were used |
| Multi-turn Chat | Conversation history maintained per session |
| Model Selector | Switch between any Ollama model at the top |
| Clear History | Reset conversation context without re-uploading |
RAG/
├── app.py ← Flask backend + REST API
├── rag_engine.py ← Core RAG logic
│ ├── build_document_tree() ← PDF → Section tree
│ ├── find_relevant_sections() ← LLM tree search
│ └── answer_query_stream() ← Full RAG pipeline
├── templates/
│ └── index.html ← Web UI (dark industrial theme)
├── uploads/ ← Uploaded PDFs (auto-created)
└── requirements.txt
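As a rough illustration of how these pieces connect, the upload route in app.py presumably calls `build_document_tree()` and keeps the result per session. The sketch below is hypothetical: the request field name, response key, and session handling are assumptions, not the project's actual contract.

```python
# Illustrative wiring only; the real app.py will differ in detail.
import os
import uuid
from flask import Flask, request, jsonify
from werkzeug.utils import secure_filename
from rag_engine import build_document_tree  # signature assumed: path -> tree

app = Flask(__name__)
SESSIONS = {}  # session_id -> {"tree": ..., "history": [...]}

@app.route("/api/upload", methods=["POST"])
def upload():
    pdf = request.files["file"]                  # field name is an assumption
    os.makedirs("uploads", exist_ok=True)
    path = os.path.join("uploads", secure_filename(pdf.filename))
    pdf.save(path)
    tree = build_document_tree(path)             # PDF -> section tree
    session_id = str(uuid.uuid4())
    SESSIONS[session_id] = {"tree": tree, "history": []}
    return jsonify({"session_id": session_id})   # response key is an assumption
```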
Edit the top of rag_engine.py:
OLLAMA_BASE = "http://localhost:11434" # Ollama server URL
DEFAULT_MODEL = "llama3.2" # Default model
TOP_K_SECTIONS = 4 # Max sections per query
MAX_SECTION_CHARS = 3000    # Chars per section in context

| Method | Endpoint | Description |
|---|---|---|
| GET | `/` | Web UI |
| GET | `/api/models` | List available Ollama models |
| POST | `/api/upload` | Upload PDF + build document tree |
| POST | `/api/chat` | Chat query → SSE stream |
| GET | `/api/session/<id>` | Session info |
| POST | `/api/session/<id>/clear` | Clear chat history |
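A hypothetical Python client for the two main endpoints. The upload field name, JSON keys, and SSE payload format are assumptions based on the table above, so check the actual handlers in app.py before relying on them:

```python
# Hypothetical client; field names, JSON keys, and the SSE payload
# format are assumptions -- check app.py for the actual contract.
import requests

BASE = "http://localhost:5000"

# 1. Upload a PDF and build its document tree.
with open("report.pdf", "rb") as f:
    up = requests.post(f"{BASE}/api/upload", files={"file": f}).json()
session_id = up["session_id"]          # assumed response key

# 2. Ask a question; the answer arrives as a Server-Sent Events stream.
with requests.post(f"{BASE}/api/chat",
                   json={"session_id": session_id,
                         "query": "What are the key findings?"},
                   stream=True) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data: "):
            print(line[len("data: "):], end="", flush=True)
```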
- Best models for reasoning: `llama3.2`, `mistral`, `gemma2`, `phi3`
- Large PDFs: the tree search is O(sections), not O(pages), so it works well on big docs
- Structured PDFs (reports, textbooks): heading detection works best on these
- Scanned PDFs: won't work; they need OCR pre-processing first (see the sketch after this list)
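For scanned PDFs, one possible pre-processing idea (outside the scope of this project) is to rasterise pages with PyMuPDF and OCR them with pytesseract. Note that OCR output carries no font-size information, so the heading-based tree builder would still need a different sectioning strategy for such documents:

```python
# Optional pre-processing idea for scanned PDFs (not part of this project):
# rasterise pages with PyMuPDF, then OCR them with pytesseract.
# Requires the Tesseract binary plus `pip install pytesseract pillow`.
import fitz  # PyMuPDF
import pytesseract
from PIL import Image

def ocr_pdf_to_text(pdf_path, dpi=300):
    """Return a list with the OCR'd text of each page."""
    doc = fitz.open(pdf_path)
    pages = []
    for page in doc:
        pix = page.get_pixmap(dpi=dpi)
        img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
        pages.append(pytesseract.image_to_string(img))
    return pages
```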