
RAG-pilot

A RAG-based Python app that lets you chat with a website plus multiple uploaded documents.

  1. Scrape a website using crawl4ai
  2. An upload option in the UI for multiple documents
  3. Uploaded documents are embedded into ChromaDB
  4. A chat interface: for each question, the LLM searches the website content and all uploaded documents and generates an answer

Learnings: RAG model, VectorDB, Embeddings, LLM
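The four-step pipeline above can be sketched end to end with toy stand-ins: a bag-of-words "embedding" and an in-memory store take the place of sentence-transformers and ChromaDB. Everything below is illustrative, not the app's actual code:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; the real app uses sentence-transformers / Ollama.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyStore:
    """In-memory stand-in for ChromaDB: keeps (embedding, chunk) pairs."""
    def __init__(self):
        self.items = []
    def add(self, chunks):
        self.items.extend((embed(c), c) for c in chunks)
    def top_k(self, query, k=2):
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]

store = ToyStore()
store.add(["The website sells waterproof hiking boots."])        # step 1: scraped site text
store.add(["Our return policy allows refunds within 30 days."])  # steps 2-3: uploaded docs, embedded
context = store.top_k("what is the return policy", k=1)
# Step 4 would pass `context` plus the question to the LLM for the final answer.
```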


Tools and Libraries

streamlit : For UI and Document uploading

crawl4ai : Crawls a website and extracts text from it (follows internal links)

langchain : Core framework for Retrieval-Augmented Generation (RAG)

chromadb : Stores and retrieves text chunks as embeddings (Vector DB)

sentence-transformers : Generates embeddings (vectors) from text locally

transformers : Backend dependency for sentence-transformers & HuggingFace models.

torch : Core machine learning backend for embeddings.

langchain-community : Adds support for community-built integrations, like Ollama

pymupdf : Extracts text from PDF files


RAG Model

Basic architecture of RAG (figure: RAG Model)

Implementation and overview of RAG (figure: Preview)


Query Translation

The process of transforming a user's natural-language query into a retrieval-friendly form that improves the quality of document or passage retrieval.

1. Multi-Query

Rewriting the user's query as several alternative queries so that retrieval covers more phrasings of the same question. (Figure: Multi Query)

2. RAG Fusion

Multi-Query plus rank-based fusion of the results:

  • Generate multiple queries
  • Run all of them through the retriever
  • Fuse the retrieved results (deduplication + re-ranking)

(Figure: RAG Fusion)
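The fusion step is commonly implemented with reciprocal rank fusion (RRF): each document's score is the sum of 1/(k + rank) over the ranked lists it appears in, which deduplicates and re-ranks in one pass. A minimal sketch (doc IDs and the k=60 constant are illustrative):

```python
def reciprocal_rank_fusion(rankings, k=60):
    # rankings: one ranked list of doc IDs per generated query.
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking):
            # Documents appearing high in many lists accumulate the most score.
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank + 1)
    # Deduplicated and re-ranked by fused score.
    return sorted(scores, key=scores.get, reverse=True)

fused = reciprocal_rank_fusion([
    ["d1", "d2", "d3"],   # results for generated query 1
    ["d2", "d1", "d4"],   # results for generated query 2
    ["d2", "d5"],         # results for generated query 3
])
```

Here "d2" wins because it appears in all three lists, mostly at rank 1.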

3. Decomposition

Breaking the user's query into sub-queries, answering each, and combining the partial answers. (Figure: Decomposition)
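The control flow can be sketched with stubs in place of the LLM calls (the splitting heuristic and `answer(...)` strings are placeholders; a real system would prompt the model for both steps):

```python
def decompose(query):
    # Stub: a real system asks the LLM to propose sub-questions for the query.
    if " and " in query:
        first, second = query.split(" and ", 1)
        return [first + "?", second + "?"]
    return [query]

def answer_sub(sub_q):
    # Stub: a real system retrieves context and calls the LLM per sub-question.
    return f"answer({sub_q})"

def answer_by_decomposition(query):
    subs = decompose(query)
    partials = [answer_sub(q) for q in subs]
    # Final step: the LLM would synthesize the partial answers into one response.
    return " | ".join(partials)

result = answer_by_decomposition("What is the pricing and what support is offered")
```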

4. Step-back

A technique where, instead of answering the user's query directly, the model first takes a step back and generates a higher-level or supporting question, which is then used to guide retrieval and the final answer.

5. HyDE (Hypothetical Document Embeddings)

A technique where, instead of embedding the user query directly, the LLM first generates a hypothetical answer, and that answer is embedded to retrieve supporting documents from the vector database. (Figure: HyDE)
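A minimal sketch of the HyDE flow, using a bag-of-words "embedding" and a hard-coded fake answer where a real system would call the embedding model and the LLM:

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding (stand-in for a real embedding model).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def hyde_retrieve(query, generate_fake_answer, docs, k=1):
    # Key idea: embed the hypothetical answer instead of the query itself.
    hypo = embed(generate_fake_answer(query))
    return sorted(docs, key=lambda d: cosine(hypo, embed(d)), reverse=True)[:k]

docs = [
    "Refunds are issued within 30 days of purchase.",
    "Our boots use a waterproof membrane.",
]
# Stub LLM: drafts a plausible-sounding answer that shares vocabulary with the truth.
fake = lambda q: "Refunds are issued within a fixed number of days."
best = hyde_retrieve("how do refunds work", fake, docs)
```

The hypothetical answer matches the target document's vocabulary far better than the short query would, which is the whole point of the technique.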


Routing

Directing the query to the right data store (vector DB, relational DB, etc.).

Logical Routing

Directing user queries to the most appropriate retrieval source, LLM behaviour, or toolchain path, based on the intent, type, or domain of the query.

Semantic Routing

Here, the model decides where to send a query based on its meaning, not just keywords or simple rules.
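One common implementation embeds a short prototype description per route and sends the query to the route with the highest similarity. A sketch with toy embeddings (the route names and prototype strings are made up for illustration):

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding (a real router would use a sentence embedding model).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical routes: each maps to a prototype description of its domain.
ROUTES = {
    "code_db": "python programming error exception function code",
    "hr_db": "vacation leave payroll salary benefits policy",
}

def route(query):
    # Pick the route whose prototype is semantically closest to the query.
    q = embed(query)
    return max(ROUTES, key=lambda name: cosine(q, embed(ROUTES[name])))
```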


Query Construction

The process of transforming a user's raw input into a better, retrieval-optimized query before passing it to the document retriever. (Figure: Query Structure)


Indexing

The process of converting documents into searchable, structured representations so they can be efficiently retrieved at query time.

Propositional Indexing

Instead of indexing raw text chunks, you extract and index discrete propositions (standalone factual statements) from the documents.
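A sketch of the idea: fine-grained propositions are indexed, but retrieval maps a matching proposition back to its parent chunk. The sentence-split "extractor" is a stub; a real system prompts an LLM to rewrite each chunk as standalone facts:

```python
def extract_propositions(doc):
    # Stub extractor: naive sentence split (a real system uses an LLM here).
    return [s.strip() for s in doc.split(".") if s.strip()]

def build_index(docs):
    index = {}
    for doc in docs:
        for prop in extract_propositions(doc):
            index[prop] = doc  # proposition -> parent chunk
    return index

def retrieve(query, index):
    # Match against the fine-grained propositions, return the parent chunk.
    words = set(query.lower().split())
    best = max(index, key=lambda p: len(words & set(p.lower().split())))
    return index[best]

docs = [
    "ChromaDB stores embeddings. It runs locally.",
    "Streamlit renders the chat UI. It handles uploads.",
]
index = build_index(docs)
```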

Multi Vector Retriever

A retrieval method that stores multiple embeddings per document or chunk (e.g. a summary plus hypothetical questions), increasing the chance of retrieving relevant context even when the user's query is phrased differently. (Figure: Multi Vector Retriever)
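A sketch of the mechanism, scoring each document by its best-matching representation (toy embeddings; the doc IDs and representations are illustrative):

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words embedding (stand-in for a real embedding model).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MultiVectorRetriever:
    """Stores several representations per document, all pointing at one doc ID."""
    def __init__(self):
        self.entries = []  # (embedding, doc_id)
    def add(self, doc_id, representations):
        for rep in representations:
            self.entries.append((embed(rep), doc_id))
    def retrieve(self, query):
        # Score each document by its single best-matching representation.
        q = embed(query)
        best = {}
        for emb, doc_id in self.entries:
            best[doc_id] = max(best.get(doc_id, 0.0), cosine(q, emb))
        return max(best, key=best.get)

mvr = MultiVectorRetriever()
mvr.add("refund_doc", [
    "Refunds are processed in 30 days",   # the chunk itself
    "How long do refunds take",           # a hypothetical question about the chunk
])
mvr.add("boots_doc", ["Boots have a waterproof membrane"])
```

The hypothetical-question representation lets a differently phrased query still land on the right document.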

RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval)

A RAG strategy that improves long-document retrieval by building a tree of semantically summarized nodes, enabling hierarchical retrieval for better context relevance and scalability. (Figure: RAPTOR)
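The tree construction can be sketched as repeated group-then-summarize passes until a single root remains (the summarizer here is a string-joining stub; RAPTOR uses an LLM, and groups by semantic clustering rather than adjacency):

```python
def build_raptor_tree(chunks, summarize, fanout=2):
    # levels[0] = leaf chunks; each higher level summarizes groups of the one below.
    levels = [chunks]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        grouped = [cur[i:i + fanout] for i in range(0, len(cur), fanout)]
        levels.append([summarize(g) for g in grouped])
    return levels  # levels[-1] is the single root summary

# Stub summarizer standing in for an LLM call.
summ = lambda group: "summary(" + " + ".join(group) + ")"
tree = build_raptor_tree(["c1", "c2", "c3", "c4"], summ)
```

At query time, retrieval can search all levels at once, so a broad question can match a high-level summary while a narrow one matches a leaf chunk.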

ColBERT (Contextualized Late Interaction over BERT)

A dense-retrieval technique used in RAG systems to efficiently retrieve relevant documents from a large corpus using late interaction between query and document token embeddings. (Figure: ColBERT)
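The "late interaction" scoring is MaxSim: each query token takes its best similarity over all document tokens, and the per-token maxima are summed. A sketch with toy letter-histogram vectors in place of ColBERT's contextual BERT token embeddings:

```python
import math

def token_embed(token):
    # Toy per-token vector: normalized letter-count histogram.
    # Real ColBERT uses contextual BERT embeddings for each token.
    v = [0.0] * 26
    for ch in token.lower():
        if ch.isalpha():
            v[ord(ch) - 97] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def maxsim(query, doc):
    # Late interaction: each query token keeps only its best match among doc tokens.
    q_vecs = [token_embed(t) for t in query.split()]
    d_vecs = [token_embed(t) for t in doc.split()]
    return sum(max(sum(qa * da for qa, da in zip(q, d)) for d in d_vecs)
               for q in q_vecs)

s_match = maxsim("hiking boots", "waterproof hiking boots on sale")
s_other = maxsim("hiking boots", "chocolate cake recipe")
```

Because document token vectors can be precomputed offline, only the cheap max/sum interaction happens at query time, which is what makes ColBERT efficient at scale.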


Retrieval

Active RAG

Making the retrieval step in the RAG pipeline dynamic, query-aware, and context-sensitive rather than static or passive.

CRAG (Corrective RAG)

A technique where the system detects and corrects its own retrieval mistakes during generation by grading the retrieved documents and re-triggering retrieval with improved or clarified queries. (Figure: CRAG)
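The retrieve-grade-rewrite loop can be sketched as follows; the word-overlap grader and the lambda stubs stand in for an LLM grader, a real retriever, and an LLM query-rewriter:

```python
def grade(query, docs):
    # Toy relevance grade: fraction of query words covered by the retrieved docs.
    words = set(query.lower().split())
    covered = {w for d in docs for w in d.lower().split()} & words
    return len(covered) / len(words)

def corrective_retrieve(query, retrieve_fn, rewrite_fn, threshold=0.5, max_tries=3):
    docs = []
    for _ in range(max_tries):
        docs = retrieve_fn(query)
        if grade(query, docs) >= threshold:
            break                      # retrieval judged good enough
        query = rewrite_fn(query)      # clarify the query and re-trigger retrieval
    return docs

# Stubs: the retriever only succeeds once the rewriter adds the missing keyword.
retrieve_fn = lambda q: ["gpu memory usage tips"] if "gpu" in q else ["unrelated text"]
rewrite_fn = lambda q: "gpu " + q
docs = corrective_retrieve("reduce memory usage", retrieve_fn, rewrite_fn)
```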


💬 RAGpilot: Website + Document RAG Chatbot

This is a Retrieval-Augmented Generation (RAG) chatbot app built using Streamlit, capable of:

  • Scraping content from websites
  • Uploading and processing PDF/TXT files
  • Embedding and indexing content into ChromaDB
  • Answering questions over the combined knowledge using an LLM (via Ollama)

📦 Project Structure

```
utils/
├── data/
│   ├── doc_loader.py      # Load PDFs or text documents
│   └── scraper.py         # Scrape and parse websites
├── llm/
│   └── llm_generator.py   # Prompt and LLM response logic
├── rag/
│   └── retriever.py       # Retrieve top-k docs from Chroma
└── vectorstore/
    ├── chroma_handler.py  # Handler for ChromaDB
    ├── embeddings.py      # Ollama embedding loader
    └── indexer.py         # Document indexing pipeline
app.py                     # Streamlit app interface
```


🛠️ Setup

1. Install requirements

```
pip install -r requirements.txt
```

2. Start the Ollama server (if not already running)

```
ollama serve
```

Optionally pull a model:

```
ollama pull mistral
```

3. Run the app

```
streamlit run app.py
```

🧠 Features

✅ Deep website scraping (via scraper.py)

✅ Document upload and parsing (doc_loader.py)

✅ Embedding using OllamaEmbeddings (e.g., Mistral)

✅ Indexing into ChromaDB

✅ Contextual answering via prompt + OllamaLLM

🔁 Session Behavior

On app start or refresh, the app:

Deletes previous ChromaDB (./chroma_store)

Clears previously scraped site and uploaded documents

Resets chat history

📌 Notes

Make sure Ollama is running and mistral (or your chosen model) is already pulled.

You can modify model name and embedding dimensions via embeddings.py and llm_generator.py.

