Implemented a RAG-based Python app where I can chat with a website plus multiple uploaded documents.
- Scrape a website using crawl4ai
- Upload option in the UI for adding multiple documents
- Uploaded documents are embedded into ChromaDB
- A chat interface where, for each question, the LLM searches the website content and all uploaded documents to generate an answer
Learnings: RAG, vector DBs, embeddings, LLMs
streamlit: For the UI and document uploading
crawl4ai: Crawls a website and extracts its text (follows internal links)
langchain: Core framework for Retrieval-Augmented Generation (RAG)
chromadb: Stores and retrieves text chunks as embeddings (Vector DB)
sentence-transformers: Generates embeddings (vectors) from text locally
transformers: Backend dependency for sentence-transformers & HuggingFace models.
torch: Core machine learning backend for embeddings.
langchain-community: Adds support for community-built integrations, like Ollama
pymupdf: Extracts text from PDF files
Implementation and Overview of RAG

Query Translation: the process of transforming a user's natural-language query into a retrieval-friendly format that improves the quality of document or passage retrieval.
One approach is dividing the query into multiple queries for better answers.
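For example, a minimal multi-query sketch, assuming a local mistral model served through Ollama via langchain-community (the prompt wording and model name are illustrative):

```python
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")  # assumes a local Ollama server is running

def generate_queries(question: str, n: int = 4) -> list[str]:
    """Ask the LLM for n retrieval-friendly rephrasings of the question."""
    prompt = (
        f"Generate {n} different rephrasings of the following question, "
        f"one per line, suitable for searching a document database:\n{question}"
    )
    lines = llm.invoke(prompt).splitlines()
    return [line.strip("- ").strip() for line in lines if line.strip()]
```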

Multi-Query + Result Fusion in a rank-based manner (RAG-Fusion):
- Generate multiple variants of the query
- Run all of them through the retriever
- Fuse the retrieved results (deduplication + re-ranking), as in the sketch below
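One standard way to do the fusion step is Reciprocal Rank Fusion (RRF); the sketch below is generic (the constant k=60 is the usual default from the RRF literature, not something this project mandates):

```python
def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one deduplicated ranking.

    Each document scores 1 / (k + rank) in every list it appears in, so
    documents ranked highly by many queries rise to the top.
    """
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```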
Decomposition: breaking the user query into sub-queries, then answering those to build up the final answer.

Step-Back Prompting: a technique where, instead of answering the user's query directly, the model is first asked to take a step back and generate a higher-level or supporting question, which then guides the retrieval and final answer.
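A minimal step-back sketch, again assuming a local Ollama mistral model (the prompt wording is illustrative); the returned broader question is what gets used for retrieval:

```python
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")  # assumes a local Ollama server is running

def step_back(question: str) -> str:
    """Generate a broader question whose answer gives useful background."""
    return llm.invoke(
        "Rewrite the following as a more general, higher-level question "
        f"whose answer would help answer it: {question}"
    )
```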
HyDE (Hypothetical Document Embeddings): a clever technique where, instead of embedding the user query directly, you first ask the LLM to generate a fake (hypothetical) answer, then embed that answer to retrieve supporting documents from the vector database.
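A minimal HyDE sketch, with a sentence-transformers embedder standing in for the app's embedding model (model names are illustrative):

```python
from langchain_community.llms import Ollama
from sentence_transformers import SentenceTransformer

llm = Ollama(model="mistral")  # assumes a local Ollama server is running
embedder = SentenceTransformer("all-MiniLM-L6-v2")

def hyde_embedding(question: str):
    """Embed a hypothetical answer instead of the raw question."""
    hypothetical = llm.invoke(f"Write a short passage that answers: {question}")
    return embedder.encode(hypothetical)  # query the vector DB with this vector
```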

Routing: directing the query to the right database (vector, relational, ...), i.e. smartly sending user queries to the most appropriate retrieval source, LLM behaviour, or toolchain path, based on the intent, type, or domain of the query.
Here, the LLM decides where to send a query based on its meaning, not just keywords or simple rules.
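A minimal semantic-routing sketch, assuming two hypothetical sources matching this app (the scraped website and the uploaded documents); the query goes to whichever route description it is most similar to in embedding space:

```python
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical routes: each retrieval source gets a one-line description.
ROUTES = {
    "website_chunks": "questions about the scraped website's content",
    "uploaded_docs": "questions about the user's uploaded PDF/TXT files",
}
route_names = list(ROUTES)
route_vectors = embedder.encode(list(ROUTES.values()))

def route(question: str) -> str:
    """Send the query to the source whose description it most resembles."""
    similarities = util.cos_sim(embedder.encode(question), route_vectors)[0]
    return route_names[int(similarities.argmax())]
```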
Query Construction: the process of transforming a user's raw input into a better, retrieval-optimized query before passing it to the document retriever.

Indexing: the process of converting your documents into searchable, structured representations so they can be efficiently retrieved during a query.
Proposition indexing: instead of indexing raw text chunks, you extract and index discrete propositions from the documents.
Multi-representation indexing: a retrieval method that allows multiple embeddings per document or chunk, increasing the chance of retrieving relevant context even if the user query is phrased differently.
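A minimal sketch of the basic indexing step with chromadb and sentence-transformers (the app itself uses OllamaEmbeddings; the collection name, model, and pre-chunked input are illustrative):

```python
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_store")
collection = client.get_or_create_collection("rag_docs")

def index_chunks(chunks: list[str]) -> None:
    """Embed text chunks and store them for later similarity search."""
    collection.add(
        ids=[f"chunk-{i}" for i in range(len(chunks))],
        documents=chunks,
        embeddings=embedder.encode(chunks).tolist(),
    )
```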

RAPTOR: a RAG strategy designed to improve long-document retrieval by creating a tree of semantically summarized nodes, enabling hierarchical retrieval for better context relevance and scalability.
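A rough sketch of the summarization-tree idea, assuming a local Ollama mistral model; real RAPTOR clusters chunks by embedding similarity before summarizing, while fixed-size groups are used here only to keep the sketch short:

```python
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")  # assumes a local Ollama server is running

def build_summary_tree(chunks: list[str], fanout: int = 5) -> list[list[str]]:
    """Build levels of increasingly abstract summaries over the chunks."""
    levels = [chunks]
    while len(levels[-1]) > 1:
        last = levels[-1]
        groups = [last[i:i + fanout] for i in range(0, len(last), fanout)]
        levels.append([llm.invoke("Summarize:\n" + "\n".join(g)) for g in groups])
    return levels  # index every level so retrieval can match any granularity
```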

ColBERT: a dense retrieval technique used in RAG systems to efficiently retrieve relevant documents from a large corpus using late interaction between query and document token embeddings.
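In late interaction, every query and document token keeps its own embedding, and a document's score is the sum over query tokens of each token's best match in the document (MaxSim). A toy sketch with numpy, assuming the token embeddings are already computed and L2-normalized (real ColBERT gets them from a trained BERT encoder):

```python
import numpy as np

def maxsim_score(query_vecs: np.ndarray, doc_vecs: np.ndarray) -> float:
    """ColBERT-style late interaction over per-token embeddings.

    query_vecs: (num_query_tokens, dim); doc_vecs: (num_doc_tokens, dim);
    rows are assumed L2-normalized so dot products are cosine similarities.
    """
    similarities = query_vecs @ doc_vecs.T        # all token-pair similarities
    return float(similarities.max(axis=1).sum())  # best doc match per query token
```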
Active RAG: making the retrieval step in the RAG pipeline dynamic, query-aware, and context-sensitive rather than static or passive.
Corrective RAG: a technique where the system actively detects, corrects, or refines its own mistakes during generation by re-triggering the retrieval step with improved or clarified queries.
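A minimal self-corrective loop, assuming a local Ollama mistral model and any `retrieve` callable (such as the retriever sketched later) passed in as a parameter; the grading and rewriting prompts are illustrative:

```python
from langchain_community.llms import Ollama

llm = Ollama(model="mistral")  # assumes a local Ollama server is running

def corrective_answer(question: str, retrieve, max_tries: int = 3) -> str:
    """Re-trigger retrieval with a refined query until the context looks usable.

    `retrieve` is any callable mapping a query string to a list of text chunks.
    """
    query, docs = question, []
    for _ in range(max_tries):
        docs = retrieve(query)
        verdict = llm.invoke(
            f"Question: {question}\nContext: {docs}\n"
            "Is this context sufficient to answer the question? Reply yes or no."
        )
        if "yes" in verdict.lower():
            break
        query = llm.invoke(f"Rewrite this search query to find better context: {query}")
    return llm.invoke(f"Answer using only this context:\n{docs}\n\nQuestion: {question}")
```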

This is a Retrieval-Augmented Generation (RAG) chatbot app built using Streamlit, capable of:
- Scraping content from websites
- Uploading and processing PDF/TXT files
- Embedding and indexing content into ChromaDB
- Asking questions based on the combined knowledge using an LLM (via Ollama)
utils/
├── data/
│   ├── doc_loader.py      # Load PDFs or text documents
│   └── scraper.py         # Scrape and parse websites
├── llm/
│   └── llm_generator.py   # Prompt and LLM response logic
├── rag/
│   └── retriever.py       # Retrieve top-k docs from Chroma
└── vectorstore/
    ├── chroma_handler.py  # Handler for ChromaDB
    ├── embeddings.py      # Ollama embedding loader
    └── indexer.py         # Document indexing pipeline
app.py                     # Streamlit app interface
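For illustration, the top-k lookup in retriever.py can be as small as the sketch below; it uses a sentence-transformers embedder to stay self-contained, whereas the app itself uses OllamaEmbeddings (the collection name, path, and k are illustrative):

```python
import chromadb
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")
client = chromadb.PersistentClient(path="./chroma_store")
collection = client.get_or_create_collection("rag_docs")

def retrieve(question: str, k: int = 4) -> list[str]:
    """Return the k stored chunks most similar to the question."""
    result = collection.query(
        query_embeddings=[embedder.encode(question).tolist()],
        n_results=k,
    )
    return result["documents"][0]  # documents for the first (only) query
```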
- Install dependencies:
pip install -r requirements.txt
- Start the Ollama server (if not already running):
ollama serve
- Optionally pull a model, e.g.:
ollama pull mistral
- Run the app:
streamlit run app.py
🧠 Features
✅ Deep website scraping (via scraper.py)
✅ Document upload and parsing (doc_loader.py)
✅ Embedding using OllamaEmbeddings (e.g., Mistral)
✅ Indexing into ChromaDB
✅ Contextual answering via prompt + OllamaLLM
🔁 Session Behavior
On app start or refresh, the app:
- Deletes the previous ChromaDB (./chroma_store)
- Clears the previously scraped site and uploaded documents
- Resets chat history
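A sketch of that reset logic, assuming Streamlit session-state keys named scraped_site, uploaded_docs, and chat_history (the key names are illustrative):

```python
import shutil
import streamlit as st

def reset_session() -> None:
    """Wipe the vector store and chat state on app start or refresh."""
    shutil.rmtree("./chroma_store", ignore_errors=True)  # delete old ChromaDB
    for key in ("scraped_site", "uploaded_docs", "chat_history"):
        st.session_state.pop(key, None)                  # clear Streamlit state
```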
📌 Notes
- Make sure Ollama is running and mistral (or your chosen model) is already pulled.
- You can modify the model name and embedding dimensions via embeddings.py and llm_generator.py.
- RAG : RAG-Github
- Crawl4AI : crawl4ai


