# 📄 RAG-based PDF Question Answering App
This is a Streamlit web application that uses **Retrieval-Augmented Generation (RAG)** with LangChain to answer user queries based on the contents of a PDF document. It combines HuggingFace sentence embeddings, a FAISS vector store, and Groq's `gemma2-9b-it` model for generating responses.
---
## 🚀 Features
- Upload and read content from a PDF file.
- Split the PDF content into manageable text chunks.
- Convert text chunks into vector embeddings using HuggingFace Transformers.
- Store and search vector embeddings with FAISS.
- Use Groq's `gemma2-9b-it` model for answering questions.
- Ask natural language questions based on PDF content.
- Simple web interface built with Streamlit.
---
## 🛠️ Tech Stack
| Tool | Purpose |
|-----------------|------------------------------------------|
| Streamlit | Web app UI |
| LangChain | RAG pipeline |
| HuggingFace | Sentence embedding model |
| FAISS | Vector store for semantic search |
| PyPDF2 | PDF text extraction |
| Groq API | LLM backend for answer generation |
---
## 📂 Project Structure
```bash
rag-pdf-app/
│
├── app.py # Main Streamlit application
├── Cheenai_LTT.pdf # Sample PDF (optional)
└── README.md           # Project documentation
```

---

## ⚙️ Installation

```bash
git clone https://github.com/your-username/rag-pdf-app.git
cd rag-pdf-app
python -m venv venv
.\venv\Scripts\activate   # On Windows
pip install -r requirements.txt
```

Sample `requirements.txt`:

```txt
streamlit
langchain
langchain-community
PyPDF2
faiss-cpu
sentence-transformers
```

---

## 🔑 API Key Setup

Replace this line in `app.py` with your own key:
```python
groqapi = 'your_groq_api_key'
```

---

## 🧠 How It Works

- **PDF Upload:** Reads content from a local PDF using `PyPDF2`.
- **Text Splitting:** Splits content into chunks using LangChain's `RecursiveCharacterTextSplitter`.
- **Embedding Generation:** Uses HuggingFace's `all-MiniLM-L6-v2` model to embed chunks.
- **Vector Store:** Chunks are stored in a FAISS index.
- **Retriever:** Fetches the most relevant chunks based on user queries.
- **RAG Prompting:** Combines retrieved context with the user question and prompts Groq's LLM.
- **Answer Display:** Outputs the generated response in Streamlit (see the sketch below).
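
The snippet below sketches this pipeline end to end. It is a minimal illustration, not a copy of `app.py`: it assumes the `langchain-groq` package is installed (in addition to the sample `requirements.txt`) and that the key is available as a `GROQ_API_KEY` environment variable; the chunk sizes, `k` value, and prompt wording are placeholder choices.

```python
import os

from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_groq import ChatGroq  # requires the langchain-groq package

PDF_PATH = "Cheenai_LTT.pdf"  # placeholder: any local PDF

# 1. PDF Upload: extract raw text from every page.
reader = PdfReader(PDF_PATH)
text = "".join(page.extract_text() or "" for page in reader.pages)

# 2. Text Splitting: break the text into overlapping chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_text(text)

# 3 & 4. Embedding Generation + Vector Store: embed chunks, index with FAISS.
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = FAISS.from_texts(chunks, embeddings)

# 5. Retriever: fetch the chunks most similar to the question.
retriever = vector_store.as_retriever(search_kwargs={"k": 4})

# 6. RAG Prompting: stuff retrieved context into a prompt for Groq's LLM.
llm = ChatGroq(model="gemma2-9b-it", api_key=os.environ["GROQ_API_KEY"])

question = "What is this document about?"
context = "\n\n".join(doc.page_content for doc in retriever.invoke(question))
answer = llm.invoke(
    f"Answer using only this context:\n{context}\n\nQuestion: {question}"
)

# 7. Answer Display: inside the app this would be st.write(answer.content).
print(answer.content)
```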
---

## ▶️ Usage

Launch the app:

```bash
streamlit run app.py
```

The app will:

- Automatically load the PDF.
- Display success messages when processing is complete.
- Prompt you to ask a question.
- Return a helpful answer based only on the content of the PDF.
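
For reference, the Streamlit layer might look roughly like this. It is a hypothetical sketch: `build_vector_store` and `build_llm` stand in for the pipeline steps shown in "How It Works" and are not the app's actual function names. `st.cache_resource` is used so the FAISS index is built once rather than on every rerun.

```python
import streamlit as st

@st.cache_resource  # build the index once, not on every Streamlit rerun
def load_pipeline(pdf_path: str):
    # build_vector_store / build_llm are hypothetical wrappers around the
    # steps in "How It Works"; replace with your own pipeline code.
    retriever = build_vector_store(pdf_path).as_retriever()
    return retriever, build_llm()

retriever, llm = load_pipeline("Cheenai_LTT.pdf")
st.success("PDF processed and indexed!")

question = st.text_input("Ask a question about the PDF:")
if question:
    docs = retriever.invoke(question)
    context = "\n\n".join(d.page_content for d in docs)
    reply = llm.invoke(f"Context:\n{context}\n\nQuestion: {question}")
    st.write(reply.content)
```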
---

## ❓ FAQ

**Q: Can I use another PDF?**
Yes! Modify the `uploaded_file` path in the code to point to any local PDF.

**Q: Do I need a GPU or heavy compute?**
No, the heavy lifting is done by Groq's cloud-hosted model.

**Q: Is it secure?**
Keep your `groqapi` key private and never share it publicly (see the sketch below for loading it from an environment variable).
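
One common way to keep the key out of source control is to read it from an environment variable instead of hardcoding it. A minimal sketch, assuming the variable is named `GROQ_API_KEY` (a convention, not something `app.py` currently uses):

```python
import os

# Read the Groq key from the environment instead of hardcoding it in app.py.
# GROQ_API_KEY is a conventional variable name, not the app's current setup.
groqapi = os.environ.get("GROQ_API_KEY")
if not groqapi:
    raise RuntimeError("Set the GROQ_API_KEY environment variable first.")
```

On Windows PowerShell you would set it with `$env:GROQ_API_KEY = "your_key"`; on Linux/macOS with `export GROQ_API_KEY=your_key`.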