A simple Retrieval-Augmented Generation (RAG) system built with LangChain, Ollama, and PyPDF. This system allows users to ask questions about PDF documents and receive AI-generated answers based on the document content.
- PDF document loading and processing
- Document chunking for efficient retrieval
- Vector embeddings using Ollama's BGE-M3 model
- Question answering using Llama 3.2 Vision (11B parameters)
- Interactive command-line interface
- Source attribution for answers
- Python 3.12 or higher
- Ollama installed and running locally
- Required models pulled in Ollama:
  - bge-m3 for embeddings
  - llama3.2-vision:11b for text generation
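One way to confirm that both models are available is to inspect the output of `ollama list`. The helper below is a minimal sketch and not part of this repository: `missing_models` and `check_local_models` are hypothetical names, and the parser assumes the listing has one header row followed by rows whose first column is the model name, optionally tagged (for example `bge-m3:latest`).

```python
import shutil
import subprocess

# Model names taken from the prerequisites above.
REQUIRED_MODELS = ["bge-m3", "llama3.2-vision:11b"]


def missing_models(listing: str, required: list[str]) -> list[str]:
    """Return the required models that do not appear in `ollama list` output."""
    installed = set()
    for line in listing.splitlines()[1:]:  # skip the header row
        parts = line.split()
        if not parts:
            continue
        name = parts[0]
        installed.add(name)
        # Assumption: a model pulled without a tag is listed as "name:latest".
        if name.endswith(":latest"):
            installed.add(name[: -len(":latest")])
    return [model for model in required if model not in installed]


def check_local_models() -> None:
    """Print a pull command for every required model that is not installed."""
    if shutil.which("ollama") is None:
        print("ollama not found on PATH")
        return
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=False
    )
    for model in missing_models(result.stdout, REQUIRED_MODELS):
        print(f"Missing model, run: ollama pull {model}")
```

Calling `check_local_models()` before starting the application prints one `ollama pull` suggestion per missing model and prints nothing when both are present.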
- Clone this repository:
git clone https://github.com/WytheHuang/langchain-simple-RAG.git
cd langchain-simple-RAG
- Set up a Python virtual environment and activate it:
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
- Install the required packages:
uv sync
- Create a documents directory in your project root and place your PDF files there:
mkdir documents
cp /path/to/your/pdfs/*.pdf documents/
- Run the main script:
python main.py
- Enter your questions when prompted. Type /quit to exit the program.
Example interaction:
Loading documents...
Loaded 25 document chunks
Enter your question (or '/quit' to exit): What is machine learning?
Answer: [AI-generated answer will appear here]
Sources:
Source: documents/example.pdf
Page: 1
Content: [Relevant excerpt from the document]
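The interaction above can be approximated with a small read-answer-print loop. This is a sketch under assumptions, not the code in `main.py`: `run_cli` and `format_sources` are hypothetical names, `answer_fn` stands in for whatever produces the answer, and each source is assumed to be a dict with `source`, `page`, and `content` keys.

```python
from typing import Callable, Iterable


def format_sources(sources: Iterable[dict]) -> str:
    """Render sources in the Source / Page / Content layout shown above."""
    lines = ["Sources:"]
    for src in sources:
        lines.append(f"Source: {src['source']}")
        lines.append(f"Page: {src['page']}")
        lines.append(f"Content: {src['content']}")
    return "\n".join(lines)


def run_cli(
    answer_fn: Callable[[str], tuple[str, list[dict]]],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> None:
    """Prompt for questions until the user types /quit."""
    while True:
        question = read("Enter your question (or '/quit' to exit): ").strip()
        if question == "/quit":
            break
        if not question:
            continue  # ignore empty input and prompt again
        answer, sources = answer_fn(question)
        write(f"Answer: {answer}")
        write(format_sources(sources))
```

Passing `read` and `write` as parameters keeps the loop testable without a terminal; in normal use the defaults `input` and `print` apply.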
langchain-simple-RAG/
├── documents/ # Directory for PDF files
├── main.py # Main application script
├── README.md # This file
├── pyproject.toml # Project configuration
└── uv.lock # Dependencies lock file
- Document Loading: The system loads PDF documents from the documents directory using PyPDFLoader.
- Document Splitting: Documents are split into smaller chunks using RecursiveCharacterTextSplitter for efficient processing.
- Embedding: Document chunks are embedded using Ollama's BGE-M3 model and stored in an in-memory vector store.
- Question Answering: When a user asks a question:
  - The system retrieves relevant document chunks
  - Passes them to the Llama 3.2 model
  - Generates a contextual answer
  - Provides source attribution
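The retrieval step can be illustrated without any model at all. The sketch below is a deliberately simplified stand-in: it replaces BGE-M3 embeddings with plain word counts and ranks chunks by cosine similarity, which is the same ranking idea a vector store applies to real embeddings. The names `embed`, `cosine`, `retrieve`, and `build_prompt` are hypothetical, and the prompt wording is an assumption rather than the one this project uses.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(
        chunks, key=lambda chunk: cosine(query_vector, embed(chunk)), reverse=True
    )
    return ranked[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into one prompt string."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only this context:\n"
        f"{context}\n\nQuestion: {question}"
    )
```

In the real pipeline the string returned by something like `build_prompt` would be sent to the LLM, and the retrieved chunks would be kept alongside the answer so that their file and page can be reported as sources.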
The system uses the following default settings:
- Chunk size: 1000 characters
- Chunk overlap: 200 characters
- Embedding model: bge-m3
- LLM model: llama3.2-vision:11b
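To see how chunk size and overlap interact, consider a fixed-window splitter: with a size of 1000 and an overlap of 200, each new chunk starts 800 characters after the previous one. The function below is a simplified sketch, not the splitter this project uses. RecursiveCharacterTextSplitter prefers to break on separators such as paragraph and line boundaries, so its real chunk boundaries will differ; `split_text` is a hypothetical name.

```python
CHUNK_SIZE = 1000     # default chunk size from the settings above
CHUNK_OVERLAP = 200   # default chunk overlap from the settings above


def split_text(
    text: str, chunk_size: int = CHUNK_SIZE, chunk_overlap: int = CHUNK_OVERLAP
) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # 800 with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

The overlap means the last 200 characters of one chunk are repeated at the start of the next, which reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.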
To contribute to this project:
- Install development dependencies:
uv sync --group lint
- Run linters:
ruff check .
black .

Apache-2.0 License. See the LICENSE file for details.