A modular, RAG-powered document assistant for question answering over your own documents. It supports both CLI and Streamlit interfaces and is built on Ollama, LangChain, and FAISS vector search.
- Multiple Interfaces: Command-line and web-based (Streamlit) interfaces
- Flexible Document Loading: Supports PDF, TXT, and MD file formats
- Configurable Retrieval: Basic and multi-query retrieval strategies
- Vector Store Management: Automatic save/load of FAISS vector databases (see the sketch after this list)
- Runtime Configuration: Switch between modes and parameters at runtime
- Modular Architecture: Clean separation of concerns for easy maintenance
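The save/load behavior works roughly as follows. This is a minimal sketch assuming LangChain's FAISS integration and the `langchain-ollama` package; the index path and helper name are illustrative, not this project's actual API (see `src/vector_store/vector_store.py`):

```python
# Illustrative only: typical automatic save/load of a FAISS index with
# LangChain. Paths and names are assumptions, not this project's API.
from pathlib import Path

from langchain_community.vectorstores import FAISS
from langchain_ollama import OllamaEmbeddings  # assumes the langchain-ollama package

INDEX_DIR = Path("output_data/faiss_index")  # hypothetical location
embeddings = OllamaEmbeddings(model="nomic-embed-text:v1.5")

def load_or_build(chunks):
    """Reload a persisted index if one exists, otherwise build and save it."""
    if INDEX_DIR.exists():
        # FAISS metadata is pickled, hence the explicit opt-in flag
        return FAISS.load_local(
            str(INDEX_DIR), embeddings, allow_dangerous_deserialization=True
        )
    store = FAISS.from_documents(chunks, embeddings)
    store.save_local(str(INDEX_DIR))
    return store
```

Reloading a saved index this way avoids re-embedding every document on each run.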
- Python 3.9 or higher
- Ollama with at least one LLM model (e.g., `llama3.2:3b`)
- An Ollama embedding model (e.g., `nomic-embed-text:v1.5`)
- Clone the repository:

```bash
git clone <repository-url>
cd rag_document_assistant
```

- Install Python dependencies:

```bash
pip install -r requirements.txt
```

- Install and start Ollama:

```bash
# Install Ollama (if not already installed)
curl https://ollama.ai/install.sh | sh

# Pull required models
ollama pull llama3.2:3b
ollama pull nomic-embed-text:v1.5
```

- Ensure Ollama is running:

```bash
ollama serve
```

```
rag_document_assistant/
├── __init__.py            # Package initialization
├── main.py                # Main entry point with mode switching
├── requirements.txt       # Python dependencies
├── pytest.ini             # Pytest configuration
├── LICENSE                # MIT License
├── .gitignore             # Git ignore rules
├── .env.example           # Example environment variables
├── README.md              # This file
├── AGENTS.md              # Architecture documentation
├── TEST_REPORT.md         # Testing documentation
├── input_data/            # Input directory for documents
├── output_data/           # Output directory for vector stores
├── src/                   # Source code
│   ├── __init__.py
│   ├── config/
│   │   ├── __init__.py
│   │   └── settings.py    # Configuration management
│   ├── ingestion/
│   │   ├── __init__.py
│   │   └── loader.py      # Document loading utilities
│   ├── processing/
│   │   ├── __init__.py
│   │   └── splitter.py    # Text splitting
│   ├── vector_store/
│   │   ├── __init__.py
│   │   └── vector_store.py  # Vector database management
│   ├── retrieval/
│   │   ├── __init__.py
│   │   ├── retriever.py   # Retriever factory
│   │   └── rag_chain.py   # RAG chain factory
│   └── interfaces/
│       ├── __init__.py
│       ├── cli.py         # CLI interface
│       └── streamlit_app.py  # Streamlit web interface
└── tests/                 # Test suite
    ├── __init__.py
    ├── conftest.py        # Shared fixtures and test utilities
    ├── fixtures/          # Test fixtures directory
    ├── unit/              # Unit tests
    │   ├── __init__.py
    │   ├── test_config.py
    │   ├── test_ingestion.py
    │   ├── test_processing.py
    │   ├── test_vector_store.py
    │   └── test_retrieval.py
    └── integration/       # Integration tests
        ├── __init__.py
        └── test_basic_integration.py
```
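The module layout maps onto a conventional RAG pipeline. A hedged sketch of how the pieces likely compose (every imported name below is hypothetical; consult the modules in `src/` for the actual functions):

```python
# Hypothetical end-to-end wiring of the modules above; all imported
# names are illustrative stand-ins for the real APIs in src/.
from src.config.settings import load_settings
from src.ingestion.loader import load_documents
from src.processing.splitter import split_documents
from src.vector_store.vector_store import load_or_create_vector_store
from src.retrieval.retriever import create_retriever
from src.retrieval.rag_chain import create_rag_chain

settings = load_settings()
docs = load_documents(settings.input_dir)          # PDF, TXT, MD
chunks = split_documents(docs, settings.chunk_size, settings.chunk_overlap)
store = load_or_create_vector_store(chunks, settings.output_dir)
retriever = create_retriever(store, mode=settings.retrieval_mode)
chain = create_rag_chain(retriever, mode=settings.chain_mode)

print(chain.invoke("What are these documents about?"))
```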
Run the application in CLI mode (default):

```bash
python main.py --mode cli
```

Interactive mode (default):

```bash
python main.py --mode cli --cli-mode interactive
```

Single query mode:

```bash
python main.py --mode cli --cli-mode query --question "Your question here"
```

CLI options:

- `--cli-mode {interactive,query}`: Operation mode
- `--question TEXT`: Question to ask (for query mode)
- `--input-dir PATH`: Input directory for documents
- `--output-dir PATH`: Output directory for the vector store
- `--llm-model MODEL`: LLM model name (default: `llama3.2:3b`)
- `--embedding-model MODEL`: Embedding model name (default: `nomic-embed-text:v1.5`)
- `--retrieval-mode {basic,multi_query}`: Retrieval strategy
- `--chain-mode {basic,conversational}`: Chain type
- `--force-recreate`: Force recreation of the vector store
- `--log-level {DEBUG,INFO,WARNING,ERROR}`: Logging level
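Mode switching of this kind is typically wired with argparse. A minimal sketch that mirrors the flags above (illustrative, not the project's actual `main.py`; defaults for `--retrieval-mode` and `--chain-mode` are assumed):

```python
# Illustrative argparse wiring for the flags listed above; not the
# project's actual main.py.
import argparse

def parse_args():
    parser = argparse.ArgumentParser(description="RAG document assistant")
    parser.add_argument("--mode", choices=["cli", "streamlit"], default="cli")
    parser.add_argument("--cli-mode", choices=["interactive", "query"],
                        default="interactive")
    parser.add_argument("--question", help="Question to ask (query mode)")
    parser.add_argument("--input-dir", default="input_data")
    parser.add_argument("--output-dir", default="output_data")
    parser.add_argument("--llm-model", default="llama3.2:3b")
    parser.add_argument("--embedding-model", default="nomic-embed-text:v1.5")
    parser.add_argument("--retrieval-mode", choices=["basic", "multi_query"],
                        default="basic")  # assumed default
    parser.add_argument("--chain-mode", choices=["basic", "conversational"],
                        default="basic")  # assumed default
    parser.add_argument("--force-recreate", action="store_true")
    parser.add_argument("--log-level",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
                        default="INFO")
    return parser.parse_args()

if __name__ == "__main__":
    args = parse_args()
    if args.cli_mode == "query" and not args.question:
        raise SystemExit("--question is required in query mode")
```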
Run the application in Streamlit mode:

```bash
python main.py --mode streamlit
```

This will launch a web interface at http://localhost:8501.
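Under the hood, the web interface presumably builds the RAG chain once per session and routes questions through it. A minimal sketch (the `build_rag_chain` import is a hypothetical stand-in for the project's actual factory in `src/interfaces/streamlit_app.py`):

```python
# Minimal Streamlit front-end sketch; build_rag_chain is hypothetical.
import streamlit as st

from src.retrieval.rag_chain import build_rag_chain  # hypothetical import

st.title("RAG Document Assistant")

# Build the chain once and keep it across reruns of the script
if "chain" not in st.session_state:
    st.session_state["chain"] = build_rag_chain()

question = st.text_input("Ask a question about your documents")
if question:
    with st.spinner("Retrieving and generating..."):
        st.write(st.session_state["chain"].invoke(question))
```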
Create a `.env` file in the project root to override defaults:

```bash
INPUT_DATA_DIR=input_data
OUTPUT_DATA_DIR=output_data
LLM_MODEL=llama3.2:3b
EMBEDDING_MODEL=nomic-embed-text:v1.5
OLLAMA_BASE_URL=http://localhost:11434
RETRIEVAL_MODE=multi_query
CHUNK_SIZE=1200
CHUNK_OVERLAP=300
LOG_LEVEL=INFO
```
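`src/config/settings.py` presumably maps these variables onto typed defaults. A minimal sketch using `python-dotenv` (an assumption; check the actual module for the real mechanism):

```python
# Sketch of environment-driven settings via python-dotenv (an assumed
# dependency); the real src/config/settings.py may differ.
import os
from dataclasses import dataclass

from dotenv import load_dotenv

load_dotenv()  # read .env from the project root, if present

@dataclass
class Settings:
    input_dir: str = os.getenv("INPUT_DATA_DIR", "input_data")
    output_dir: str = os.getenv("OUTPUT_DATA_DIR", "output_data")
    llm_model: str = os.getenv("LLM_MODEL", "llama3.2:3b")
    embedding_model: str = os.getenv("EMBEDDING_MODEL", "nomic-embed-text:v1.5")
    ollama_base_url: str = os.getenv("OLLAMA_BASE_URL", "http://localhost:11434")
    retrieval_mode: str = os.getenv("RETRIEVAL_MODE", "basic")
    chunk_size: int = int(os.getenv("CHUNK_SIZE", "1200"))
    chunk_overlap: int = int(os.getenv("CHUNK_OVERLAP", "300"))
    log_level: str = os.getenv("LOG_LEVEL", "INFO")
```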
Both CLI and Streamlit modes allow runtime configuration:
- Input/Output Directories: Specify custom paths for documents and vector stores
- Model Selection: Choose different Ollama models for LLM and embeddings
- Retrieval Mode: Switch between basic and multi-query retrieval (see the sketch after this list)
- Chain Mode: Choose between basic and conversational RAG chains
- Chunk Settings: Adjust chunk size and overlap
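The retrieval-mode switch plausibly toggles between a plain similarity-search retriever and LangChain's `MultiQueryRetriever`, which asks the LLM to rephrase the question before searching. A hedged sketch (the factory name and defaults are illustrative, not the project's actual `src/retrieval/retriever.py`):

```python
# Illustrative retriever factory; names and defaults are assumptions.
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_ollama import ChatOllama  # assumes the langchain-ollama package

def make_retriever(store, llm_model="llama3.2:3b", mode="basic", k=4):
    base = store.as_retriever(search_kwargs={"k": k})  # plain similarity search
    if mode == "basic":
        return base
    # multi_query: the LLM generates several rephrasings of the question,
    # and the results for all of them are merged and de-duplicated
    llm = ChatOllama(model=llm_model)
    return MultiQueryRetriever.from_llm(retriever=base, llm=llm)
```

Multi-query retrieval trades an extra LLM call for better recall on ambiguously phrased questions.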
- Create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

The project includes a comprehensive test suite with 78 tests covering all core modules.
Run all tests:

```bash
pytest tests/ -v
```

Run unit tests only:

```bash
pytest tests/unit/ -v
```

Run integration tests:

```bash
pytest tests/integration/ -v
```

Run with coverage:

```bash
pytest tests/ --cov=src --cov-report=html
# View coverage: open htmlcov/index.html
```

- Comprehensive unit and integration tests included
- Code coverage reporting available

See TEST_REPORT.md for detailed test documentation.
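For orientation, a unit test in this suite might look roughly like the following; the `split_documents` import is a hypothetical stand-in for the splitter's real API:

```python
# Sketch in the style of tests/unit/test_processing.py; split_documents
# is a hypothetical name for the splitter's public function.
import pytest
from langchain_core.documents import Document

from src.processing.splitter import split_documents  # hypothetical import

@pytest.fixture
def long_document():
    return [Document(page_content="word " * 2000)]

def test_chunks_respect_configured_size(long_document):
    chunks = split_documents(long_document, chunk_size=1200, chunk_overlap=300)
    assert chunks, "splitting should yield at least one chunk"
    assert all(len(c.page_content) <= 1200 for c in chunks)
```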
Format code:

```bash
black src/ tests/
```

Lint code:

```bash
ruff check src/ tests/
```

Type checking:

```bash
mypy src/
```

See AGENTS.md for detailed architecture documentation, including module descriptions and design patterns.
This application is designed to be safe for public repositories:
- No hardcoded secrets or credentials
- All sensitive data is externalized to environment variables
- `.gitignore` excludes vector stores, logs, and configuration files
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Ensure Ollama is running: `ollama serve` (see the connectivity check below)
- Check the base URL in settings (default: `http://localhost:11434`)
- Delete the vector store and recreate it with `--force-recreate`
- Check disk space availability
- Reduce chunk size in settings
- Use smaller models (e.g., `llama3.2:1b` instead of `llama3.2:3b`)
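If you are unsure whether the Ollama server is reachable and has the required models pulled, the snippet below queries Ollama's `/api/tags` endpoint, which lists locally available models. Adjust the host and model names to your setup:

```python
# Connectivity check against Ollama's /api/tags endpoint: confirms the
# server is up and reports whether each required model has been pulled.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"
REQUIRED = ("llama3.2:3b", "nomic-embed-text:v1.5")

with urllib.request.urlopen(f"{OLLAMA_URL}/api/tags", timeout=5) as resp:
    available = {m["name"] for m in json.load(resp)["models"]}

for model in REQUIRED:
    status = "ok" if model in available else "MISSING - run `ollama pull`"
    print(f"{model}: {status}")
```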