Transform your documents into intelligent, searchable knowledge with advanced RAG technology
PaperBrain is a cutting-edge Retrieval-Augmented Generation (RAG) chatbot that allows seamless document uploads and delivers AI-powered, contextually-aware answers. With advanced semantic search and vector embeddings, PaperBrain makes your documents interactive, searchable, and easy to explore.
Demo: https://paperbrain.streamlit.app
- Multi-format document support (PDF, TXT, DOCX)
- Semantic search with state-of-the-art embeddings
- Conversational AI with contextual responses
- Source attribution for every response
- Real-time document processing and instant querying
- Local FAISS vector storage for fast similarity search
- Context preservation with conversation memory
- Semantic text chunking for optimized retrieval
- Streamlit-powered interface for user-friendly interaction
| Component | Technology | Purpose |
|---|---|---|
| LLM | Google Gemini Pro | Human-like responses & reasoning |
| Embeddings | Sentence Transformers (all-MiniLM-L6-v2) | Semantic text encoding |
| Text Processing | LangChain | Document chunking & prompt engineering |
| Component | Technology | Purpose |
|---|---|---|
| Vector DB | FAISS | High-performance nearest neighbor search |
| Web Framework | Streamlit | Interactive web UI |
| Language | Python | Core backend logic |
- Document loaders: PDF, TXT, DOCX
- Intelligent text splitters
- Local vector store management
- Conversation memory management
- Python 3.8+
- Google API Key (Gemini Pro)
# Clone repository
git clone https://github.com/aathifpm/PaperBrain.git
cd PaperBrain
# Install dependencies
pip install -r requirements.txt
# Configure API credentials
echo "GOOGLE_API_KEY=your-gemini-api-key" > .env
# Run the application
streamlit run app.pyUsage Guide
Upload Documents
- Open the sidebar in the Streamlit interface
- Upload or drag-and-drop .pdf, .txt, .docx files
- Wait for processing to complete
Ask Questions
- Use the chatbox to query your documents
- Get AI-generated responses with document references
- Enjoy context-aware conversation flow
Project Structure
PaperBrain/
├── app.py # Main Streamlit app
├── document_processor.py # Handles ingestion & chunking
├── vector_store.py # FAISS vector storage
├── rag_chain.py # RAG pipeline logic
├── requirements.txt # Dependencies
├── .env # API credentials
└── README.md # Documentation
Technical Highlights
RAG Pipeline: Chunking → Embeddings → FAISS Retrieval → LLM
Semantic Embeddings: Sentence Transformers for context-rich queries
FAISS Optimization: High-speed similarity search
LangChain Integration: Prompt orchestration & context injection
Gemini Pro LLM: Generates natural, reasoning-rich responses
Future Enhancements
[ ] Multi-language support
[ ] Support for PPTX, XLSX
[ ] Cloud storage integration (Google Drive, Dropbox)
[ ] User authentication for multi-user access
[ ] Export chat history & summaries
[ ] Document usage analytics
Contributing
Contributions are welcome! Submit issues, ideas, or pull requests to make PaperBrain even better.
📄 License
This project is open source. Refer to the repository for details.
⭐ If you like this project, consider giving it a star on GitHub!