PaperBrain - AI-Powered Document Intelligence

Transform your documents into intelligent, searchable knowledge with advanced RAG technology

PaperBrain is a Retrieval-Augmented Generation (RAG) chatbot. Upload your documents, ask questions in plain language, and get answers grounded in the uploaded text, with the source passages cited. Retrieval uses semantic search over vector embeddings, so queries match on meaning rather than exact keywords.

Demo: https://paperbrain.streamlit.app

Highlights

Core Features

Multi-format document support (PDF, TXT, DOCX)
Semantic search using Sentence Transformers embeddings (all-MiniLM-L6-v2)
Conversational AI with contextual responses
Source attribution for every response
Documents are processed on upload and can be queried as soon as processing finishes

Advanced Capabilities

Local FAISS vector storage for fast similarity search
Context preservation with conversation memory
Semantic text chunking for optimized retrieval
Streamlit-powered interface for user-friendly interaction
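
To illustrate the conversation memory item above, here is a minimal sketch of a bounded turn buffer. It is an assumption about how such memory could work, not PaperBrain's actual implementation; `ConversationMemory`, `add_turn`, and `as_prompt_context` are hypothetical names, and the sample turns are made up.

```python
from collections import deque


class ConversationMemory:
    """Keeps only the most recent question/answer turns (hypothetical helper)."""

    def __init__(self, max_turns: int = 3):
        # A deque with maxlen drops the oldest turn automatically when full.
        self.turns = deque(maxlen=max_turns)

    def add_turn(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))

    def as_prompt_context(self) -> str:
        """Render the stored turns as plain text to prepend to the next prompt."""
        lines = []
        for question, answer in self.turns:
            lines.append(f"User: {question}")
            lines.append(f"Assistant: {answer}")
        return "\n".join(lines)


memory = ConversationMemory(max_turns=2)
memory.add_turn("What is the report about?", "Quarterly sales.")
memory.add_turn("Which region grew most?", "The northern region.")
memory.add_turn("By how much?", "About 12 percent.")
print(memory.as_prompt_context())
```

Capping the number of turns keeps the prompt size predictable; the oldest turn is dropped once the limit is reached.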

Technology Stack

AI & Machine Learning

| Component | Technology | Purpose |
|---|---|---|
| LLM | Google Gemini Pro | Human-like responses & reasoning |
| Embeddings | Sentence Transformers (all-MiniLM-L6-v2) | Semantic text encoding |
| Text Processing | LangChain | Document chunking & prompt engineering |

Backend & Storage

| Component | Technology | Purpose |
|---|---|---|
| Vector DB | FAISS | High-performance nearest neighbor search |
| Web Framework | Streamlit | Interactive web UI |
| Language | Python | Core backend logic |
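
As a rough picture of what nearest neighbor search means here, the sketch below ranks vectors by cosine similarity with a brute-force scan. It is not FAISS and does not use the FAISS API; it is a pure-Python stand-in with toy 3-dimensional vectors, and `cosine_similarity` and `top_k` are hypothetical names chosen for the example.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float],
          vectors: List[Sequence[float]],
          k: int = 2) -> List[Tuple[int, float]]:
    """Return (index, similarity) pairs for the k vectors most similar to the query."""
    scored = [(i, cosine_similarity(query, v)) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings": real sentence embeddings have far more dimensions.
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
results = top_k([1.0, 0.0, 0.0], vectors, k=2)
print(results)
```

A scan like this costs time proportional to the number of stored vectors for every query; a dedicated vector index exists to make that step fast at scale.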

Supporting Tools

Document loaders: PDF, TXT, DOCX
Intelligent text splitters
Local vector store management
Conversation memory management
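
The text splitter idea can be sketched as a sliding window with overlap, so that a sentence cut at a chunk boundary still appears whole in a neighboring chunk. This is a simplified character-based illustration, not LangChain's splitter API; `split_text`, `chunk_size`, and `overlap` are names assumed for the example.

```python
from typing import List


def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "PaperBrain splits each document into overlapping chunks before embedding them."
chunks = split_text(sample, chunk_size=30, overlap=10)
for chunk in chunks:
    print(repr(chunk))
```

Larger overlaps preserve more context across boundaries at the cost of storing and embedding more duplicated text.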

Quick Start

Prerequisites

Python 3.8+
Google API Key (Gemini Pro)

Installation

# Clone repository
git clone https://github.com/aathifpm/PaperBrain.git
cd PaperBrain

# Install dependencies
pip install -r requirements.txt

# Configure API credentials
echo "GOOGLE_API_KEY=your-gemini-api-key" > .env

# Run the application
streamlit run app.py
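
Before running the app, it can help to confirm that the `.env` file holds a real key rather than the placeholder. The sketch below parses a `KEY=value` file using only the standard library; it is an illustration, not how PaperBrain itself loads its configuration, and `read_env_file` is a hypothetical helper. It writes to a temporary directory so it never touches a real `.env`.

```python
import tempfile
from pathlib import Path
from typing import Dict


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Demonstrate against a temporary file that mirrors the echo command above.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("GOOGLE_API_KEY=your-gemini-api-key\n", encoding="utf-8")
    settings = read_env_file(env_path)

api_key = settings.get("GOOGLE_API_KEY", "")
if not api_key or api_key == "your-gemini-api-key":
    print("GOOGLE_API_KEY is missing or still set to the placeholder value")
```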

Usage Guide

Upload Documents

Open the sidebar in the Streamlit interface
Upload or drag and drop .pdf, .txt, or .docx files
Wait for processing to complete
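
The supported file types above suggest a simple check before processing. The following sketch sorts uploaded filenames into accepted and rejected lists by extension; it is an assumed illustration rather than the app's actual upload handling, and `SUPPORTED_EXTENSIONS` and `partition_uploads` are hypothetical names.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Extensions listed in this README; matching is case-insensitive.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx"}


def partition_uploads(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split filenames into (accepted, rejected) based on their extension."""
    accepted, rejected = [], []
    for name in filenames:
        if Path(name).suffix.lower() in SUPPORTED_EXTENSIONS:
            accepted.append(name)
        else:
            rejected.append(name)
    return accepted, rejected


accepted, rejected = partition_uploads(
    ["report.PDF", "notes.txt", "slides.pptx", "thesis.docx"]
)
print("accepted:", accepted)
print("rejected:", rejected)
```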

Ask Questions

Use the chatbox to query your documents
Get AI-generated responses with document references
Ask follow-up questions; conversation memory carries earlier turns forward

Project Structure

PaperBrain/
├── app.py                  # Main Streamlit app
├── config.py               # Configuration settings
├── document_processor.py   # Handles ingestion & chunking
├── vector_store.py         # FAISS vector storage
├── rag_chain.py            # RAG pipeline logic
├── requirements.txt        # Dependencies
├── .env                    # API credentials
└── README.md               # Documentation

Technical Highlights

RAG Pipeline: Chunking → Embeddings → FAISS Retrieval → LLM
Semantic Embeddings: Sentence Transformers (all-MiniLM-L6-v2) encode chunks and queries into the same vector space
FAISS: Local nearest neighbor search over the chunk embeddings
LangChain Integration: Prompt orchestration & context injection
Gemini Pro LLM: Generates the final answer from the retrieved context
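
The retrieval and context injection steps of the pipeline can be sketched end to end without any external services. The example below is a simplified illustration under stated assumptions: it scores chunks by Jaccard word overlap as a crude stand-in for embedding similarity, it never calls an LLM, and `tokenize`, `score`, `retrieve`, and `build_prompt` are hypothetical names. The sample chunks and file names are made up.

```python
import re
from typing import Dict, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(question: str, chunk_text: str) -> float:
    """Jaccard word overlap: a crude stand-in for embedding similarity."""
    q, c = tokenize(question), tokenize(chunk_text)
    if not q or not c:
        return 0.0
    return len(q & c) / len(q | c)


def retrieve(question: str, chunks: List[Dict[str, str]], k: int = 2) -> List[Dict[str, str]]:
    """Return the k chunks that score highest against the question."""
    ranked = sorted(chunks, key=lambda ch: score(question, ch["text"]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, retrieved: List[Dict[str, str]]) -> str:
    """Inject retrieved chunks, each tagged with its source, ahead of the question."""
    context = "\n".join(f"[{ch['source']}] {ch['text']}" for ch in retrieved)
    return (
        "Answer the question using only the context below. "
        "Cite the bracketed source for each claim.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


chunks = [
    {"source": "handbook.pdf", "text": "Employees accrue vacation days every month."},
    {"source": "handbook.pdf", "text": "The office is closed on public holidays."},
    {"source": "faq.txt", "text": "Vacation days can be carried over to the next year."},
]
question = "How do vacation days accrue?"
retrieved = retrieve(question, chunks, k=2)
prompt = build_prompt(question, retrieved)
print(prompt)
```

Carrying the source label alongside each chunk is what makes source attribution possible: the model sees which file every passage came from and can cite it in the answer.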

Future Enhancements

[ ] Multi-language support
[ ] Support for PPTX, XLSX
[ ] Cloud storage integration (Google Drive, Dropbox)
[ ] User authentication for multi-user access
[ ] Export chat history & summaries
[ ] Document usage analytics

Contributing

Contributions are welcome! Submit issues, ideas, or pull requests to make PaperBrain even better.

📄 License

This project is open source. Refer to the repository for details.

⭐ If you like this project, consider giving it a star on GitHub!
