PDF Chat Assistant

A Streamlit application that allows you to upload PDF files, process them using LangChain and ChromaDB, and chat with them using OpenAI's GPT models. The app provides source citations showing which page each answer came from.

Features

📄 PDF Upload: Upload and process PDF files
🔍 Document Chunking: Automatically splits PDFs into manageable chunks
💾 Vector Storage: Stores embeddings in ChromaDB for efficient retrieval
💬 Chat Interface: Interactive chat interface to ask questions about your PDFs
📑 Source Citation: Shows which page and section each answer came from
🧠 Memory: Maintains conversation context across multiple questions

Requirements

Python 3.11
OpenAI API key

Installation

Clone this repository or download the files
Install the required packages:

pip install -r requirements.txt

Set up your OpenAI API key:

Create a .env file in the project directory:

OPENAI_API_KEY=your_openai_api_key_here

Or set it as an environment variable:

# On Windows (PowerShell)
$env:OPENAI_API_KEY="your_openai_api_key_here"

# On Linux/Mac
export OPENAI_API_KEY="your_openai_api_key_here"

Usage

Run the Streamlit app:

streamlit run app.py

Open your browser and navigate to the URL shown (typically http://localhost:8501)
Upload a PDF file using the sidebar
Click "Process PDF" to process the document
Start asking questions in the chat interface!

How It Works

PDF Processing: When you upload a PDF, it's loaded and split into chunks using LangChain's RecursiveCharacterTextSplitter
Embedding Generation: Each chunk is embedded using OpenAI's embeddings model
Vector Storage: Embeddings are stored in ChromaDB for fast similarity search
Question Answering: When you ask a question:
- The question is embedded and used to find the most relevant chunks
- The relevant chunks are passed to GPT-3.5-turbo along with the question
- The model generates an answer based on the retrieved context
- Source information (page numbers and snippets) is extracted and displayed

Project Structure

.
├── app.py              # Main Streamlit application
├── requirements.txt    # Python dependencies
├── README.md          # This file
├── .env               # Environment variables (create this)
└── chroma_db/         # ChromaDB database (created automatically)

Notes

The app uses GPT-3.5-turbo for chat. You can modify the model in app.py if needed
ChromaDB data is persisted in the chroma_db/ directory
Each PDF processing creates chunks that are stored and can be queried
The chat maintains context within a session

Troubleshooting

OpenAI API Key Error: Make sure you've set the OPENAI_API_KEY in your .env file or environment variables
PDF Processing Error: Ensure the PDF file is not corrupted or password-protected
Memory Issues: For very large PDFs, you may need to adjust the chunk size in app.py

Deploying to GitHub

See DEPLOY.md for detailed instructions on how to deploy this project to GitHub.

Quick steps:

Install Git from https://git-scm.com/downloads
Initialize repository: git init
Add files: git add .
Commit: git commit -m "Initial commit"
Create repository on GitHub
Push: git push -u origin main

Important: Never commit your .env file with your API key! The .gitignore file is already configured to exclude it.

License

MIT License

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.streamlit		.streamlit
.env.example		.env.example
.gitignore		.gitignore
QUICK_DEPLOY.md		QUICK_DEPLOY.md
README.md		README.md
STREAMLIT_DEPLOY.md		STREAMLIT_DEPLOY.md
app.py		app.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

PDF Chat Assistant

Features

Requirements

Installation

Usage

How It Works

Project Structure

Notes

Troubleshooting

Deploying to GitHub

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

PDF Chat Assistant

Features

Requirements

Installation

Usage

How It Works

Project Structure

Notes

Troubleshooting

Deploying to GitHub

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages