# RAG PDF Chat - Python, React, Tailwind CSS, FastAPI, SSE Streaming, Multi-Agent Pipeline, Text Chunking, Conversation History, Device-Local Data, Anonymous Sessions Full-Stack Project (Contextual Document Assistant)
A production-style, educational full-stack RAG project that demonstrates how to turn PDF documents into searchable knowledge and chat with them using modern AI models. It is designed for learners and builders who want to understand document chunking, embeddings, vector search, SSE streaming responses, multi-provider model fallback, and practical deployment (Vercel + Coolify VPS) end to end.
- Frontend Live Demo: https://pdf-chat-scrapper.vercel.app/
- Backend Live Demo: https://rag-pdf-backend.arnobmahmud.com/
## Table of Contents

- Project overview
- What you will learn
- Keywords and glossary (beginner-friendly)
- Architecture walkthrough
- Tech stack and dependencies
- Project structure and file walkthrough
- Core features and how they work
- API reference
- Environment variables (`.env`) explained
- How to run locally
- How to deploy (Vercel + Coolify VPS)
- How to reuse this project in your own apps
- Quality checks and scripts
- Troubleshooting notes
- Contributing
- License
## Project overview

This app lets a user upload a PDF and ask questions about it. The backend parses the PDF text, splits it into chunks, embeds each chunk into a vector, stores the vectors in FAISS, retrieves the most relevant context for each question, and then sends that context to an LLM for a grounded response.
It also includes:
- Anonymous session isolation (per browser via session header)
- Streaming answers (SSE) and non-streaming mode
- Model selector with provider fallback
- Optional source snippets
- Rate limiting
- Device-local saved chat history in IndexedDB
- Deployment-ready Docker/Coolify setup
## What you will learn

- How RAG (Retrieval-Augmented Generation) works in a practical, production-like app.
- How to build a TypeScript React frontend that calls a FastAPI backend.
- How to wire PDF upload, chunking, embeddings, and vector search.
- How to stream model output token-by-token over SSE.
- How to maintain per-browser isolation without user authentication.
- How to deploy frontend and backend separately with correct CORS and environment config.
## Keywords and glossary (beginner-friendly)

| Term | Meaning |
|---|---|
| RAG | Retrieve relevant document context first, then generate answer with LLM. |
| Embedding | Numeric vector representation of text meaning. |
| FAISS | Fast vector database/index for similarity search. |
| Chunking | Splitting long PDF text into smaller pieces for retrieval. |
| SSE | Server-Sent Events for live streaming answer text. |
| Session ID | Unique browser identifier used to isolate each user's PDF vector index. |
| LRU eviction | Removes least-recently-used session indexes when cap is reached. |
| CORS | Browser security rule controlling which frontend origins can call backend APIs. |
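To make the LRU eviction entry concrete, here is a minimal sketch (illustrative only, not the project's actual implementation) of capping live session indexes:

```python
from collections import OrderedDict

class SessionLRU:
    """Track at most `cap` session ids, evicting the least recently used."""

    def __init__(self, cap: int):
        self.cap = cap
        self._sessions: OrderedDict[str, None] = OrderedDict()

    def touch(self, session_id: str) -> str | None:
        """Mark a session as used; return an evicted session id, if any."""
        if session_id in self._sessions:
            self._sessions.move_to_end(session_id)
        else:
            self._sessions[session_id] = None
        if len(self._sessions) > self.cap:
            evicted, _ = self._sessions.popitem(last=False)
            return evicted  # caller would delete this session's FAISS folder
        return None
```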
## Architecture walkthrough

```
React SPA (frontend)
├─ localStorage: anonymous session UUID (X-Chat-Session-Id)
├─ IndexedDB: saved chat history by PDF
└─ Calls FastAPI endpoints (/upload, /ask, /ask/stream, /status, /models)

FastAPI backend
├─ PDF loader + text splitter
├─ Embedding service + FAISS vector store
├─ Agent pipeline (retrieve -> optimize -> answer -> validate)
├─ Optional source snippets
├─ Rate limiting and session cleanup
└─ Optional Sentry tunnel (/api/oversight)
```
## Tech stack and dependencies

Frontend:

- React 18 + TypeScript
- Vite
- Tailwind CSS
- Framer Motion
- React Router
- Radix UI primitives
- Sonner toast notifications
- Sentry browser SDK (optional)
Backend:

- FastAPI + Uvicorn
- Pydantic + pydantic-settings
- LangChain ecosystem
- FAISS CPU
- sentence-transformers (local embedding fallback)
- httpx / aiohttp
- Tenacity retries
Why this stack:

- It separates UI concerns from AI/backend concerns cleanly.
- It demonstrates real deployment constraints (CORS, env vars, reverse proxy).
- It includes robust failover behavior and operational safety defaults.
## Project structure and file walkthrough

```
rag-pdf-chat/
├── README.md
├── docs/                     # deployment and operational guides
├── frontend/
│   ├── index.html
│   ├── package.json
│   ├── vite.config.ts
│   ├── src/
│   │   ├── main.tsx          # app bootstrap
│   │   ├── App.tsx           # routes and app-level providers
│   │   ├── pages/            # home, chat, about, api-status
│   │   ├── components/
│   │   │   ├── chat/         # chat container, model selector, upload, input
│   │   │   ├── layout/       # header/footer/layout helpers
│   │   │   ├── sections/     # marketing/documentation sections
│   │   │   └── ui/           # reusable UI primitives
│   │   ├── hooks/            # data and behavior hooks
│   │   ├── lib/              # api/env/storage/session logic
│   │   └── types/            # shared TS types
│   └── public/
└── backend/
    ├── app/
    │   ├── main.py           # app setup and middleware
    │   ├── config.py         # settings/env/provider config
    │   ├── routes/           # health, upload, chat, oversight
    │   ├── services/         # vector store, rate limiting, cleanup
    │   └── agents/           # multi-step answer pipeline
    ├── requirements.txt
    ├── requirements-dev.txt
    ├── .env.example
    ├── Dockerfile
    └── .dockerignore
```
## Core features and how they work

### PDF upload and indexing

The user uploads a PDF through the frontend. The backend then:
- extracts text
- chunks it
- embeds each chunk
- stores the vectors in FAISS under a session-specific folder
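A minimal sketch of that upload flow using LangChain-style components (illustrative, not the project's actual code; the embedding model, chunk sizes, and paths are assumptions):

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

session_id = "example-session"  # hypothetical; comes from X-Chat-Session-Id

# 1. Extract text from the uploaded PDF.
docs = PyPDFLoader("uploaded.pdf").load()

# 2. Chunk it so retrieval can return focused passages.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=150
).split_documents(docs)

# 3. Embed each chunk (sentence-transformers as a local fallback).
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# 4. Store vectors in FAISS under a session-specific folder.
index = FAISS.from_documents(chunks, embeddings)
index.save_local(f"faiss_store/{session_id}")

# Later: retrieve context for a question and hand it to the LLM.
hits = index.similarity_search("Summarize this PDF", k=4)
context = "\n\n".join(hit.page_content for hit in hits)
```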
### Streaming vs non-streaming answers

- Streaming on -> uses SSE (`/ask/stream`) for live token output.
- Streaming off -> classic JSON response (`/ask`).
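A stripped-down sketch of how an SSE endpoint like `/ask/stream` can be built in FastAPI (the event names and payloads here are assumptions, not the project's exact wire format):

```python
from collections.abc import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    question: str

async def token_events(question: str) -> AsyncIterator[str]:
    # A real implementation would yield tokens from the LLM pipeline.
    for token in ["This ", "is ", "a ", "streamed ", "answer."]:
        yield f"data: {token}\n\n"
    yield "event: done\ndata: {}\n\n"  # signal completion to the client

@app.post("/ask/stream")
async def ask_stream(req: AskRequest) -> StreamingResponse:
    return StreamingResponse(
        token_events(req.question), media_type="text/event-stream"
    )
```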
### Source snippets

- When enabled, the backend returns source context snippets (if available).
- Helps explain where the answer came from.
### Model selector with provider fallback

- The frontend can select a preferred model.
- Backend tries configured providers and can fall back when a provider fails or is over quota.
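A simplified sketch of that fallback loop (the provider list, URLs, and error handling are assumptions; both endpoints shown speak the OpenAI-compatible chat completions API):

```python
import os
import httpx

# Hypothetical provider order; the real list comes from backend config.
PROVIDERS = [
    {"base": "https://openrouter.ai/api/v1", "key": os.environ.get("OPENROUTER_API_KEY", "")},
    {"base": "https://api.groq.com/openai/v1", "key": os.environ.get("GROQ_API_KEY", "")},
]

async def complete_with_fallback(model: str, messages: list[dict]) -> str:
    """Try each provider in order; fall through on errors or quota failures."""
    last_error: Exception | None = None
    async with httpx.AsyncClient(timeout=60) as client:
        for provider in PROVIDERS:
            try:
                resp = await client.post(
                    f"{provider['base']}/chat/completions",
                    headers={"Authorization": f"Bearer {provider['key']}"},
                    json={"model": model, "messages": messages},
                )
                resp.raise_for_status()
                return resp.json()["choices"][0]["message"]["content"]
            except (httpx.HTTPError, KeyError) as exc:
                last_error = exc  # e.g. 429 over-quota: try the next provider
    raise RuntimeError("All providers failed") from last_error
```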
### Anonymous sessions and device-local history

- The browser keeps an anonymous session UUID.
- The backend uses the `X-Chat-Session-Id` header to separate vector indexes per browser.
- The frontend stores the transcript locally in IndexedDB per PDF.
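A sketch of per-browser isolation via the session header (the dependency name and validation rule are illustrative, not the project's exact code):

```python
from pathlib import Path

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
FAISS_PERSIST_DIR = Path("faiss_store")

def session_dir(x_chat_session_id: str = Header(...)) -> Path:
    """Resolve the FAISS folder for this browser's session id."""
    # Reject anything that is not a plain UUID-like token (no path tricks).
    if not x_chat_session_id.replace("-", "").isalnum():
        raise HTTPException(status_code=400, detail="Invalid session id")
    return FAISS_PERSIST_DIR / x_chat_session_id

@app.get("/status")
async def status(index_dir: Path = Depends(session_dir)) -> dict:
    return {"pdf_loaded": index_dir.exists()}
```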
### Rate limiting and cleanup

- Per-IP request limits for upload and ask routes.
- Startup cleanup removes stale session FAISS folders.
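The per-IP limiting could look roughly like this sliding-window sketch (the real service lives in `backend/app/services`; this is an assumption about its shape):

```python
import time
from collections import defaultdict, deque

class PerIPRateLimiter:
    """Allow at most `limit` requests per IP within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._calls: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = time.monotonic()
        calls = self._calls[ip]
        while calls and now - calls[0] > self.window:
            calls.popleft()  # drop timestamps outside the window
        if len(calls) >= self.limit:
            return False  # caller should respond with HTTP 429
        calls.append(now)
        return True
```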
## API reference

Most data routes require the `X-Chat-Session-Id` header.
| Method | Endpoint | Purpose |
|---|---|---|
| GET | `/` | Basic backend status |
| GET | `/health` | Health check |
| GET | `/models` | Available models/providers |
| GET | `/pipeline-info` | Explains pipeline stages |
| GET | `/status` | Session PDF loaded status |
| POST | `/upload` | Upload PDF and build index |
| POST | `/ask` | Ask question (non-streaming JSON) |
| POST | `/ask/stream` | Ask question (SSE streaming) |
| POST | `/api/oversight` | Sentry tunnel endpoint |
Example non-streaming request:

```bash
curl -X POST "http://127.0.0.1:8000/ask" \
  -H "Content-Type: application/json" \
  -H "X-Chat-Session-Id: 11111111-2222-4333-8444-555555555555" \
  -d '{"question":"Summarize this PDF","model":"openai/gpt-4o-mini","include_sources":true}'
```
## Environment variables (`.env`) explained

This project needs backend environment variables for real AI usage.

### Backend

Create from template:
```bash
cd backend
cp .env.example .env
```

Minimal required values:

```
OPENROUTER_API_KEY=your_openrouter_key
OPENROUTER_API_BASE=https://openrouter.ai/api/v1
```

| Variable | Required | Purpose |
|---|---|---|
| `OPENROUTER_API_KEY` | Yes | Main provider key |
| `OPENROUTER_API_BASE` | Yes | OpenRouter base URL |
| `DEFAULT_MODEL` | Recommended | Default model ID |
| `DEFAULT_PROVIDER` | Recommended | Provider selection hint |
| `CORS_ORIGINS` | Yes for deployment | Allowed frontend origins |
| `FAISS_PERSIST_DIR` | Recommended | Vector index directory |
| `MAX_VECTOR_SESSIONS` | Recommended | LRU session cap |
| `FAISS_SESSION_MAX_AGE_DAYS` | Recommended | Startup stale cleanup |
| `RATE_LIMIT_UPLOAD_PER_MINUTE` | Recommended | Upload protection |
| `RATE_LIMIT_ASK_PER_MINUTE` | Recommended | Ask/stream protection |
| `SENTRY_DSN` | Optional | Backend error reporting |
| `SENTRY_ENVIRONMENT` | Optional | Sentry environment tag |
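`FAISS_SESSION_MAX_AGE_DAYS` drives the startup cleanup; a minimal sketch of what such a sweep can look like (illustrative, not the project's exact code):

```python
import shutil
import time
from pathlib import Path

def cleanup_stale_sessions(persist_dir: Path, max_age_days: int) -> None:
    """Delete session index folders untouched for longer than the cutoff."""
    if not persist_dir.exists():
        return
    cutoff = time.time() - max_age_days * 86_400
    for session_dir in persist_dir.iterdir():
        if session_dir.is_dir() and session_dir.stat().st_mtime < cutoff:
            shutil.rmtree(session_dir, ignore_errors=True)
```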
Optional additional provider keys:

```
GROQ_API_KEY=
OPENAI_DIRECT_API_KEY=
GOOGLE_API_KEY=
HF_API_KEY=
```

### Frontend

For local dev you can run with the default assumptions, but creating the env file is recommended:
```bash
cd frontend
cp .env.example .env
```

Key variables:
| Variable | Required | Purpose |
|---|---|---|
| `VITE_API_BASE_URL` | Yes in production | Backend public base URL |
| `VITE_DEV_PROXY_TARGET` | Optional | Local Vite proxy target |
| `VITE_FAISS_SESSION_MAX_AGE_DAYS` | Optional | UI retention text parity |
| `VITE_SENTRY_DSN` | Optional | Browser Sentry |
| `VITE_SENTRY_TRACES_RATE` | Optional | Perf tracing rate |
| `VITE_APP_ENV` | Optional | Env label (production/dev) |
## How to run locally

Backend:

```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# set OPENROUTER_API_KEY in .env
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Backend docs: http://127.0.0.1:8000/docs
Frontend:

```bash
cd frontend
npm install
npm run dev
```

Frontend app: http://localhost:5173
Suggested first test flow:

- Open the chat page.
- Upload a sample PDF.
- Ask a summary question.
- Toggle Sources and Stream.
- Change model and compare behavior.
- Inspect the Network tab for `/upload`, `/ask`, and `/ask/stream`.
- Inspect backend logs to see the retrieval/generation lifecycle.
## How to deploy (Vercel + Coolify VPS)

Backend (Coolify VPS):

- Use `backend/Dockerfile`.
- Base Directory: `/backend`
- Dockerfile path: `/Dockerfile`
- Port expose: `3000`
- Set `PORT=3000`.
- Set `CORS_ORIGINS` to your frontend domain(s).
- Configure domains and Traefik labels for:
  - sslip fallback host
  - production subdomain
Frontend (Vercel):

- Root Directory: `frontend`
- Framework: Vite
- Build command: `npm run build`
- Output directory: `dist`
- Install command: `npm install --legacy-peer-deps`
- Set `VITE_API_BASE_URL=https://your-backend-domain`.
## How to reuse this project in your own apps

Frontend reuse:

- Copy `frontend/src/components/ui` for reusable styled primitives.
- Copy `ChatInput`, `ChatMessage`, and `PDFUpload` for chat/document UX.
- Keep the shared utility `cn` from `frontend/src/lib/utils.ts`.
Backend reuse:

- Start from the `backend/app/routes` route separation.
- Reuse the `config.py` settings pattern for env-driven deployments (see the sketch after this list).
- Reuse the rate-limit service for any expensive endpoint.
- Reuse the session header approach for anonymous multi-user resource isolation.
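A sketch of the env-driven settings pattern with pydantic-settings (field names mirror the variables documented above; the defaults are illustrative, not the project's actual values):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Values are read from .env or the process environment."""

    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    openrouter_api_key: str = ""
    openrouter_api_base: str = "https://openrouter.ai/api/v1"
    default_model: str = "openai/gpt-4o-mini"
    cors_origins: str = "http://localhost:5173"
    faiss_persist_dir: str = "faiss_store"
    max_vector_sessions: int = 50

settings = Settings()  # import this singleton wherever config is needed
```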
API client reuse:

- `frontend/src/lib/api.ts` centralizes request and header handling.
- Adapt the endpoint map and payload types for your own backend quickly.
## Quality checks and scripts

Root:

```bash
npm run lint
npm run check
npm run build
npm run build:all
```

Frontend:

```bash
cd frontend
npm run lint
npm run typecheck
npm run build
npm audit
```

Backend:

```bash
cd backend
pip install -r requirements.txt -r requirements-dev.txt
ruff check app
mypy app
python -m unittest discover -s tests -p "test_*.py"
```

Current backend integration test: `backend/tests/test_chat_stream_sse.py` validates `/ask/stream` SSE behavior (token + done events) and the source metadata flow.
## Troubleshooting notes

- CORS blocked in the browser -> ensure the deployed frontend origin is present in `CORS_ORIGINS`, then redeploy the backend.
- Vercel npm peer conflict -> use an install command with `--legacy-peer-deps`.
- No model response -> verify that at least one provider key is valid.
- Wrong/empty retrieval -> re-upload the PDF and check session header consistency.
- Frequent 404 probes in logs -> expected on public servers due to internet scanners.
## Contributing

- Fork the repository.
- Create a feature branch.
- Keep changes focused and run the checks before opening a PR.
- Open the PR with a short summary, scope, and risk notes.
## License

This project is licensed under the MIT License. Feel free to use, modify, and distribute the code as per the terms of the license.

This is an open-source project - feel free to use, enhance, and extend it further!

If you have any questions or want to share your work, reach out via GitHub or my portfolio at https://www.arnobmahmud.com.

Enjoy building and learning!

Thank you!









