# RAG PDF Chat - Python, React, Tailwind CSS, FastAPI, SSE Streaming, Multi-Agent Pipeline, Text Chunking, Conversation History, Device-Local Data, Anonymous Sessions Full-Stack Project (Contextual Document Assistant)
A production-style, educational full-stack RAG project that demonstrates how to turn PDF documents into searchable knowledge and chat with them using modern AI models. It is designed for learners and builders who want to understand document chunking, embeddings, vector search, SSE streaming responses, multi-provider model fallback, and practical deployment (Vercel + Coolify VPS) end to end.
- Frontend Live Demo: https://pdf-chat-scrapper.vercel.app/
- Backend Live Demo: https://rag-pdf-backend.arnobmahmud.com/
## Table of Contents

- Project overview
- What you will learn
- Keywords and glossary (beginner-friendly)
- Architecture walkthrough
- Tech stack and dependencies
- Project structure and file walkthrough
- Core features and how they work
- API reference
- Environment variables (`.env`) explained
- How to run locally
- How to deploy (Vercel + Coolify VPS)
- How to reuse this project in your own apps
- Quality checks and scripts
- Troubleshooting notes
- Contributing
- License
## Project overview

This app lets a user upload a PDF and ask questions about it. The backend parses the PDF text, splits it into chunks, embeds each chunk into a vector, stores the vectors in FAISS, retrieves the most relevant context for each question, and then sends that context to an LLM for a grounded response.
It also includes:
- Anonymous session isolation (per browser via session header)
- Streaming answers (SSE) and non-streaming mode
- Model selector with provider fallback
- Optional source snippets
- Rate limiting
- Device-local saved chat history in IndexedDB
- Deployment-ready Docker/Coolify setup
## What you will learn

- How RAG (Retrieval-Augmented Generation) works in a practical, production-like app.
- How to build a TypeScript React frontend that calls a FastAPI backend.
- How to wire PDF upload, chunking, embeddings, and vector search.
- How to stream model output token-by-token over SSE.
- How to maintain per-browser isolation without user authentication.
- How to deploy frontend and backend separately with correct CORS and environment config.
## Keywords and glossary (beginner-friendly)

| Term | Meaning |
|---|---|
| RAG | Retrieve relevant document context first, then generate answer with LLM. |
| Embedding | Numeric vector representation of text meaning. |
| FAISS | Fast vector database/index for similarity search. |
| Chunking | Splitting long PDF text into smaller pieces for retrieval. |
| SSE | Server-Sent Events for live streaming answer text. |
| Session ID | Unique browser identifier used to isolate each user's PDF vector index. |
| LRU eviction | Removes least-recently-used session indexes when cap is reached. |
| CORS | Browser security rule controlling which frontend origins can call backend APIs. |
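To make the LRU eviction entry concrete, here is a minimal sketch (illustrative only, not the project's actual implementation) of capping live session indexes:

```python
from collections import OrderedDict

class SessionLRU:
    """Track at most `cap` session ids, evicting the least recently used."""

    def __init__(self, cap: int):
        self.cap = cap
        self._sessions: OrderedDict[str, None] = OrderedDict()

    def touch(self, session_id: str) -> str | None:
        """Mark a session as used; return an evicted session id, if any."""
        if session_id in self._sessions:
            self._sessions.move_to_end(session_id)
        else:
            self._sessions[session_id] = None
        if len(self._sessions) > self.cap:
            evicted, _ = self._sessions.popitem(last=False)
            return evicted  # caller would delete this session's FAISS folder
        return None
```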
## Architecture walkthrough

```
React SPA (frontend)
├─ localStorage: anonymous session UUID (X-Chat-Session-Id)
├─ IndexedDB: saved chat history by PDF
└─ Calls FastAPI endpoints (/upload, /ask, /ask/stream, /status, /models)

FastAPI backend
├─ PDF loader + text splitter
├─ Embedding service + FAISS vector store
├─ Agent pipeline (retrieve -> optimize -> answer -> validate)
├─ Optional source snippets
├─ Rate limiting and session cleanup
└─ Optional Sentry tunnel (/api/oversight)
```
## Tech stack and dependencies

Frontend:

- React 18 + TypeScript
- Vite
- Tailwind CSS
- Framer Motion
- React Router
- Radix UI primitives
- Sonner toast notifications
- Sentry browser SDK (optional)
Backend:

- FastAPI + Uvicorn
- Pydantic + pydantic-settings
- LangChain ecosystem
- FAISS CPU
- sentence-transformers (local embedding fallback)
- httpx / aiohttp
- Tenacity retries
Why this stack:

- It separates UI concerns from AI/backend concerns cleanly.
- It demonstrates real deployment constraints (CORS, env vars, reverse proxy).
- It includes robust failover behavior and operational safety defaults.
## Project structure and file walkthrough

```
rag-pdf-chat/
├── README.md
├── docs/                     # deployment and operational guides
├── frontend/
│   ├── index.html
│   ├── package.json
│   ├── vite.config.ts
│   ├── src/
│   │   ├── main.tsx          # app bootstrap
│   │   ├── App.tsx           # routes and app-level providers
│   │   ├── pages/            # home, chat, about, api-status
│   │   ├── components/
│   │   │   ├── chat/         # chat container, model selector, upload, input
│   │   │   ├── layout/       # header/footer/layout helpers
│   │   │   ├── sections/     # marketing/documentation sections
│   │   │   └── ui/           # reusable UI primitives
│   │   ├── hooks/            # data and behavior hooks
│   │   ├── lib/              # api/env/storage/session logic
│   │   └── types/            # shared TS types
│   └── public/
└── backend/
    ├── app/
    │   ├── main.py           # app setup and middleware
    │   ├── config.py         # settings/env/provider config
    │   ├── routes/           # health, upload, chat, oversight
    │   ├── services/         # vector store, rate limiting, cleanup
    │   └── agents/           # multi-step answer pipeline
    ├── requirements.txt
    ├── requirements-dev.txt
    ├── .env.example
    ├── Dockerfile
    └── .dockerignore
```
## Core features and how they work

### PDF upload and indexing

The user uploads a PDF through the frontend. The backend then:
- extracts text
- chunks it
- embeds each chunk
- stores the vectors in FAISS under a session-specific folder
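A minimal sketch of that upload flow using LangChain-style components (illustrative, not the project's actual code; the embedding model, chunk sizes, and paths are assumptions):

```python
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

session_id = "example-session"  # hypothetical; comes from X-Chat-Session-Id

# 1. Extract text from the uploaded PDF.
docs = PyPDFLoader("uploaded.pdf").load()

# 2. Chunk it so retrieval can return focused passages.
chunks = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=150
).split_documents(docs)

# 3. Embed each chunk (sentence-transformers as a local fallback).
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# 4. Store vectors in FAISS under a session-specific folder.
index = FAISS.from_documents(chunks, embeddings)
index.save_local(f"faiss_store/{session_id}")

# Later: retrieve context for a question and hand it to the LLM.
hits = index.similarity_search("Summarize this PDF", k=4)
context = "\n\n".join(hit.page_content for hit in hits)
```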
### Streaming vs non-streaming answers

- Streaming on -> uses SSE (`/ask/stream`) for live token output.
- Streaming off -> classic JSON response (`/ask`).
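A stripped-down sketch of how an SSE endpoint like `/ask/stream` can be built in FastAPI (the event names and payloads here are assumptions, not the project's exact wire format):

```python
from collections.abc import AsyncIterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class AskRequest(BaseModel):
    question: str

async def token_events(question: str) -> AsyncIterator[str]:
    # A real implementation would yield tokens from the LLM pipeline.
    for token in ["This ", "is ", "a ", "streamed ", "answer."]:
        yield f"data: {token}\n\n"
    yield "event: done\ndata: {}\n\n"  # signal completion to the client

@app.post("/ask/stream")
async def ask_stream(req: AskRequest) -> StreamingResponse:
    return StreamingResponse(
        token_events(req.question), media_type="text/event-stream"
    )
```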
### Source snippets

- When enabled, the backend returns source context snippets (if available).
- Helps explain where the answer came from.
### Model selector with provider fallback

- The frontend can select a preferred model.
- Backend tries configured providers and can fall back when a provider fails or is over quota.
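A simplified sketch of that fallback loop (the provider list, URLs, and error handling are assumptions; both endpoints shown speak the OpenAI-compatible chat completions API):

```python
import os
import httpx

# Hypothetical provider order; the real list comes from backend config.
PROVIDERS = [
    {"base": "https://openrouter.ai/api/v1", "key": os.environ.get("OPENROUTER_API_KEY", "")},
    {"base": "https://api.groq.com/openai/v1", "key": os.environ.get("GROQ_API_KEY", "")},
]

async def complete_with_fallback(model: str, messages: list[dict]) -> str:
    """Try each provider in order; fall through on errors or quota failures."""
    last_error: Exception | None = None
    async with httpx.AsyncClient(timeout=60) as client:
        for provider in PROVIDERS:
            try:
                resp = await client.post(
                    f"{provider['base']}/chat/completions",
                    headers={"Authorization": f"Bearer {provider['key']}"},
                    json={"model": model, "messages": messages},
                )
                resp.raise_for_status()
                return resp.json()["choices"][0]["message"]["content"]
            except (httpx.HTTPError, KeyError) as exc:
                last_error = exc  # e.g. 429 over-quota: try the next provider
    raise RuntimeError("All providers failed") from last_error
```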
### Anonymous sessions and device-local history

- The browser keeps an anonymous session UUID.
- The backend uses the `X-Chat-Session-Id` header to separate vector indexes per browser.
- The frontend stores the transcript locally in IndexedDB per PDF.
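A sketch of per-browser isolation via the session header (the dependency name and validation rule are illustrative, not the project's exact code):

```python
from pathlib import Path

from fastapi import Depends, FastAPI, Header, HTTPException

app = FastAPI()
FAISS_PERSIST_DIR = Path("faiss_store")

def session_dir(x_chat_session_id: str = Header(...)) -> Path:
    """Resolve the FAISS folder for this browser's session id."""
    # Reject anything that is not a plain UUID-like token (no path tricks).
    if not x_chat_session_id.replace("-", "").isalnum():
        raise HTTPException(status_code=400, detail="Invalid session id")
    return FAISS_PERSIST_DIR / x_chat_session_id

@app.get("/status")
async def status(index_dir: Path = Depends(session_dir)) -> dict:
    return {"pdf_loaded": index_dir.exists()}
```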
### Rate limiting and cleanup

- Per-IP request limits for upload and ask routes.
- Startup cleanup removes stale session FAISS folders.
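The per-IP limiting could look roughly like this sliding-window sketch (the real service lives in `backend/app/services`; this is an assumption about its shape):

```python
import time
from collections import defaultdict, deque

class PerIPRateLimiter:
    """Allow at most `limit` requests per IP within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float = 60.0):
        self.limit = limit
        self.window = window_seconds
        self._calls: dict[str, deque[float]] = defaultdict(deque)

    def allow(self, ip: str) -> bool:
        now = time.monotonic()
        calls = self._calls[ip]
        while calls and now - calls[0] > self.window:
            calls.popleft()  # drop timestamps outside the window
        if len(calls) >= self.limit:
            return False  # caller should respond with HTTP 429
        calls.append(now)
        return True
```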
## API reference

Most data routes require the `X-Chat-Session-Id` header.
| Method | Endpoint | Purpose |
|---|---|---|
| GET | `/` | Basic backend status |
| GET | `/health` | Health check |
| GET | `/models` | Available models/providers |
| GET | `/pipeline-info` | Explains pipeline stages |
| GET | `/status` | Session PDF loaded status |
| POST | `/upload` | Upload PDF and build index |
| POST | `/ask` | Ask question (non-streaming JSON) |
| POST | `/ask/stream` | Ask question (SSE streaming) |
| POST | `/api/oversight` | Sentry tunnel endpoint |
Example non-streaming request:

```bash
curl -X POST "http://127.0.0.1:8000/ask" \
  -H "Content-Type: application/json" \
  -H "X-Chat-Session-Id: 11111111-2222-4333-8444-555555555555" \
  -d '{"question":"Summarize this PDF","model":"openai/gpt-4o-mini","include_sources":true}'
```
## Environment variables (`.env`) explained

This project needs backend environment variables for real AI usage.

### Backend

Create from template:
```bash
cd backend
cp .env.example .env
```

Minimal required values:

```
OPENROUTER_API_KEY=your_openrouter_key
OPENROUTER_API_BASE=https://openrouter.ai/api/v1
```

| Variable | Required | Purpose |
|---|---|---|
| `OPENROUTER_API_KEY` | Yes | Main provider key |
| `OPENROUTER_API_BASE` | Yes | OpenRouter base URL |
| `DEFAULT_MODEL` | Recommended | Default model ID |
| `DEFAULT_PROVIDER` | Recommended | Provider selection hint |
| `CORS_ORIGINS` | Yes for deployment | Allowed frontend origins |
| `FAISS_PERSIST_DIR` | Recommended | Vector index directory |
| `MAX_VECTOR_SESSIONS` | Recommended | LRU session cap |
| `FAISS_SESSION_MAX_AGE_DAYS` | Recommended | Startup stale cleanup |
| `RATE_LIMIT_UPLOAD_PER_MINUTE` | Recommended | Upload protection |
| `RATE_LIMIT_ASK_PER_MINUTE` | Recommended | Ask/stream protection |
| `SENTRY_DSN` | Optional | Backend error reporting |
| `SENTRY_ENVIRONMENT` | Optional | Sentry environment tag |
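`FAISS_SESSION_MAX_AGE_DAYS` drives the startup cleanup; a minimal sketch of what such a sweep can look like (illustrative, not the project's exact code):

```python
import shutil
import time
from pathlib import Path

def cleanup_stale_sessions(persist_dir: Path, max_age_days: int) -> None:
    """Delete session index folders untouched for longer than the cutoff."""
    if not persist_dir.exists():
        return
    cutoff = time.time() - max_age_days * 86_400
    for session_dir in persist_dir.iterdir():
        if session_dir.is_dir() and session_dir.stat().st_mtime < cutoff:
            shutil.rmtree(session_dir, ignore_errors=True)
```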
Optional additional provider keys:

```
GROQ_API_KEY=
OPENAI_DIRECT_API_KEY=
GOOGLE_API_KEY=
HF_API_KEY=
```

### Frontend

For local dev you can run with the default assumptions, but creating the env file is recommended:
```bash
cd frontend
cp .env.example .env
```

Key variables:
| Variable | Required | Purpose |
|---|---|---|
| `VITE_API_BASE_URL` | Yes in production | Backend public base URL |
| `VITE_DEV_PROXY_TARGET` | Optional | Local Vite proxy target |
| `VITE_FAISS_SESSION_MAX_AGE_DAYS` | Optional | UI retention text parity |
| `VITE_SENTRY_DSN` | Optional | Browser Sentry |
| `VITE_SENTRY_TRACES_RATE` | Optional | Perf tracing rate |
| `VITE_APP_ENV` | Optional | Env label (production/dev) |
## How to run locally

Backend:

```bash
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# set OPENROUTER_API_KEY in .env
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Backend docs: http://127.0.0.1:8000/docs
Frontend:

```bash
cd frontend
npm install
npm run dev
```

Frontend app: http://localhost:5173
Suggested first test flow:

- Open the chat page.
- Upload a sample PDF.
- Ask a summary question.
- Toggle Sources and Stream.
- Change model and compare behavior.
- Inspect the Network tab for `/upload`, `/ask`, and `/ask/stream`.
- Inspect backend logs to see the retrieval/generation lifecycle.
## How to deploy (Vercel + Coolify VPS)

Backend (Coolify VPS):

- Use `backend/Dockerfile`.
- Base Directory: `/backend`
- Dockerfile path: `/Dockerfile`
- Port expose: `3000`
- Set `PORT=3000`.
- Set `CORS_ORIGINS` to your frontend domain(s).
- Configure domains and Traefik labels for:
  - sslip fallback host
  - production subdomain
Frontend (Vercel):

- Root Directory: `frontend`
- Framework: Vite
- Build command: `npm run build`
- Output directory: `dist`
- Install command: `npm install --legacy-peer-deps`
- Set `VITE_API_BASE_URL=https://your-backend-domain`.
## How to reuse this project in your own apps

Frontend reuse:

- Copy `frontend/src/components/ui` for reusable styled primitives.
- Copy `ChatInput`, `ChatMessage`, and `PDFUpload` for chat/document UX.
- Keep the shared utility `cn` from `frontend/src/lib/utils.ts`.
Backend reuse:

- Start from the `backend/app/routes` route separation.
- Reuse the `config.py` settings pattern for env-driven deployments (see the sketch after this list).
- Reuse the rate-limit service for any expensive endpoint.
- Reuse the session header approach for anonymous multi-user resource isolation.
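A sketch of the env-driven settings pattern with pydantic-settings (field names mirror the variables documented above; the defaults are illustrative, not the project's actual values):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    """Values are read from .env or the process environment."""

    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    openrouter_api_key: str = ""
    openrouter_api_base: str = "https://openrouter.ai/api/v1"
    default_model: str = "openai/gpt-4o-mini"
    cors_origins: str = "http://localhost:5173"
    faiss_persist_dir: str = "faiss_store"
    max_vector_sessions: int = 50

settings = Settings()  # import this singleton wherever config is needed
```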
API client reuse:

- `frontend/src/lib/api.ts` centralizes request and header handling.
- Adapt the endpoint map and payload types for your own backend quickly.
## Quality checks and scripts

Root:

```bash
npm run lint
npm run check
npm run build
npm run build:all
```

Frontend:

```bash
cd frontend
npm run lint
npm run typecheck
npm run build
npm audit
```

Backend:

```bash
cd backend
pip install -r requirements.txt -r requirements-dev.txt
ruff check app
mypy app
python -m unittest discover -s tests -p "test_*.py"
```

Current backend integration test: `backend/tests/test_chat_stream_sse.py` validates `/ask/stream` SSE behavior (token + done events) and the source metadata flow.
## Troubleshooting notes

- CORS blocked in the browser -> ensure the deployed frontend origin is present in `CORS_ORIGINS`, then redeploy the backend.
- Vercel npm peer conflict -> use an install command with `--legacy-peer-deps`.
- No model response -> verify that at least one provider key is valid.
- Wrong/empty retrieval -> re-upload the PDF and check session header consistency.
- Frequent 404 probes in logs -> expected on public servers due to internet scanners.
## Contributing

- Fork the repository.
- Create a feature branch.
- Keep changes focused and run the checks before opening a PR.
- Open the PR with a short summary, scope, and risk notes.
## License

This project is licensed under the MIT License. Feel free to use, modify, and distribute the code as per the terms of the license.

This is an open-source project - feel free to use, enhance, and extend it further!

If you have any questions or want to share your work, reach out via GitHub or my portfolio at https://www.arnobmahmud.com.

Enjoy building and learning!

Thank you!









