Live Demo: rag-chatbot-two-delta.vercel.app
A retrieval-augmented generation (RAG) chatbot built with LangChain, the Gemini API, and the Chroma vector database. It answers questions using your custom documents as context and cites its sources, which is intended to prevent hallucinated answers.
Numbers measured against this project using a self-contained Python benchmark (ThreadPoolExecutor, 100 requests at 20 concurrent users):
- Sustained 17 req/s throughput with a 100% success rate across 100 requests at 20 concurrent users, by running a threaded FastAPI backend with Chroma vector search and Gemini 2.0 Flash generation.
- Achieved 35ms P95 latency on health checks and 1,690ms P95 on RAG chat queries, as measured by an end-to-end HTTP benchmark with 20 concurrent users. Chat latency is dominated by Gemini API inference (~600-1800ms).
- Reduced hallucinated answers by returning source-attributed responses 100% of the time, as measured by every chat response returning cited document snippets, by grounding Gemini generation in the top-4 retrieved context chunks via LangChain's retrieval chain.
- Reduced document onboarding to a single `python ingest.py` command with zero manual configuration, by building an automated pipeline that loads `.txt`/`.pdf` files, chunks them at 1000 characters with 200-char overlap, and embeds via Gemini text-embedding-004.
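
The grounding step described in the bullets above (retrieve the top 4 chunks, then hand them to the model as context) can be sketched in plain Python. This is a simplified illustration and not the project's code: the project relies on Chroma and LangChain for retrieval, while `cosine`, `top_k_chunks`, `build_prompt`, and the toy 3-dimensional vectors below are hypothetical names and data chosen for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, indexed_chunks, k=4):
    """Return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda c: cosine(query_vec, c["vector"]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, chunks):
    """Assemble a grounded prompt that tags each chunk with its source file."""
    context = "\n\n".join(f"[{c['source']}] {c['text']}" for c in chunks)
    return (
        "Answer using only the context below and cite the sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Toy index (assumption): real vectors would come from the embedding model.
index = [
    {"source": "support_policies.txt", "text": "Refund policy text", "vector": [0.9, 0.1, 0.0]},
    {"source": "products.txt", "text": "Product list text", "vector": [0.1, 0.9, 0.0]},
    {"source": "company_info.txt", "text": "Leadership text", "vector": [0.0, 0.1, 0.9]},
]
hits = top_k_chunks([1.0, 0.0, 0.0], index, k=2)
prompt = build_prompt("What is the refund policy?", hits)
```

Keeping the source file name next to each chunk is what makes it possible to return citations alongside the generated answer.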
| Layer | Technology |
|---|---|
| Frontend | React + Vite |
| Backend | FastAPI (Python) |
| RAG Engine | LangChain |
| LLM | Google Gemini 2.0 Flash |
| Embeddings | Gemini text-embedding-004 |
| Vector DB | Chroma (persistent) |
| Deploy | Vercel (frontend) + Railway (backend) |
cd backend
# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Add your Gemini API key
cp .env.example .env
# Edit .env and paste your GOOGLE_API_KEY
# Ingest documents into the vector store
python ingest.py
# Start the API server
uvicorn main:app --reload --port 8000

API available at http://localhost:8000
cd frontend
npm install
npm run dev

App available at http://localhost:3000
Ask questions like:
- "What products does Acme Corp offer?"
- "What is the refund policy?"
- "Who is on the leadership team?"
- "How much does CodeFlow cost?"
Run the load tests yourself:
cd load-tests
python3 self_benchmark.py -u 20 -n 100 # 20 users, 100 requests
python3 self_benchmark.py -u 50 -n 200 # stress test

| Metric | 20 Users (100 reqs) | 50 Users (200 reqs) |
|---|---|---|
| Throughput | 17.0 req/s | 34.9 req/s |
| Health P95 | 35ms | 1,028ms |
| Health Median | 9ms | 11ms |
| Chat P95 | 1,690ms | 2,117ms |
| Chat Median | 1,184ms | 1,306ms |
| Chat Success Rate | 100% | 97.7% |
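
For readers who want to see how figures like those in the table can be produced, here is a minimal sketch of a `ThreadPoolExecutor` benchmark that reports throughput, median, P95, and success rate. It is not the contents of `self_benchmark.py`: the function names are hypothetical, the percentile uses the nearest-rank method (an assumption), and `fake_request` stands in for a real HTTP call so the example runs without a server.

```python
import math
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def timed_call(request_fn):
    """Run one request and return (latency in ms, success flag)."""
    start = time.perf_counter()
    try:
        request_fn()
        ok = True
    except Exception:
        ok = False
    return (time.perf_counter() - start) * 1000, ok


def run_benchmark(request_fn, users=20, total=100):
    """Issue `total` requests with at most `users` in flight at once."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(lambda _: timed_call(request_fn), range(total)))
    elapsed = time.perf_counter() - wall_start
    latencies = [ms for ms, _ in results]
    return {
        "throughput_rps": total / elapsed,
        "median_ms": statistics.median(latencies),
        "p95_ms": percentile(latencies, 95),
        "success_rate": sum(ok for _, ok in results) / total,
    }


def fake_request():
    """Stand-in for an HTTP call to the API (assumption): sleeps about 10 ms."""
    time.sleep(0.01)


stats = run_benchmark(fake_request, users=20, total=100)
```

Swapping `fake_request` for a function that posts to the chat endpoint would turn this into an end-to-end measurement.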
rag-chatbot/
├── backend/
│   ├── main.py              # FastAPI server: rate limiting, caching, routes
│   ├── rag.py               # LangChain RAG chain with Gemini 429 backoff
│   ├── ingest.py            # Document loading, chunking, and embedding
│   ├── requirements.txt     # Python dependencies
│   ├── .env.example         # Environment variable template
│   └── chroma_db/           # Persisted vector store (created on ingest)
├── frontend/
│   ├── src/
│   │   ├── App.jsx          # React chat UI
│   │   └── main.jsx         # React entry point
│   ├── index.html
│   ├── package.json
│   └── vite.config.js
├── data/                    # Knowledge base documents (.txt, .pdf)
│   ├── company_info.txt
│   ├── products.txt
│   └── support_policies.txt
├── load-tests/
├── vercel.json              # Vercel deployment + /api/* rewrite proxy
├── Procfile
└── railway.toml
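
The tree notes that `rag.py` backs off when Gemini returns HTTP 429. The general technique, exponential backoff with jitter, can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: `RateLimitError` is a hypothetical stand-in for whatever exception the API client raises, and the retry count and delays are arbitrary.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 error raised by an API client."""


def call_with_backoff(fn, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying on RateLimitError with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries; surface the error to the caller
            # Delay doubles each attempt: base, 2*base, 4*base, ...
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)


# Usage: a fake call that is rate limited twice, then succeeds.
attempts = {"n": 0}


def flaky_generate():
    attempts["n"] += 1
    if attempts["n"] <= 2:
        raise RateLimitError("429 Too Many Requests")
    return "answer"


delays = []
# Record the delays instead of actually sleeping, so the example runs instantly.
result = call_with_backoff(flaky_generate, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the retry logic easy to test without waiting for real delays.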
Set these in the Railway Variables tab or in your local .env file.
| Variable | Required | Description | Example |
|---|---|---|---|
| `GOOGLE_API_KEY` | Yes | Gemini API key. Get one from Google AI Studio. | `AIza...` |
| `PORT` | Yes | Port the server listens on. Railway injects this automatically. | `8000` |
| `CHROMA_DIR` | Optional | Override path for the persisted Chroma vector store. Defaults to `./chroma_db`. | `/data/chroma_db` |
Note: There is no database or JWT auth in this project; it is a stateless RAG API. The only secret you need is `GOOGLE_API_KEY`.
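
As an illustration of how these variables might be read at startup, here is a small sketch. `Settings` and `load_settings` are hypothetical names, and falling back to port 8000 when `PORT` is unset is an assumption that mirrors the local `uvicorn` command; the project's actual loading code may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    port: int
    chroma_dir: str


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("GOOGLE_API_KEY", "")
    if not api_key:
        # Fail fast: nothing works without the Gemini key.
        raise RuntimeError("GOOGLE_API_KEY is not set")
    return Settings(
        google_api_key=api_key,
        port=int(env.get("PORT", "8000")),  # assumed local default
        chroma_dir=env.get("CHROMA_DIR", "./chroma_db"),
    )


# Example with an explicit mapping instead of the real environment.
settings = load_settings({"GOOGLE_API_KEY": "AIza...", "CHROMA_DIR": "/data/chroma_db"})
```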
Place .txt or .pdf files in the data/ directory, then re-run:
cd backend && python ingest.py

The ingestion pipeline will:
- Load all text and PDF files from `data/`
- Split them into overlapping chunks (1000 chars, 200 overlap)
- Generate embeddings via Gemini text-embedding-004
- Store them in the Chroma vector database at `backend/chroma_db/`
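
The chunking step above (1000 characters with a 200-character overlap) can be illustrated with a fixed-window splitter. This is a simplified stand-in, not the code in `ingest.py`: `chunk_text` is a hypothetical helper, and a text splitter in a real pipeline may also try to break on paragraph or sentence boundaries, which this sketch does not.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into chunks of up to chunk_size characters.

    Each chunk starts `chunk_size - overlap` characters after the previous
    one, so consecutive chunks share `overlap` characters.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


# A 2,500-character sample yields windows starting at 0, 800, and 1600.
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.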
To deploy the backend on Railway:
- Push your repo to GitHub
- Create a new project on Railway
- Connect your repo; Railway auto-detects the Python app
- Set the environment variable `GOOGLE_API_KEY` in the Variables tab
- Deploy; Railway starts the FastAPI server via the `Procfile`
- Copy the public Railway URL

To deploy the frontend on Vercel:
- Create a new project on Vercel
- Connect your repo and set the root directory to `/`
- Deploy; `vercel.json` automatically proxies `/api/*` to your Railway backend
- No extra environment variables are needed in Vercel
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/chat` | Send a question, get an answer (5/min per IP) |
| POST | `/api/ingest` | Re-run document ingestion |
| GET | `/api/health` | Health check + cache stats |
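
A minimal sketch of calling the chat endpoint from Python, using only the standard library. It assumes the backend is running at `http://localhost:8000` and follows the request and response shapes documented in this README; `build_chat_request`, `ask`, and `format_answer` are hypothetical helper names.

```python
import json
import urllib.request


def build_chat_request(message, base_url="http://localhost:8000"):
    """Build a POST /api/chat request carrying the question as JSON."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(message, base_url="http://localhost:8000", timeout=30):
    """Send the question and return the decoded JSON response."""
    request = build_chat_request(message, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def format_answer(response):
    """Render the answer followed by the distinct source files it cites."""
    sources = sorted({s["source"] for s in response.get("sources", [])})
    return response["answer"] + "\nSources: " + ", ".join(sources)


# Building the request does not touch the network.
req = build_chat_request("What is the refund policy?")

# With the backend running, the full round trip would be:
# print(format_answer(ask("What is the refund policy?")))
```

Because `/api/chat` is limited to 5 requests per minute per IP, a script that asks many questions in a loop should pace its calls accordingly.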
// POST /api/chat
{ "message": "What is the refund policy?" }
// Response
{
"answer": "We offer a 30-day money-back guarantee...",
"sources": [
{ "content": "...", "source": "support_policies.txt" }
],
"cached": false
}

License: MIT