RAG Knowledge Base Chatbot

🚀 Live Demo → rag-chatbot-two-delta.vercel.app

A retrieval-augmented generation (RAG) chatbot built with LangChain, the Gemini API, and the Chroma vector database. It answers questions from content retrieved from your custom documents and returns source citations with every answer, which reduces the risk of hallucinated responses.

Load-Tested Impact

Numbers measured against this project using a self-contained Python benchmark (ThreadPoolExecutor, 100 requests at 20 concurrent users):

Sustained 17 req/s throughput with a 100% success rate across 100 requests at 20 concurrent users, running a threaded FastAPI backend with Chroma vector search and Gemini 2.0 Flash generation.
Achieved 35ms P95 latency on health checks and 1,690ms P95 on RAG chat queries, measured end to end over HTTP with 20 concurrent users. Chat latency is dominated by Gemini API inference (~600-1800ms).
Grounded every chat response in the top-4 retrieved context chunks via LangChain's retrieval chain: 100% of responses in the benchmark returned cited document snippets, which limits (but does not rule out) hallucinated answers.
Reduced document onboarding to a single python ingest.py command with zero manual configuration, by building an automated pipeline that loads .txt/.pdf files, splits them into 1000-character chunks with a 200-character overlap, and embeds them via Gemini text-embedding-004.
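
The top-4 retrieval step mentioned above can be illustrated in plain Python. This is only a sketch: it assumes cosine similarity as the ranking measure (the metric Chroma actually applies depends on how the collection is configured), and `cosine_similarity`, `top_k_chunks`, and the toy vectors are hypothetical names, not the project's code, which delegates retrieval to Chroma through LangChain.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunks: List[Tuple[str, Sequence[float]]],
    k: int = 4,
) -> List[str]:
    """Return the texts of the k chunks most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    # Highest similarity first; ties keep their original order (stable sort).
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings", purely for illustration.
    toy = [("refunds", [1.0, 0.0]), ("pricing", [0.0, 1.0]), ("both", [1.0, 1.0])]
    print(top_k_chunks([1.0, 0.0], toy, k=2))
```

The retrieved texts would then be placed in the prompt as context, and their file names returned as the cited sources.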

Stack

| Layer | Technology |
| --- | --- |
| Frontend | React + Vite |
| Backend | FastAPI (Python) |
| RAG Engine | LangChain |
| LLM | Google Gemini 2.0 Flash |
| Embeddings | Gemini text-embedding-004 |
| Vector DB | Chroma (persistent) |
| Deploy | Vercel (frontend) + Railway (backend) |

Quick Start

1. Set up the backend

cd backend

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Add your Gemini API key
cp .env.example .env
# Edit .env and paste your GOOGLE_API_KEY

# Ingest documents into the vector store
python ingest.py

2. Start the backend

uvicorn main:app --reload --port 8000

API available at http://localhost:8000

3. Start the frontend

cd frontend
npm install
npm run dev

App available at http://localhost:3000

4. Try it

Ask questions like:

"What products does Acme Corp offer?"
"What is the refund policy?"
"Who is on the leadership team?"
"How much does CodeFlow cost?"

Benchmark Results

Run the load tests yourself:

cd load-tests
python3 self_benchmark.py -u 20 -n 100   # 20 users, 100 requests
python3 self_benchmark.py -u 50 -n 200   # stress test

| Metric | 20 Users (100 reqs) | 50 Users (200 reqs) |
| --- | --- | --- |
| Throughput | 17.0 req/s | 34.9 req/s |
| Health P95 | 35ms | 1,028ms |
| Health Median | 9ms | 11ms |
| Chat P95 | 1,690ms | 2,117ms |
| Chat Median | 1,184ms | 1,306ms |
| Chat Success Rate | 100% | 97.7% |
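
For readers who want to see how metrics like these can be derived, here is a minimal sketch of a ThreadPoolExecutor-based benchmark. It is not the contents of self_benchmark.py: `run_benchmark`, `percentile`, and the injected `send_request` callable are hypothetical, and the nearest-rank percentile definition is an assumption about how P95 is computed.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Callable, Dict, List


def percentile(sorted_values: List[float], pct: int) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def run_benchmark(send_request: Callable[[], bool], users: int, total: int) -> Dict[str, float]:
    """Issue `total` calls with at most `users` in flight and summarize them."""

    def timed_call(_: int):
        start = time.perf_counter()
        try:
            ok = send_request()
        except Exception:
            ok = False  # count any raised error as a failed request
        return ok, (time.perf_counter() - start) * 1000.0

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=users) as pool:
        results = list(pool.map(timed_call, range(total)))
    wall_seconds = time.perf_counter() - wall_start

    latencies = sorted(ms for _, ms in results)
    successes = sum(1 for ok, _ in results if ok)
    return {
        "throughput_rps": total / wall_seconds,
        "median_ms": median(latencies),
        "p95_ms": percentile(latencies, 95),
        "success_rate": successes / total,
    }


if __name__ == "__main__":
    # Stand-in for an HTTP call: sleeps briefly and reports success.
    print(run_benchmark(lambda: time.sleep(0.01) is None, users=20, total=100))
```

In a real run, `send_request` would wrap an HTTP call to /api/health or /api/chat and return whether the status code indicated success.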

Project Structure

rag-chatbot/
├── backend/
│   ├── main.py              # FastAPI server — rate limiting, caching, routes
│   ├── rag.py               # LangChain RAG chain with Gemini 429 backoff
│   ├── ingest.py            # Document loading, chunking, and embedding
│   ├── requirements.txt     # Python dependencies
│   ├── .env.example         # Environment variable template
│   └── chroma_db/           # Persisted vector store (created on ingest)
├── frontend/
│   ├── src/
│   │   ├── App.jsx          # React chat UI
│   │   └── main.jsx         # React entry point
│   ├── index.html
│   ├── package.json
│   └── vite.config.js
├── data/                    # Knowledge base documents (.txt, .pdf)
│   ├── company_info.txt
│   ├── products.txt
│   └── support_policies.txt
├── load-tests/
├── vercel.json              # Vercel deployment + /api/* rewrite proxy
├── Procfile
└── railway.toml
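
The tree notes that rag.py handles Gemini 429 responses with backoff. A common way to do that is exponential backoff with jitter, sketched below. Everything here is an assumption for illustration: `RateLimitError` is a hypothetical stand-in for whatever exception the Gemini client raises on HTTP 429, and `call_with_backoff` with its default delays is not taken from the project.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for an HTTP 429 raised by the LLM client."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RateLimitError with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # retries exhausted; surface the error to the caller
            # 1s, 2s, 4s, ... capped at max_delay, plus up to 10% jitter.
            delay = min(max_delay, base_delay * (2 ** (attempt - 1)))
            sleep(delay + random.uniform(0, delay * 0.1))
```

The `sleep` parameter is injectable so the retry schedule can be exercised in tests without actually waiting.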

Environment Variables

Set these in the Railway Variables tab or in your local .env file.

| Variable | Required | Description | Example |
| --- | --- | --- | --- |
| `GOOGLE_API_KEY` | ✅ | Gemini API key. Get one from Google AI Studio. | `AIza...` |
| `PORT` | ✅ | Port the server listens on. Railway injects this automatically. | `8000` |
| `CHROMA_DIR` | Optional | Override path for persisted Chroma vector store. Defaults to `./chroma_db`. | `/data/chroma_db` |

Note: This project has no relational database or JWT auth. The API is stateless between requests; the only persisted data is the Chroma vector store. The only secret you need is GOOGLE_API_KEY.
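
As an illustration of how these variables might be read at startup, here is a small sketch. The `Settings` dataclass and `load_settings` function are hypothetical and not taken from main.py; the fallback port of 8000 is an assumption based on the local uvicorn command, while the `./chroma_db` default matches the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    google_api_key: str
    port: int
    chroma_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from a mapping of environment variables."""
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        # Fail fast: nothing works without the Gemini key.
        raise RuntimeError("GOOGLE_API_KEY is not set")
    return Settings(
        google_api_key=api_key,
        port=int(env.get("PORT", "8000")),  # assumed local fallback
        chroma_dir=env.get("CHROMA_DIR", "./chroma_db"),
    )
```

Passing the environment in as a mapping keeps the function easy to test with a plain dict.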

Adding Your Own Documents

Place .txt or .pdf files in the data/ directory, then re-run:

cd backend && python ingest.py

The ingestion pipeline will:

1. Load all text and PDF files from data/
2. Split them into overlapping chunks (1000 chars, 200 overlap)
3. Generate embeddings via Gemini text-embedding-004
4. Store them in the Chroma vector database at backend/chroma_db/
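
The chunking step can be pictured with a plain fixed-size sliding window, sketched below. This is a simplification under stated assumptions: `chunk_text` is a hypothetical helper, and it cuts purely by character offset, whereas a LangChain text splitter (which ingest.py may use) typically also tries to break on separators such as paragraphs and sentences.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into windows of chunk_size chars that share `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # each window starts 800 chars after the last
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(2500))
    print([len(c) for c in chunk_text(sample)])
```

The overlap means the last 200 characters of one chunk reappear at the start of the next, so a sentence that straddles a boundary is still fully contained in at least one chunk.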

Deployment

Railway (Backend)

1. Push your repo to GitHub
2. Create a new project on Railway
3. Connect your repo; Railway auto-detects the Python app
4. Set the environment variable GOOGLE_API_KEY in the Variables tab
5. Deploy; Railway starts the FastAPI server via the Procfile
6. Copy the public Railway URL

Vercel (Frontend)

1. Create a new project on Vercel
2. Connect your repo and set the root directory to /
3. Deploy; vercel.json automatically proxies /api/* to your Railway backend
4. No extra environment variables are needed in Vercel

API Endpoints

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | `/api/chat` | Send a question, get an answer (rate limited to 5 requests/min per IP) |
| POST | `/api/ingest` | Re-run document ingestion |
| GET | `/api/health` | Health check + cache stats |
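
The 5 requests per minute per IP limit on /api/chat could be enforced in several ways. The sketch below shows one of them, an in-memory sliding-window counter. It is an assumption for illustration only: `SlidingWindowLimiter` is a hypothetical class, and main.py may rely on a rate-limiting library or a different algorithm.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(
        self,
        limit: int = 5,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would typically respond with HTTP 429
        hits.append(now)
        return True
```

This version is not thread-safe and keeps state in process memory, so it would need a lock or an external store if the server runs multiple workers.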

Chat Request/Response

// POST /api/chat
{ "message": "What is the refund policy?" }

// Response
{
  "answer": "We offer a 30-day money-back guarantee...",
  "sources": [
    { "content": "...", "source": "support_policies.txt" }
  ],
  "cached": false
}
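
A minimal Python client for this endpoint might look like the sketch below, using only the standard library. The function names (`build_chat_request`, `ask`, `cited_files`) are hypothetical, and the base URL is an assumption based on the local setup above; only the request and response fields come from the example.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_chat_request(base_url: str, message: str) -> urllib.request.Request:
    """Build the POST /api/chat request without sending it."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, message: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Send the question and return the decoded JSON response."""
    request = build_chat_request(base_url, message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def cited_files(payload: Dict[str, Any]) -> List[str]:
    """List the distinct source file names in a response, in order."""
    seen: List[str] = []
    for src in payload.get("sources", []):
        name = src.get("source")
        if name and name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Requires the backend to be running locally.
    reply = ask("http://localhost:8000", "What is the refund policy?")
    print(reply["answer"])
    print("Sources:", ", ".join(cited_files(reply)))
```

Because of the per-IP rate limit, a client that sends many questions in a row should expect some requests to be rejected and handle that case.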

License

MIT
