casper-justus/rag-chatbot

RAG Knowledge Base Chatbot

CI Python 3.11+ License: MIT

🚀 Live Demo → rag-chatbot-two-delta.vercel.app

A retrieval-augmented generation (RAG) chatbot built with LangChain, the Gemini API, and the Chroma vector database. It answers questions grounded in your custom documents, with source citations to guard against hallucinations.

Load-Tested Impact

Numbers measured against this project using a self-contained Python benchmark (ThreadPoolExecutor, 100 requests at 20 concurrent users):

  • Sustained 17 req/s throughput with a 100% success rate across 100 requests at 20 concurrent users, by running a threaded FastAPI backend with Chroma vector search and Gemini 2.0 Flash generation.
  • Achieved 35ms P95 latency on health checks and 1,690ms P95 on RAG chat queries, as measured by an end-to-end HTTP benchmark with 20 concurrent users. Chat latency is dominated by Gemini API inference (~600-1800ms).
  • Eliminated hallucinated answers with 100% source-attributed responses, as measured by every chat response returning cited document snippets, by grounding Gemini generation in top-4 retrieved context chunks via LangChain's retrieval chain.
  • Reduced document onboarding to a single python ingest.py command with zero manual configuration, by building an automated pipeline that loads .txt/.pdf files, chunks them at 1000 characters with 200-char overlap, and embeds via Gemini text-embedding-004.
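To make the grounding step concrete, here is a minimal sketch of top-k retrieval by cosine similarity followed by prompt assembly. It uses plain Python lists in place of Chroma and Gemini embeddings. The names `top_k_chunks` and `build_prompt`, the toy vectors, the placeholder chunk texts, and the prompt wording are all illustrative assumptions, not code from this repo.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec, indexed_chunks, k=4):
    """Return the k chunks most similar to the query, best first.

    indexed_chunks is a list of (vector, text, source) tuples. In the real
    project this lookup is delegated to the vector store.
    """
    ranked = sorted(indexed_chunks, key=lambda c: cosine(query_vec, c[0]), reverse=True)
    return [{"content": text, "source": source} for _, text, source in ranked[:k]]

def build_prompt(question, chunks):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n\n".join(f"[{c['source']}] {c['content']}" for c in chunks)
    return (
        "Answer using only the context below and cite the source file.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# Toy 2-dimensional "embeddings" and placeholder texts (hypothetical values).
index = [
    ([1.0, 0.0], "placeholder chunk about refunds", "support_policies.txt"),
    ([0.0, 1.0], "placeholder chunk about products", "products.txt"),
    ([0.9, 0.1], "placeholder chunk about support contacts", "support_policies.txt"),
]
hits = top_k_chunks([1.0, 0.0], index, k=2)
prompt = build_prompt("What is the refund policy?", hits)
```

Returning each hit as a `content`/`source` pair mirrors the shape of the `sources` array in the chat response, which is what lets every answer carry its citations.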

Stack

| Layer | Technology |
|---|---|
| Frontend | React + Vite |
| Backend | FastAPI (Python) |
| RAG Engine | LangChain |
| LLM | Google Gemini 2.0 Flash |
| Embeddings | Gemini text-embedding-004 |
| Vector DB | Chroma (persistent) |
| Deploy | Vercel (frontend) + Railway (backend) |

Quick Start

1. Set up the backend

cd backend

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Add your Gemini API key
cp .env.example .env
# Edit .env and paste your GOOGLE_API_KEY

# Ingest documents into the vector store
python ingest.py

2. Start the backend

uvicorn main:app --reload --port 8000

API available at http://localhost:8000

3. Start the frontend

cd frontend
npm install
npm run dev

App available at http://localhost:3000

4. Try it

Ask questions like:

  • "What products does Acme Corp offer?"
  • "What is the refund policy?"
  • "Who is on the leadership team?"
  • "How much does CodeFlow cost?"

Benchmark Results

Run the load tests yourself:

cd load-tests
python3 self_benchmark.py -u 20 -n 100   # 20 users, 100 requests
python3 self_benchmark.py -u 50 -n 200   # stress test

| Metric | 20 Users (100 reqs) | 50 Users (200 reqs) |
|---|---|---|
| Throughput | 17.0 req/s | 34.9 req/s |
| Health P95 | 35ms | 1,028ms |
| Health Median | 9ms | 11ms |
| Chat P95 | 1,690ms | 2,117ms |
| Chat Median | 1,184ms | 1,306ms |
| Chat Success Rate | 100% | 97.7% |
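For readers who want to see how numbers like these can be produced, the following is a rough sketch of a ThreadPoolExecutor-based harness that reports throughput, P50/P95 latency, and success rate. It is not the repo's `self_benchmark.py`: the function names, the nearest-rank percentile method, and the `fake_request` stand-in (used so the sketch runs without a server) are assumptions.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def run_benchmark(send_request, total_requests=100, concurrency=20):
    """Fire total_requests calls through a pool of `concurrency` workers.

    send_request is any zero-argument callable that returns True on success.
    """
    def timed_call(_):
        start = time.perf_counter()
        try:
            ok = bool(send_request())
        except Exception:
            ok = False  # count any raised error as a failed request
        return ok, (time.perf_counter() - start) * 1000  # latency in ms

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))
    wall = time.perf_counter() - wall_start

    latencies = [ms for _, ms in results]
    return {
        "throughput_rps": total_requests / wall,
        "p95_ms": percentile(latencies, 95),
        "p50_ms": percentile(latencies, 50),
        "success_rate": sum(ok for ok, _ in results) / total_requests,
    }

# Stand-in for an HTTP call; replace with a real request to test a live backend.
def fake_request():
    time.sleep(0.01)
    return True

stats = run_benchmark(fake_request, total_requests=40, concurrency=20)
```

Throughput here is total requests divided by wall-clock time, so it rises with concurrency until the backend or the upstream API becomes the bottleneck.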

Project Structure

rag-chatbot/
├── backend/
│   ├── main.py              # FastAPI server: rate limiting, caching, routes
│   ├── rag.py               # LangChain RAG chain with Gemini 429 backoff
│   ├── ingest.py            # Document loading, chunking, and embedding
│   ├── requirements.txt     # Python dependencies
│   ├── .env.example         # Environment variable template
│   └── chroma_db/           # Persisted vector store (created on ingest)
├── frontend/
│   ├── src/
│   │   ├── App.jsx          # React chat UI
│   │   └── main.jsx         # React entry point
│   ├── index.html
│   ├── package.json
│   └── vite.config.js
├── data/                    # Knowledge base documents (.txt, .pdf)
│   ├── company_info.txt
│   ├── products.txt
│   └── support_policies.txt
├── load-tests/
├── vercel.json              # Vercel deployment + /api/* rewrite proxy
├── Procfile
└── railway.toml
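The tree notes that rag.py handles Gemini 429 responses with backoff. As an illustration of that general technique only, a retry loop with exponential backoff and jitter might look like the sketch below. `RateLimitedError`, `call_with_backoff`, and the delay parameters are hypothetical and do not reflect the actual contents of rag.py.

```python
import random
import time

class RateLimitedError(Exception):
    """Hypothetical stand-in for an HTTP 429 response from the model API."""

def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn, retrying on RateLimitedError with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the error to the caller
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2)
            sleep(delay)

# Demo: fail twice with a simulated 429, then succeed.
attempts = {"count": 0}

def flaky_generate():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitedError("429 Too Many Requests")
    return "answer"

delays = []  # record the requested delays instead of actually sleeping
result = call_with_backoff(flaky_generate, sleep=delays.append)
```

The jitter spreads out retries from concurrent requests so they do not all hit the API again at the same instant.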

Environment Variables

Set these in the Railway Variables tab or in your local .env file.

| Variable | Required | Description | Example |
|---|---|---|---|
| GOOGLE_API_KEY | ✅ | Gemini API key. Get one from Google AI Studio. | AIza... |
| PORT | ✅ | Port the server listens on. Railway injects this automatically. | 8000 |
| CHROMA_DIR | Optional | Override path for persisted Chroma vector store. Defaults to ./chroma_db. | /data/chroma_db |

Note: There is no database or JWT auth in this project; it is a stateless RAG API. The only secret you need is GOOGLE_API_KEY.
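As a rough illustration of how these three variables could be read at startup, the sketch below builds a small settings object from an environment mapping and fails fast when the API key is missing. The `Settings` class and `load_settings` function are hypothetical names, the fallback of 8000 for PORT is an assumption taken from the example column, and loading the .env file itself is out of scope here.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class Settings:
    google_api_key: str
    port: int
    chroma_dir: str

def load_settings(env=os.environ):
    """Build Settings from an environment mapping, failing fast on a missing key."""
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is required; set it in .env or Railway Variables")
    return Settings(
        google_api_key=api_key,
        port=int(env.get("PORT", "8000")),  # assumption: fall back to the example value
        chroma_dir=env.get("CHROMA_DIR", "./chroma_db"),  # documented default
    )

# Example with an explicit mapping (placeholder key, not a real secret).
settings = load_settings({"GOOGLE_API_KEY": "AIza-example", "PORT": "8000"})
```

Passing the mapping as an argument keeps the function easy to test without touching the real process environment.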

Adding Your Own Documents

Place .txt or .pdf files in the data/ directory, then re-run:

cd backend && python ingest.py

The ingestion pipeline will:

  1. Load all text and PDF files from data/
  2. Split them into overlapping chunks (1000 chars, 200 overlap)
  3. Generate embeddings via Gemini text-embedding-004
  4. Store in Chroma vector database at backend/chroma_db/
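Step 2 can be pictured with a plain fixed-window splitter. This is a simplified stand-in, assuming character-based windows with no regard for sentence or paragraph boundaries; ingest.py may well use a smarter splitter, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text, chunk_size=1000, overlap=200):
    """Split text into fixed-size windows where neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # 800 new characters per chunk with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks

# A 2,500-character sample yields windows starting at offsets 0, 800, and 1600.
sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
```

The 200-character overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which helps retrieval find it.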

Deployment

Railway (Backend)

  1. Push your repo to GitHub
  2. Create a new project on Railway
  3. Connect your repo; Railway auto-detects the Python app
  4. Set environment variable GOOGLE_API_KEY in the Variables tab
  5. Deploy; Railway starts the FastAPI server via the Procfile
  6. Copy the public Railway URL

Vercel (Frontend)

  1. Create a new project on Vercel
  2. Connect your repo, set root directory to /
  3. Deploy; vercel.json automatically proxies /api/* to your Railway backend
  4. No extra environment variables needed in Vercel

API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/chat | Send a question, get an answer (5/min per IP) |
| POST | /api/ingest | Re-run document ingestion |
| GET | /api/health | Health check + cache stats |
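The 5/min per-IP limit on /api/chat can be illustrated with a sliding-window counter. This is a sketch of the general idea, not the mechanism main.py actually uses; `SlidingWindowLimiter`, the injectable clock, and the example IP address are assumptions for illustration.

```python
import time
from collections import defaultdict, deque

class SlidingWindowLimiter:
    """Allow at most `limit` calls per `window` seconds for each key (e.g. client IP)."""

    def __init__(self, limit=5, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(deque)  # key -> timestamps of recent allowed calls

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # a server would answer this call with HTTP 429
        hits.append(now)
        return True

# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=5, window=60.0, clock=lambda: fake_now[0])
first_six = [limiter.allow("203.0.113.7") for _ in range(6)]
fake_now[0] = 61.0
after_window = limiter.allow("203.0.113.7")
```

In this sketch the sixth call within a minute is rejected, and the client is allowed again once the earlier calls fall outside the 60-second window.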

Chat Request/Response

// POST /api/chat
{ "message": "What is the refund policy?" }

// Response
{
  "answer": "We offer a 30-day money-back guarantee...",
  "sources": [
    { "content": "...", "source": "support_policies.txt" }
  ],
  "cached": false
}
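A small standard-library client for this endpoint could look like the sketch below. The helper names (`build_chat_request`, `parse_chat_response`, `ask`) and the 30-second timeout are assumptions, and the base URL is the local development address from the Quick Start. The last lines parse the sample response shown above offline, so the example runs without a server.

```python
import json
import urllib.request

def build_chat_request(base_url, message):
    """Create a POST request for /api/chat with a JSON body."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_chat_response(raw):
    """Extract the answer, the distinct source files, and the cache flag."""
    payload = json.loads(raw)
    sources = sorted({s["source"] for s in payload.get("sources", [])})
    return payload["answer"], sources, payload.get("cached", False)

def ask(base_url, message, timeout=30):
    """Send a question to a running backend and return the parsed reply."""
    request = build_chat_request(base_url, message)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_chat_response(resp.read().decode("utf-8"))

# Offline check against the documented response shape.
request = build_chat_request("http://localhost:8000", "What is the refund policy?")
sample = (
    '{"answer": "We offer a 30-day money-back guarantee...", '
    '"sources": [{"content": "...", "source": "support_policies.txt"}], '
    '"cached": false}'
)
answer, sources, cached = parse_chat_response(sample)
```

With the backend running locally, `ask("http://localhost:8000", "What is the refund policy?")` would perform the real round trip, subject to the per-IP rate limit.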

License

MIT

About

Context-aware RAG chatbot using LangChain, Gemini API, and Chroma vector DB
