Long-Context RAG for Skedulelt, tuned to run on Render's free tier with Gemini 1.5 Flash.
Zero vector database. Zero infrastructure beyond one Render web service.
Two free-tier constraints change the equation vs. an always-on server: Render's spin-down and Gemini's rate limits. Both are solved below.
Free web services spin down after 15 minutes without inbound traffic. The process starts fresh on the next request, so all in-memory state is lost, along with any changes to the local filesystem. Every Map, every cached session, every variable: gone.
Fix applied: The server is now stateless. Conversation history lives in the browser (let history = []). Every POST /api/chat sends the full history alongside the new message; the server rebuilds the Gemini prompt from scratch each time. Token cost of replaying a 4-turn history is ~1 300 input tokens — negligible.
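The stateless rebuild can be sketched as a pure function. This is a minimal illustration, not the project's actual `geminiClient.js`: the name `buildGeminiRequest`, the `KB_TEXT` placeholder, and the `{ role, text }` shape of a history entry are assumptions. The returned object follows the `systemInstruction` / `contents` layout of a Gemini `generateContent` request body.

```javascript
// Placeholder for the real knowledge base text (hypothetical constant name).
const KB_TEXT = "<knowledge base text, ~980 tokens in the real project>";

// Rebuild the full request payload from scratch on every call.
// Nothing is read from server memory, so a spin-down loses nothing.
// Assumed shape of one history entry: { role: "user" | "model", text: string }.
function buildGeminiRequest(message, history = []) {
  if (typeof message !== "string" || message.trim() === "") {
    throw new TypeError("message must be a non-empty string");
  }
  if (!Array.isArray(history)) {
    throw new TypeError("history must be an array");
  }

  // The history comes from the browser, so treat it as untrusted input:
  // keep only well-formed entries with a known role.
  const priorTurns = history
    .filter(
      (turn) =>
        turn &&
        (turn.role === "user" || turn.role === "model") &&
        typeof turn.text === "string"
    )
    .map((turn) => ({ role: turn.role, parts: [{ text: turn.text }] }));

  return {
    systemInstruction: { parts: [{ text: KB_TEXT }] },
    contents: [...priorTurns, { role: "user", parts: [{ text: message }] }],
  };
}
```

Because the function depends only on its arguments, two identical requests produce identical payloads no matter which instance handles them or how recently that instance woke up.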
The Gemini 1.5 Flash free tier allows 1 500 requests per day but only 15 per minute. A burst of simultaneous users can hit the per-minute limit almost immediately.
Fix applied: All outgoing Gemini calls pass through a FIFO request queue with exponential back-off (2 s → 4 s → 8 s → 16 s, up to 4 retries). Requests are never silently dropped; they wait their turn.
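One way to combine a FIFO queue with this back-off schedule is sketched below. It is an illustration under assumptions, not the code in `geminiClient.js`: the class name `GeminiQueue` is hypothetical, and a rate-limit failure is assumed to surface as an error with `status === 429`. The `sleep` function is injectable so the retry logic can be tested without real delays.

```javascript
// Delays before each retry, matching the schedule above: 2 s, 4 s, 8 s, 16 s.
const BACKOFF_MS = [2000, 4000, 8000, 16000];

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Assumption: a rate-limit failure is an error object with status 429.
const isRateLimited = (err) => Boolean(err) && err.status === 429;

class GeminiQueue {
  constructor({ sleep = defaultSleep } = {}) {
    this.sleep = sleep;
    this.tail = Promise.resolve(); // end of the FIFO chain
  }

  // Run `task` only after every previously enqueued task has settled.
  enqueue(task) {
    const result = this.tail.then(() => this.runWithRetry(task));
    // Keep the chain usable even if this task ultimately fails.
    this.tail = result.catch(() => {});
    return result;
  }

  async runWithRetry(task) {
    for (let attempt = 0; ; attempt++) {
      try {
        return await task();
      } catch (err) {
        // Rethrow non-429 errors, or a 429 once all 4 retries are used up.
        if (!isRateLimited(err) || attempt >= BACKOFF_MS.length) throw err;
        await this.sleep(BACKOFF_MS[attempt]);
      }
    }
  }
}
```

A call site would look like `queue.enqueue(() => callGemini(payload))`, where `callGemini` stands in for whatever function performs the HTTP request. In this sketch a request that is backing off also holds up the requests behind it, which preserves strict arrival order; when retries are exhausted the error is rethrown to the caller rather than swallowed.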
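All figures assume 2 sessions / user / day, 4 turns / session, and a 30-day month. Costs are what the same traffic would cost at paid-tier token prices; within the free-tier limits the bill is $0.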
| Metric | Value |
|---|---|
| Input tokens per session | ~4 520 |
| Output tokens per session | ~390 |
| Cost per session | $0.000 456 |
| Monthly cost @ 100 users | $2.74 |
| Monthly cost @ 500 users | $13.68 |
| Gemini requests / day @ 100 users | ~800 ✅ (free limit = 1 500) |
| Gemini requests / day @ 500 users | ~4 000 ❌ (needs paid tier) |
| Render free hours / month | 750 h (covers 24/7 for 1 instance) |
| Peak context @ 50-turn session | ~8 230 tokens / 1 000 000 |
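
The table's arithmetic can be reproduced with a few lines of code. The per-token prices below are an assumption: this README does not state them, and they are simply the rates ($0.075 per 1 M input tokens, $0.30 per 1 M output tokens) that reproduce the $0.000 456 per-session figure. The function names and the 30-day month are likewise illustrative.

```javascript
// Assumed prices in USD per 1 M tokens (not stated in this README).
const INPUT_USD_PER_M = 0.075;
const OUTPUT_USD_PER_M = 0.3;

const SESSIONS_PER_USER_PER_DAY = 2;
const TURNS_PER_SESSION = 4;
const DAYS_PER_MONTH = 30; // assumed month length
const FREE_REQUESTS_PER_DAY = 1500;

// Cost of one session, using the token counts from the table as defaults.
function costPerSession(inputTokens = 4520, outputTokens = 390) {
  return (inputTokens * INPUT_USD_PER_M + outputTokens * OUTPUT_USD_PER_M) / 1e6;
}

// Monthly cost and daily request volume for a given number of users.
function estimate(users) {
  const sessionsPerMonth = users * SESSIONS_PER_USER_PER_DAY * DAYS_PER_MONTH;
  const requestsPerDay = users * SESSIONS_PER_USER_PER_DAY * TURNS_PER_SESSION;
  return {
    monthlyCostUsd: Number((sessionsPerMonth * costPerSession()).toFixed(2)),
    requestsPerDay,
    fitsFreeTier: requestsPerDay <= FREE_REQUESTS_PER_DAY,
  };
}

console.log(estimate(100)); // monthlyCostUsd 2.74, requestsPerDay 800, fitsFreeTier true
console.log(estimate(500)); // monthlyCostUsd 13.68, requestsPerDay 4000, fitsFreeTier false
```

Each user generates 2 × 4 = 8 requests per day, so the 1 500 requests/day free limit is reached at roughly 187 users under these assumptions.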
Tokens do not "run out." Each browser session is independent. The 1 M-token context window resets per request; even a 50-turn chat only uses ~8 k tokens.
| Users | Gemini tier | Render tier | Est. monthly AI cost |
|---|---|---|---|
| ≤ 100 | Free | Free | $0 |
| 100–500 | Paid | Free | ~$3–14 |
| 500+ | Paid | Paid ($7/mo) | ~$14–28 |
Explicit caching requires a minimum of 4 096 tokens; our KB is only ~980 tokens, so it does not qualify. However, implicit caching is enabled by default on newer Gemini models: when a request shares a common prefix with a previous one, Google automatically applies the discount. Switching to Gemini 2.5 Flash in the future activates this with zero code changes, provided the prompt is long enough to meet that model's minimum token count for implicit caching.
skedulelt-rag-chatbot/
│
├── render.yaml             ← Render Blueprint (IaC)
│
├── public/
│   └── index.html          ← SPA; owns conversation history
│
├── src/
│   ├── server.js           ← Stateless Express server
│   ├── geminiClient.js     ← Gemini wrapper + FIFO queue + back-off
│   ├── knowledgeBase.js    ← KB content (unchanged from v1)
│   └── test.js             ← Unit + stateless-integration tests
│
├── .env.example
├── .gitignore
├── package.json
└── README.md
Browser ── POST { message, history } ──▶ server.js
                                            │
                                            ▼
                                     geminiClient.js
                                       1. rebuild prompt (KB + history)
                                       2. enqueue
                                       3. Gemini API (retry on 429)
                                            │
Browser ◀── { answer, matchedSections } ────┘
(appends answer to local history[])
Every request is self-contained. Nothing is stored server-side.
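The browser half of this flow might look like the sketch below. It is an assumption-laden illustration rather than the code in `index.html`: the function name `sendMessage`, the `{ role, text }` entry shape, and the injectable `fetchImpl` parameter (added so the function can be exercised outside a browser) are all hypothetical. Only the `/api/chat` endpoint, the `{ message, history }` request body, and the `answer` response field come from the description above.

```javascript
// Conversation history lives only in the browser tab.
let history = [];

// Send one message together with the full history, then record the turn.
async function sendMessage(message, fetchImpl = fetch) {
  const response = await fetchImpl("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message, history }),
  });
  if (!response.ok) {
    throw new Error(`Chat request failed with status ${response.status}`);
  }
  const { answer } = await response.json();

  // Record the turn only after the server has answered, so a failed
  // request never leaves a dangling user message in the history.
  history = [
    ...history,
    { role: "user", text: message },
    { role: "model", text: answer },
  ];
  return answer;
}
```

Since the history is a plain array in page memory, reloading the tab starts a new conversation, and the server never needs a session identifier.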
- Push this repo to GitHub.
- render.com → New → Blueprint → point at repo root.
- Enter your GEMINI_API_KEY when prompted.
- Click Create. Done.
cp .env.example .env
npm install
npm start # http://localhost:3000
npm run test # unit tests (+ live calls if key is set)

| Area | Before | Render version |
|---|---|---|
| Session storage | Server Map | Browser history[] |
| POST body | { message } + session header | { message, history } |
| DELETE endpoint | Cleared session | Removed |
| Rate limiting | None | FIFO queue + back-off |
| Deploy | Manual | render.yaml Blueprint |
| knowledgeBase.js | — | Unchanged |
| Frontend CSS | — | Unchanged |
| Signal | Action |
|---|---|
| Gemini requests approach 1 500 / day (~185 DAU at 8 requests / user) | Switch to Gemini paid tier |
| Cold starts after spin-down become noticeable | Upgrade to Starter ($7/mo), which does not spin down |
| KB > 500 K tokens | Add vector DB layer |
| Chat history > 50 turns regularly | Summarise older turns before sending |
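
The last row, summarising older turns, could be sketched as follows. Everything here is illustrative: `compactHistory` is a hypothetical name, the `{ role, text }` entry shape and the choice to keep the 10 most recent turns are assumptions, and `summarise` is a caller-supplied async function (for example, one extra Gemini call) that this README does not define.

```javascript
// Collapse everything except the most recent turns into a single summary.
// One turn is counted as two entries: the user message and the model reply.
async function compactHistory(history, summarise, maxTurns = 50, keepTurns = 10) {
  const maxEntries = maxTurns * 2;
  if (history.length <= maxEntries) return history; // short enough, send as is

  const keepEntries = keepTurns * 2;
  const older = history.slice(0, history.length - keepEntries);
  const recent = history.slice(history.length - keepEntries);

  const summary = await summarise(older);
  return [
    // The summary is injected as a synthetic exchange so that user and
    // model roles keep alternating (a design choice of this sketch).
    { role: "user", text: `Summary of the earlier conversation: ${summary}` },
    { role: "model", text: "Understood." },
    ...recent,
  ];
}
```

Running this in the browser before each request keeps the payload bounded while the latest turns stay verbatim.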