Skedulelt RAG Chatbot — Render Edition

Long-Context RAG for Skedulelt, tuned to run on Render's free tier with Gemini 1.5 Flash.
Zero vector database. Zero infrastructure beyond one Render web service.

✅ Is Long Context still feasible on Render? Yes.

Two constraints change the equation vs. an always-on server: Render's idle spin-down and Gemini's free-tier rate limit. Both are solved below.

Problem 1 — Render spins the instance down after 15 min idle

Free web services spin down after 15 minutes without inbound traffic. When that happens, all in-memory state and any changes to the local filesystem are lost: every Map, every cached session, every variable.

Fix applied: The server is now stateless. Conversation history lives in the browser (let history = []). Every POST /api/chat sends the full history alongside the new message; the server rebuilds the Gemini prompt from scratch each time. Token cost of replaying a 4-turn history is ~1 300 input tokens — negligible.
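A minimal sketch of the per-request rebuild described above. The function name `buildGeminiRequest`, the `{ role, text }` shape of each history entry, and the `kbText` parameter are assumptions made for illustration; the actual code in server.js and geminiClient.js may be organised differently. The output follows the `systemInstruction` / `contents` layout of the Gemini `generateContent` request body.

```javascript
// Hypothetical helper: rebuilds the whole Gemini request from scratch on each call.
// Nothing is read from server memory; everything comes from the POST body.
// `kbText` stands in for whatever knowledgeBase.js exports (an assumption).
function buildGeminiRequest(kbText, history, message) {
  // The client is untrusted, so keep only well-formed turns.
  const turns = (Array.isArray(history) ? history : []).filter(
    (t) =>
      t &&
      (t.role === 'user' || t.role === 'model') &&
      typeof t.text === 'string'
  );

  return {
    // The knowledge base goes first, so every request starts with the same prefix.
    systemInstruction: { parts: [{ text: kbText }] },
    contents: [
      // Replay the prior turns in order...
      ...turns.map((t) => ({ role: t.role, parts: [{ text: t.text }] })),
      // ...then append the new user message.
      { role: 'user', parts: [{ text: message }] },
    ],
  };
}
```

Because the function is pure, a cold start after a spin-down behaves exactly like a warm request: the same POST body always yields the same prompt.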

Problem 2 — Gemini free tier caps at 15 requests per minute

The Gemini 1.5 Flash free tier allows 1,500 requests per day but only 15 per minute. A burst of simultaneous users can hit that wall instantly.

Fix applied: All outgoing Gemini calls pass through a FIFO request queue with exponential back-off (2 s → 4 s → 8 s → 16 s, up to 4 retries). Requests are never silently dropped; they wait their turn.
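The queue-plus-back-off behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of geminiClient.js: the class name `RequestQueue`, the injectable `sleep`, and the convention that a rate-limit error carries `err.status === 429` are all hypothetical.

```javascript
// Sketch of a FIFO queue with exponential back-off on HTTP 429.
// All names are hypothetical; `err.status === 429` is an assumed error shape.
class RequestQueue {
  constructor({ baseDelayMs = 2000, maxRetries = 4, sleep } = {}) {
    this.baseDelayMs = baseDelayMs;
    this.maxRetries = maxRetries;
    // `sleep` is injectable so tests can skip the real waiting.
    this.sleep = sleep || ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
    // Each task is chained onto the previous one, which preserves FIFO order.
    this.tail = Promise.resolve();
  }

  // `task` is a function returning a promise, e.g. one Gemini call.
  enqueue(task) {
    const result = this.tail.then(() => this.runWithRetry(task));
    // Swallow errors on the chain itself so one failure does not block later tasks.
    this.tail = result.catch(() => {});
    return result;
  }

  async runWithRetry(task) {
    for (let attempt = 0; ; attempt++) {
      try {
        return await task();
      } catch (err) {
        const retryable = err && err.status === 429;
        // Non-429 errors and exhausted retries are surfaced to the caller.
        if (!retryable || attempt >= this.maxRetries) throw err;
        // With the defaults this waits 2 s, 4 s, 8 s, then 16 s.
        await this.sleep(this.baseDelayMs * 2 ** attempt);
      }
    }
  }
}
```

A caller would wrap each outgoing request, for example `queue.enqueue(() => callGemini(request))`, where `callGemini` is a placeholder for the real API call. In this sketch a request that still fails after the fourth retry rejects its promise, so the failure reaches the caller rather than being dropped silently.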

📊 Numbers — 100 users, 30-day month

All figures assume 2 sessions / user / day, 4 turns / session.

Metric	Value
Input tokens per session	~4 520
Output tokens per session	~390
Cost per session (paid-tier pricing)	$0.000 456
Monthly cost @ 100 users (paid-tier pricing)	$2.74
Monthly cost @ 500 users (paid-tier pricing)	$13.68
Gemini requests / day @ 100 users	~800 ✅ (free limit = 1 500)
Gemini requests / day @ 500 users	~4 000 ❌ (needs paid tier)
Render free hours / month	750 h (covers 24/7 for 1 instance)
Peak context @ 50-turn session	~8 230 tokens / 1 000 000

Tokens do not "run out." Each browser session is independent, and the 1 M-token context window applies to each request separately; even a 50-turn chat uses only ~8 k tokens of it.
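The figures in the table can be reproduced with a few lines of arithmetic. The per-token rates below are assumptions (USD 0.075 per 1 M input tokens and USD 0.30 per 1 M output tokens); they are not stated in this README, but they reproduce its per-session cost exactly. Check current Gemini pricing before relying on them.

```javascript
// Assumed paid-tier rates in USD per 1M tokens (not stated in the README).
const INPUT_RATE = 0.075;
const OUTPUT_RATE = 0.30;

// Usage assumptions taken from the text above.
const SESSIONS_PER_USER_PER_DAY = 2;
const TURNS_PER_SESSION = 4;
const DAYS_PER_MONTH = 30;

// Cost of one session given its total input and output tokens.
function costPerSession(inputTokens = 4520, outputTokens = 390) {
  return (inputTokens * INPUT_RATE + outputTokens * OUTPUT_RATE) / 1e6;
}

// Monthly AI cost for a given number of daily users.
function monthlyCost(users) {
  return users * SESSIONS_PER_USER_PER_DAY * DAYS_PER_MONTH * costPerSession();
}

// One Gemini request per turn.
function requestsPerDay(users) {
  return users * SESSIONS_PER_USER_PER_DAY * TURNS_PER_SESSION;
}

console.log(costPerSession().toFixed(6)); // 0.000456
console.log(monthlyCost(100).toFixed(2)); // 2.74
console.log(monthlyCost(500).toFixed(2)); // 13.68
console.log(requestsPerDay(100), requestsPerDay(500)); // 800 4000
```

At 8 requests per user per day, the 1 500-request daily limit is reached at roughly 187 users (1 500 / 8 = 187.5).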

Scaling thresholds

Users	Gemini tier	Render tier	Est. monthly AI cost
≤ 100	Free	Free	$0
100–500	Paid	Free	~$3–14
500+	Paid	Paid ($7/mo)	~$14–28

Context Caching note

Explicit caching requires a minimum of 4 096 tokens; our KB is only ~980 tokens so it does not qualify. However, implicit caching is enabled by default on newer Gemini models — when a request shares a common prefix with a previous one, Google automatically applies the discount. Switching to Gemini 2.5 Flash in the future activates this with zero code changes.

📁 File Tree & GitHub Paths

skedulelt-rag-chatbot/
│
├── render.yaml             ← Render Blueprint (IaC)
│
├── public/
│   └── index.html          ← SPA; owns conversation history
│
├── src/
│   ├── server.js           ← Stateless Express server
│   ├── geminiClient.js     ← Gemini wrapper + FIFO queue + back-off
│   ├── knowledgeBase.js    ← KB content (unchanged from v1)
│   └── test.js             ← Unit + stateless-integration tests
│
├── .env.example
├── .gitignore
├── package.json
└── README.md

🏗️ Data Flow

Browser  ──  POST { message, history }  ──▶  server.js
                                               │
                                               ▼
                                         geminiClient.js
                                           1. rebuild prompt (KB + history)
                                           2. enqueue
                                           3. Gemini API  (retry on 429)
                                               │
Browser  ◀──  { answer, matchedSections } ────┘
  (appends answer to local history[])

Every request is self-contained. Nothing is stored server-side.
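From the browser's side, the flow above might look like the sketch below. The factory name `createChatClient`, the `{ role, text }` history entries, and the injectable `fetchImpl` are assumptions for illustration; only the `/api/chat` endpoint and the `{ message, history }` / `{ answer, matchedSections }` payloads come from this README.

```javascript
// Browser-side sketch: the page owns `history` and replays it on every request.
// `fetchImpl` is injectable (an assumption) so the logic can be tested without a server.
function createChatClient(fetchImpl = globalThis.fetch) {
  const history = [];

  async function send(message) {
    const res = await fetchImpl('/api/chat', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      // The body carries the history as it stood before this message.
      body: JSON.stringify({ message, history }),
    });
    if (!res.ok) {
      throw new Error(`Chat request failed with status ${res.status}`);
    }

    const { answer, matchedSections } = await res.json();

    // Record the turn only after the server has answered, so a failed
    // request never leaves a dangling user message in the history.
    history.push(
      { role: 'user', text: message },
      { role: 'model', text: answer }
    );
    return { answer, matchedSections };
  }

  return { send, history };
}
```

Since the history lives only in the page, reloading the tab starts a fresh conversation, and a server restart in the middle of a chat is invisible to the user.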

🚀 Deploy to Render (30 seconds)

1. Push this repo to GitHub.
2. On render.com, choose New → Blueprint and point it at the repo root.
3. Enter your GEMINI_API_KEY when prompted.
4. Click Create. Done.

Local dev

cp .env.example .env
npm install
npm start          # http://localhost:3000
npm run test       # unit tests (+ live calls if key is set)

🔧 What changed vs. the original

Area	Before	Render version
Session storage	Server `Map`	Browser `history[]`
POST body	`{ message }` + session header	`{ message, history }`
DELETE endpoint	Cleared session	Removed
Rate limiting	None	FIFO queue + back-off
Deploy	Manual	`render.yaml` Blueprint
`knowledgeBase.js`	—	Unchanged
Frontend CSS	—	Unchanged

🔄 Upgrade signals

Signal	Action
Gemini requests near 1 500 / day (roughly 187 DAU at 8 requests each)	Switch to Gemini paid tier
Render feels sluggish	Upgrade to Starter ($7/mo) — no spin-down
KB > 500 K tokens	Add vector DB layer
Chat history > 50 turns regularly	Summarise older turns before sending
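For the last signal, one possible shape for history compaction is sketched below. It is not part of the current code. The function name `compactHistory`, the `keepTurns` option, and the injected `summarise` callback are hypothetical, and the sketch assumes the history alternates user and model entries starting with a user entry. How the summary text is produced (for example by a separate Gemini call) is left to the caller.

```javascript
// Hypothetical helper: splits the history into an older part, which is
// summarised, and a recent part, which is sent verbatim.
// Assumes entries alternate user/model and start with a user entry.
async function compactHistory(history, { keepTurns = 10, summarise }) {
  const keepEntries = keepTurns * 2; // one turn = user entry + model entry
  if (history.length <= keepEntries) {
    return { summary: null, recent: history };
  }

  const older = history.slice(0, history.length - keepEntries);
  const recent = history.slice(history.length - keepEntries);

  // `summarise` is supplied by the caller and returns a short text.
  const summary = await summarise(older);
  return { summary, recent };
}
```

Returning the summary separately, rather than inserting it as an extra turn, keeps `recent` strictly alternating; the caller can place the summary next to the knowledge base when building the prompt.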
