RiggedToEncodeINFO3604Project/AI-JS-Test

Skedulelt RAG Chatbot — Render Edition

Long-Context RAG for Skedulelt, tuned to run on Render's free tier with Gemini 1.5 Flash.
Zero vector database. Zero infrastructure beyond one Render web service.


✅ Is Long Context still feasible on Render? Yes.

Two things about Render change the equation vs. an always-on server. Both are solved below.

Problem 1 — Render spins the instance down after 15 min idle

Free web services spin down after 15 minutes without inbound traffic. Any changes to the local filesystem are lost, along with everything held in process memory: every Map, every cached session, every variable.

Fix applied: The server is now stateless. Conversation history lives in the browser (let history = []). Every POST /api/chat sends the full history alongside the new message; the server rebuilds the Gemini prompt from scratch each time. Token cost of replaying a 4-turn history is ~1 300 input tokens — negligible.
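A minimal sketch of the browser side of this fix, assuming each history entry is a `{ role, text }` object (that shape and the helper names `buildChatPayload`, `recordTurn`, and `sendMessage` are illustrative, not taken from the repo). It shows the client sending the full history with every message and appending the answer locally:

```javascript
// Conversation state lives only in the client; the server keeps nothing.
let history = [];

// Build the JSON body for POST /api/chat: the new message plus all prior turns.
function buildChatPayload(message) {
  return { message, history: history.slice() };
}

// After the server answers, record both sides of the turn locally.
// The { role, text } entry shape is an assumption for this sketch.
function recordTurn(message, answer) {
  history.push({ role: "user", text: message });
  history.push({ role: "model", text: answer });
}

// Send one message. `fetchImpl` is injectable so the sketch can run without a network.
async function sendMessage(message, fetchImpl = fetch) {
  const res = await fetchImpl("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildChatPayload(message)),
  });
  if (!res.ok) throw new Error(`Chat request failed: ${res.status}`);
  const { answer } = await res.json();
  recordTurn(message, answer);
  return answer;
}
```

Because the payload carries everything the server needs, a spin-down between two messages loses nothing: the next request simply replays the history.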

Problem 2 — Gemini free tier caps at 15 requests per minute

The Gemini 1.5 Flash free tier allows 1,500 requests per day but only 15 per minute. A burst of simultaneous users can hit that wall instantly.

Fix applied: All outgoing Gemini calls pass through a FIFO request queue with exponential back-off (2 s → 4 s → 8 s → 16 s, up to 4 retries). Requests are never silently dropped; they wait their turn.
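The queue and back-off schedule described above could look roughly like the following. This is a sketch, not the contents of geminiClient.js: the names `withBackoff` and `createQueue` are hypothetical, and it assumes a rate-limit failure surfaces as an error with `status === 429`.

```javascript
// Delays follow the schedule in the text: 2 s, 4 s, 8 s, 16 s (4 retries).
const BASE_DELAY_MS = 2000;
const MAX_RETRIES = 4;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `task`, retrying with doubling delays while it fails with HTTP 429.
// Assumption: rate-limit errors carry a numeric `status` property.
async function withBackoff(
  task,
  { baseDelayMs = BASE_DELAY_MS, maxRetries = MAX_RETRIES, wait = sleep } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      const rateLimited = Boolean(err) && err.status === 429;
      // Non-429 errors and exhausted retries are surfaced to the caller.
      if (!rateLimited || attempt >= maxRetries) throw err;
      await wait(baseDelayMs * 2 ** attempt);
    }
  }
}

// FIFO queue: each task starts only after the previous one has settled.
function createQueue(options) {
  let tail = Promise.resolve();
  return function enqueue(task) {
    const result = tail.then(() => withBackoff(task, options));
    tail = result.catch(() => {}); // a failed task must not block the next one
    return result;
  };
}
```

Chaining every task onto a single promise tail keeps ordering strict without a separate worker loop. A request that still fails after the fourth retry rejects its own promise, so the caller sees the error rather than losing the request silently.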


📊 Numbers — 100 users, 30-day month

All figures assume 2 sessions / user / day, 4 turns / session.

| Metric | Value |
| --- | --- |
| Input tokens per session | ~4 520 |
| Output tokens per session | ~390 |
| Cost per session | $0.000 456 |
| Monthly cost @ 100 users | $2.74 |
| Monthly cost @ 500 users | $13.68 |
| Gemini requests / day @ 100 users | ~800 ✅ (free limit = 1 500) |
| Gemini requests / day @ 500 users | ~4 000 ❌ (needs paid tier) |
| Render free hours / month | 750 h (covers 24/7 for 1 instance) |
| Peak context @ 50-turn session | ~8 230 tokens / 1 000 000 |

Tokens do not "run out." Each browser session is independent. The 1 M-token context window resets per request; even a 50-turn chat only uses ~8 k tokens.
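The cost and request figures above can be reproduced with a few lines of arithmetic. The sketch below assumes paid-tier prices of $0.075 per million input tokens and $0.30 per million output tokens; those prices are not stated in this README, but they yield the per-session cost listed here. The function names are illustrative.

```javascript
// Assumed prices (USD per 1M tokens); they reproduce the per-session cost above.
const INPUT_PRICE_PER_M = 0.075;
const OUTPUT_PRICE_PER_M = 0.30;

// Usage assumptions stated in the README.
const SESSIONS_PER_USER_PER_DAY = 2;
const TURNS_PER_SESSION = 4;
const DAYS_PER_MONTH = 30;
const FREE_REQUESTS_PER_DAY = 1500;

// Cost of one session in USD, from its input and output token counts.
function costPerSession(inputTokens = 4520, outputTokens = 390) {
  return (inputTokens * INPUT_PRICE_PER_M + outputTokens * OUTPUT_PRICE_PER_M) / 1e6;
}

// Monthly cost in USD for a given number of daily users.
function monthlyCost(users) {
  return users * SESSIONS_PER_USER_PER_DAY * DAYS_PER_MONTH * costPerSession();
}

// One Gemini request per turn.
function requestsPerDay(users) {
  return users * SESSIONS_PER_USER_PER_DAY * TURNS_PER_SESSION;
}

// True while the daily request count stays within the free-tier limit.
function fitsFreeTier(users) {
  return requestsPerDay(users) <= FREE_REQUESTS_PER_DAY;
}

console.log(monthlyCost(100).toFixed(2)); // "2.74"
console.log(requestsPerDay(500), fitsFreeTier(500)); // 4000 false
```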

Scaling thresholds

| Users | Gemini tier | Render tier | Est. monthly AI cost |
| --- | --- | --- | --- |
| ≤ 100 | Free | Free | $0 |
| 100–500 | Paid | Free | ~$3–14 |
| 500+ | Paid | Paid ($7/mo) | ~$14–28 |

Context Caching note

Explicit caching requires a minimum of 4 096 tokens; our KB is only ~980 tokens so it does not qualify. However, implicit caching is enabled by default on newer Gemini models — when a request shares a common prefix with a previous one, Google automatically applies the discount. Switching to Gemini 2.5 Flash in the future activates this with zero code changes.


📁 File Tree & GitHub Paths

skedulelt-rag-chatbot/
│
├── render.yaml             ← Render Blueprint (IaC)
│
├── public/
│   └── index.html          ← SPA; owns conversation history
│
├── src/
│   ├── server.js           ← Stateless Express server
│   ├── geminiClient.js     ← Gemini wrapper + FIFO queue + back-off
│   ├── knowledgeBase.js    ← KB content (unchanged from v1)
│   └── test.js             ← Unit + stateless-integration tests
│
├── .env.example
├── .gitignore
├── package.json
└── README.md

🏗️ Data Flow

Browser  ──  POST { message, history }  ──▶  server.js
                                               │
                                               ▼
                                         geminiClient.js
                                           1. rebuild prompt (KB + history)
                                           2. enqueue
                                           3. Gemini API  (retry on 429)
                                               │
Browser  ◀──  { answer, matchedSections } ────┘
  (appends answer to local history[])

Every request is self-contained. Nothing is stored server-side.
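Step 1 of the flow, rebuilding the prompt from the request body alone, might be sketched as follows. The prompt layout, the `KNOWLEDGE_BASE` placeholder, the `{ role, text }` history shape, and the function name `buildPrompt` are all assumptions for illustration; the actual format used by geminiClient.js is not shown in this README.

```javascript
// Hypothetical stand-in for the KB text exported by knowledgeBase.js.
const KNOWLEDGE_BASE = "Skedulelt knowledge base text goes here.";

// Rebuild the full prompt from the request body alone; no server-side session is read.
function buildPrompt({ message, history = [] }, kb = KNOWLEDGE_BASE) {
  if (typeof message !== "string" || message.trim() === "") {
    throw new TypeError("message must be a non-empty string");
  }
  if (!Array.isArray(history)) {
    throw new TypeError("history must be an array");
  }
  // Assumed entry shape: { role: "user" | "model", text: string }.
  const transcript = history.map(
    (turn) => `${turn.role === "user" ? "User" : "Assistant"}: ${turn.text}`
  );
  return [
    "Answer using only the knowledge base below.",
    "--- KNOWLEDGE BASE ---",
    kb,
    "--- CONVERSATION ---",
    ...transcript,
    `User: ${message}`,
    "Assistant:",
  ].join("\n");
}
```

Since the function depends only on its arguments, two identical request bodies always produce the same prompt, which is what makes the server safe to restart between any two requests.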


🚀 Deploy to Render (30 seconds)

  1. Push this repo to GitHub.
  2. render.com → New → Blueprint → point at repo root.
  3. Enter your GEMINI_API_KEY when prompted.
  4. Click Create. Done.

Local dev

cp .env.example .env
npm install
npm start          # http://localhost:3000
npm run test       # unit tests (+ live calls if key is set)

🔧 What changed vs. the original

| Area | Before | Render version |
| --- | --- | --- |
| Session storage | Server Map | Browser history[] |
| POST body | { message } + session header | { message, history } |
| DELETE endpoint | Cleared session | Removed |
| Rate limiting | None | FIFO queue + back-off |
| Deploy | Manual | render.yaml Blueprint |
| knowledgeBase.js | | Unchanged |
| Frontend CSS | | Unchanged |

🔄 Upgrade signals

| Signal | Action |
| --- | --- |
| > 100 DAU, or Gemini requests nearing 1 500/day | Switch to Gemini paid tier |
| Render feels sluggish | Upgrade to Starter ($7/mo), which has no spin-down |
| KB > 500 K tokens | Add vector DB layer |
| Chat history > 50 turns regularly | Summarise older turns before sending |
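For the last signal, one possible way to summarise older turns is to collapse everything except the most recent entries into a single summary entry before sending. The sketch below is illustrative only: `compactHistory`, the `keepLast` default, and the `{ role, text }` entry shape are assumptions, and `summarise` is a caller-supplied async function (for example, one extra model call) rather than anything in the repo.

```javascript
// Collapse everything except the most recent entries into one summary entry.
// `summarise` receives the older entries and resolves to a summary string.
async function compactHistory(history, summarise, keepLast = 20) {
  // Short histories are returned untouched.
  if (history.length <= keepLast) return history;

  const splitAt = history.length - keepLast;
  const older = history.slice(0, splitAt);
  const recent = history.slice(splitAt);

  const summary = await summarise(older);
  return [
    { role: "user", text: `Summary of earlier conversation: ${summary}` },
    ...recent,
  ];
}
```

Keeping the newest entries verbatim preserves the immediate context, while the summary entry caps how much the replayed history can grow.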
