GS360 × DeepTutor — Final Implementation Plan (v3)

Mission: The first open-source, plug-and-play, AI-powered UPSC learning platform. Built for Desktop & Web. Free forever. Anyone can add content. Anyone can fork it for any exam.

This plan supersedes all prior versions. It merges the original architecture, senior developer review (12 points), and all revisions into a single source of truth.


Table of Contents

  1. Why This Will Succeed
  2. Architecture Overview
  3. Content Pack System
  4. All 20 Gaps β€” Resolved
  5. All 12 Assumptions β€” Mitigated
  6. All 6 Decisions β€” Resolved
  7. POC Tests & Eval Harness
  8. Cold-Start Content Strategy
  9. Integration Reality Check
  10. Team & Capacity Model
  11. 16-Week Timeline
  12. Post-v1 Roadmap (Phase 5)
  13. Cost Model
  14. Operational Runbook

πŸ† Why This Will Succeed as Open Source

The Core Thesis

There is no free, AI-powered, open-source UPSC preparation tool. 15 lakh+ aspirants sit for UPSC every year. The market is served by ₹50K–₹2L coaching packages and ₹5K–₹15K app subscriptions. An open-source alternative with AI + community content will spread like wildfire.

8 Structural Advantages

| # | Advantage | Why It Works |
|---|---|---|
| 1 | Zero-Cost Operation | Gemini free tier (1,500 req/day) + Vercel free + Supabase free = ₹0/month at <50 users. No VC funding needed. Students trust it because there's no business model to corrupt it. |
| 2 | 200K Lines of Code — Free | DeepTutor (16.5K stars, Apache-2.0) gives us RAG, quiz generation, TutorBots, memory, CLI, multi-channel agents — all production-tested. We build a UPSC skin, not an AI engine. |
| 3 | Community Builds the Product | Content packs are the moat. Every student who adds PYQs, notes, or flashcards makes the platform better for everyone. The Wikipedia model for UPSC prep. Community contributes AFTER v1 launches with self-authored seed content. |
| 4 | Network Effects | More content → better AI answers → more students → more content contributed → stronger RAG → better quiz generation → the cycle accelerates. |
| 5 | Fork-Ready = Unstoppable | GATE, SSC, State PSC, NEET — fork, swap content packs, change branding. The engine doesn't care about the exam. |
| 6 | Self-Hosted = No Lock-In | Students own their data. No server dependency. No "company shutting down" risk. Open source = permanent. |
| 7 | India-Specific Timing | India has the world's largest competitive exam ecosystem (5Cr+ aspirants/year combined). India-specific open-source AI tools are nearly zero. |
| 8 | No Competitor Can Match Free + Open + AI | Unacademy (₹10K+/yr), Testbook (₹5K+/yr) — all closed-source, subscription-based, no AI tutoring. Free + AI-native is a different category. |

Competitive Landscape

| Platform | Price | AI-Powered | Open Source | Offline | Community Content | Self-Hosted |
|---|---|---|---|---|---|---|
| Unacademy | ₹10K–₹60K/yr | ❌ | ❌ | ❌ | ❌ | ❌ |
| Testbook | ₹5K–₹15K/yr | ❌ Basic | ❌ | ✅ | ❌ | ❌ |
| BYJU's | ₹30K–₹1.5L | ❌ | ❌ | ✅ | ❌ | ❌ |
| Khan Academy | Free | ❌ | ❌ | ❌ | ❌ | ❌ |
| Free YouTube/Telegram | Free | ❌ | N/A | ❌ | ✅ Informal | N/A |
| GS360 Open Source | Free | ✅ Full AI | ✅ Yes | ✅ PWA | ✅ Plug-and-Play | ✅ Yes |

The Community Flywheel

graph LR
    A["Student joins GS360"] --> B["Uses AI features for free"]
    B --> C["Studies with content packs"]
    C --> D["Creates their own notes/MCQs"]
    D --> E["Submits as content pack PR"]
    E --> F["Community validates + merges"]
    F --> G["RAG knowledge base grows"]
    G --> H["AI answers get better"]
    H --> I["Word spreads to more students"]
    I --> A

    style A fill:#22C55E,color:#000
    style G fill:#3B82F6,color:#fff
    style I fill:#DC3545,color:#fff

πŸ“ Architecture Overview

graph TB
    subgraph "Frontend — GS360 UI"
        A["Next.js App Shell"] --> B["GS360 Design System"]
        B --> C["Daily Command Center"]
        B --> D["AI Notes / Chat"]
        B --> E["Notes & Materials"]
        B --> F["Testing / Quiz"]
        B --> G["Performance Dashboard"]
        B --> H["Plan View"]
    end

    subgraph "Auth & Multi-Tenancy Layer"
        AUTH["NextAuth.js — Google/GitHub OAuth"]
        SEC["Security Middleware — Path Validation + Audit Log"]
        MT["Per-User Namespace Manager"]
        RL["Token-Bucket Rate Limiter"]
    end

    subgraph "Backend — DeepTutor Engine"
        N["FastAPI Server"] --> O["RAG Pipeline"]
        N --> P["Chat / Deep Solve / Quiz Gen"]
        N --> Q["Knowledge Base Manager"]
        N --> R["TutorBot Agent System"]
        N --> S["Persistent Memory"]
    end

    subgraph "Plug-and-Play Content Layer"
        PP1["content-packs/ (Git-tracked)"]
        PP2["Content Registry — manifest.json"]
        PP3["Community Content Hub — GitHub"]
        PP1 --> PP2
        PP3 -->|"PR + Review"| PP1
    end

    subgraph "Data Layer"
        V[("Knowledge Bases — Per User")]
        W[("User Data / Sessions")]
        X[("Embeddings — Vector Store")]
        BK[("Daily Backup — R2/S3")]
    end

    subgraph "External Services"
        Y["LLM Provider — Gemini → DeepSeek → Ollama Fallback"]
        Z["Embedding Provider"]
        AA["Search Provider — SearXNG Self-Hosted"]
    end

    subgraph "Evaluation & Monitoring"
        EVAL["RAG Eval Harness — Continuous"]
        COST["LLM Cost Monitor + Alerts"]
        HEALTH["System Health Dashboard"]
    end

    A <-->|"WebSocket + REST"| AUTH
    AUTH --> SEC
    SEC --> MT
    MT --> N
    RL --> N
    PP1 -->|"Auto-ingest on boot"| Q
    N --> V
    N --> W
    N --> X
    W --> BK
    N --> Y
    N --> Z
    N --> AA
    N --> EVAL
    COST --> Y

Note

v1 scope exclusions: Voice Bot (Siri Orb), AI Cowork Studio, and Study Mode Focus Timer are deferred to v1.1. This is a deliberate scope-discipline decision, not a technical limitation.


🔌 Plug-and-Play Content System

Directory Structure

gs360-live/
├── content-packs/                    # ALL content lives here
│   ├── registry.json                 # Master manifest — lists all packs
│   │
│   ├── upsc-polity/                  # One folder = one content pack
│   │   ├── pack.json                 # Pack metadata
│   │   ├── documents/                # Raw source materials (PDF, MD, TXT)
│   │   ├── questions/                # MCQ + Mains question banks (JSON)
│   │   ├── notes/                    # Pre-made study notes (Markdown)
│   │   ├── prompts/                  # TutorBot personas (Markdown)
│   │   ├── flashcards/               # Spaced repetition cards (JSON)
│   │   └── cache/                    # Pre-generated AI outputs (fallback)
│   │
│   ├── upsc-economy/
│   ├── upsc-history/
│   ├── upsc-current-affairs-apr-2026/
│   └── gate-cse/                     # Non-UPSC — fork-ready
│
├── private-vault/                    # User's PRIVATE content (not in Git)
│   ├── uploads/
│   ├── video-transcripts/
│   └── custom-kb/
│
├── eval/                             # Evaluation datasets
│   ├── golden-dataset.json           # 200 UPSC questions with verified answers
│   ├── eval-results/                 # Weekly eval run outputs
│   └── eval-runner.py                # Automated eval script
│
├── templates/                        # Pack creation templates
│   ├── pack-template/
│   └── CONTENT_GUIDE.md
│
└── scripts/
    ├── ingest-packs.py               # Auto-ingest into DeepTutor KBs
    ├── validate-pack.py              # Schema + quality validation
    ├── export-pack.py                # Export KB back to pack format
    └── cost-monitor.py               # LLM usage tracking + alerts

Key Schemas

  • pack.json β€” Declares metadata, exam target, subject, language, content counts. The ingestion script reads this to route content.
  • questions/*.json β€” id, year, question, options[], correct, explanation, difficulty, topics[], source, contributor. Feeds quiz engine + RAG context.
  • flashcards/*.json β€” front, back, difficulty, topic for spaced repetition.

Contribution Workflow

Contributor (no code needed)              Automated Pipeline
─────────────────────────────             ──────────────────
1. Fork repo
2. Copy templates/pack-template/
3. Add PDFs to documents/
4. Add MCQs to questions/*.json
   (or use web form → auto-generates JSON)
5. Edit pack.json metadata
6. Submit PR                              → GitHub Actions triggers:
                                            ✓ validate-pack.py (schema)
                                            ✓ Question format check
                                            ✓ Copyright scan (hash + text fingerprint)
                                            ✓ Duplicate detection (embedding similarity)
                                            ✓ LLM quality scorer
                                          → 2 community reviewers approve
                                          → Auto-merge → Auto-ingest
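As a sketch of the pipeline's first step, the schema check that validate-pack.py might run could look like this; the field rules are assumptions drawn from the questions/*.json schema above, not the final validator:

```python
# Sketch of the question-schema check run in CI. Required fields follow
# the questions/*.json description above; the exact rules are assumptions.
import json

REQUIRED = {"id", "question", "options", "correct", "explanation", "topics"}

def validate_question(q: dict) -> list[str]:
    """Return a list of human-readable schema errors (empty = valid)."""
    errors = [f"missing field: {f}" for f in REQUIRED - q.keys()]
    opts = q.get("options")
    if isinstance(opts, list) and len(opts) != 4:
        errors.append(f"expected 4 options, got {len(opts)}")
    if isinstance(opts, list) and q.get("correct") not in opts:
        errors.append("'correct' must be one of options[]")
    return errors

def validate_pack_questions(path: str) -> dict[str, list[str]]:
    """Map each invalid question's id to its errors for one pack file."""
    with open(path) as f:
        questions = json.load(f)
    return {q.get("id", f"#{i}"): errs
            for i, q in enumerate(questions)
            if (errs := validate_question(q))}
```

A PR would be blocked when the returned error map is non-empty.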

✅ All 20 Gaps — Resolved

🔴 G1: Authentication System

Solution: Dual-Mode Auth

| Mode | Auth | Use Case |
|---|---|---|
| Self-Hosted (Single User) | AUTH_MODE=none — no auth needed. Works like stock DeepTutor. | Student running locally |
| Hosted / Multi-User | AUTH_MODE=multi — NextAuth.js with Google OAuth, GitHub OAuth, Email magic link (Resend free tier: 3K emails/mo). Session stored in Supabase free tier. | Shared hosted platform |

Effort: 1 day. NextAuth.js is drop-in for Next.js. Supabase adapter exists.


🔴 G2: Multi-Tenancy — Security-Hardened

Caution

This is the #1 technical risk. The UserNamespace class is the easy part. The hard part is tracing every hardcoded path assumption across DeepTutor's 16K+ lines. This is debugging work, not codegen. Phase 0 Spike validates feasibility before committing.

Solution: Per-User Namespace Manager + Security Middleware

# middleware/namespace.py
import os
import re

class UserNamespace:
    """Routes all DeepTutor file operations to user-specific directories."""

    VALID_USER_ID = re.compile(r'^[a-zA-Z0-9_-]{1,64}$')

    def __init__(self, user_id: str):
        if not self.VALID_USER_ID.match(user_id):
            raise ValueError(f"Invalid user_id: {user_id}")
        self.user_id = user_id
        self.base = os.path.realpath(f"data/users/{user_id}")
        # The trailing separator blocks sibling-prefix escapes
        # (e.g. "data/users_evil" passing a bare startswith check).
        if not self.base.startswith(os.path.realpath("data/users") + os.sep):
            raise PermissionError(f"Path traversal attempt: {user_id}")

# middleware/security.py
import logging
import os

from middleware.namespace import UserNamespace

audit_logger = logging.getLogger("gs360.audit")

class SecurityMiddleware:
    """Request-level security enforcement. Every API call passes through this."""

    def validate_request(self, request, jwt_claims: dict):
        user_id = jwt_claims["sub"]  # From JWT, NEVER from request body
        namespace = UserNamespace(user_id)
        audit_logger.info(f"ACCESS user={user_id} endpoint={request.path}")
        return namespace

    def validate_file_path(self, namespace, requested_path: str):
        real_path = os.path.realpath(requested_path)
        user_base = os.path.realpath(namespace.base)
        shared_base = os.path.realpath("data/knowledge_bases")

        # os.sep suffix: "data/users/abc" must not match "data/users/abc_evil"
        if (real_path.startswith(user_base + os.sep)
                or real_path.startswith(shared_base + os.sep)):
            return True
        audit_logger.warning(f"BLOCKED user={namespace.user_id} path={requested_path}")
        raise PermissionError("Access denied: path outside your namespace")

Data Isolation:

data/
├── knowledge_bases/           # SHARED — content packs (read-only for users)
├── users/                     # ISOLATED — per-user data
│   ├── user_abc123/
│   │   ├── memory/            # Learner profile
│   │   ├── sessions/          # Chat history
│   │   ├── notebooks/         # Saved notes
│   │   └── knowledge_bases/   # Private uploads (physically separate vector index)
│   └── user_def456/
│       └── ...

Security Requirements (Non-Negotiable for v1):

| Requirement | Implementation |
|---|---|
| Path traversal prevention | os.path.realpath() + prefix validation on every file access |
| Index isolation | User private KBs use physically separate vector stores, NOT filtered views |
| Request-level auth | user_id from JWT sub claim, NEVER from request body |
| Audit logging | Every file access logged with user_id, path, timestamp |
| WebSocket isolation | Each WS connection authenticated + bound to a single user namespace |
| Pre-launch pen test | 10 common attack vectors (path traversal, IDOR, session fixation, KB cross-contamination) |

Key Design: Content packs (community knowledge) = shared read-only. User data (notes, scores, memory) = fully isolated with physically separate vector indices. RAG queries merge both at query-time, never at index-time.
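A minimal sketch of that query-time merge, with stand-in retriever callables; the real code would wrap the shared and per-user LlamaIndex retrievers, and the Hit shape and function names here are assumptions:

```python
# Sketch of query-time merging of the shared (content-pack) index and a
# user's private index. Indices stay physically separate; results are
# only combined here, never at index-time.
from dataclasses import dataclass

@dataclass
class Hit:
    text: str
    score: float
    source: str  # "shared" or "private", used for citation labels

def merged_retrieve(query, shared_retriever, private_retriever, top_k=5):
    """Query both indices independently, then merge by retrieval score."""
    hits = shared_retriever(query) + private_retriever(query)
    return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]
```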

Effort: 5–7 days realistic.

  • 2 days patching DeepTutor's file path resolution
  • 2 days for LlamaIndex per-user vector store pooling
  • 1–2 days for WebSocket session isolation
  • 1 day for security middleware + audit logging

🔴 G3: Desktop-Optimized Premium UI

Decision: Desktop-first. No mobile app. Desktop/web optimized for deep research & note-taking.

.app {
  display: grid;
  grid-template-columns: 240px 1fr; /* Fixed sidebar */
  height: 100vh;
}

@media (max-width: 1024px) {
  .app { grid-template-columns: 80px 1fr; } /* Collapsed sidebar */
}

@media (max-width: 768px) {
  .app { grid-template-columns: 1fr; }
  .sidebar { display: none; } /* Hamburger menu on mobile */
}

Validation: Post-launch, Plausible analytics tracks device type. If >60% of traffic is mobile after 30 days, reconsider in v1.1 with data, not assumptions.

Effort: 1 day.


🔴 G4: Offline Support (PWA)

Solution: Progressive Web App + Offline Quiz

| Feature | Offline? | How |
|---|---|---|
| Quiz (from content packs) | ✅ Full | Question banks cached in IndexedDB |
| Flashcard revision | ✅ Full | Cached locally |
| Read/write notes | ✅ Full | IndexedDB, synced on reconnect |
| Study timer | ✅ Full | Client-side only |
| AI Chat / Notes Gen | ❌ No | Shows "Connect to internet for AI features" |
| Upload materials | ⚠️ Queued | Saved locally, uploaded when online |

Effort: 3 days.


🔴 G5: Rate Limiting + LLM Fallback

Solution: Token-Bucket Rate Limiter + Multi-Provider Fallback Chain

Request comes in
    ↓
1. Try Gemini Flash 2.0 (free, fast)
    ↓ if rate-limited or down
2. Try DeepSeek V3 ($0.14/M tokens — ultra-cheap backup)
    ↓ if rate-limited or down
3. Try Ollama local (if self-hosted with GPU)
    ↓ if unavailable
4. Serve pre-generated cached response (from content pack cache/)
    ↓ if nothing cached
5. Show "AI quota reached — try again in X minutes" + offer offline quiz
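The chain above can be sketched as a simple loop; the provider callables and the ProviderDown exception are stand-ins for real Gemini/DeepSeek/Ollama client wrappers:

```python
# Sketch of the provider fallback chain. Providers are (name, callable)
# pairs tried in order; names and the exception type are assumptions.
class ProviderDown(Exception):
    """Raised by a provider wrapper on rate-limit or outage."""

def answer_with_fallback(prompt, providers, cached=None):
    """Walk the chain in order; fall back to cache, then a friendly error."""
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderDown:
            continue  # rate-limited or unreachable; try the next provider
    if cached is not None:
        return "cache", cached
    return "none", "AI quota reached. Try again in a few minutes."
```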

Per-user limits (free tier): 20 AI requests/hour, 100/day, 10 quiz generations/day, 3 deep research/day.
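A minimal token-bucket sketch consistent with those limits; capacity and refill parameters are illustrative:

```python
# Minimal token-bucket limiter. Each user gets one bucket per limit
# (hourly requests, daily requests, quiz generations, deep research).
import time

class TokenBucket:
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Spend one token if available, refilling based on elapsed time."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

For example, the 20 AI requests/hour limit would be roughly TokenBucket(20, 20 / 3600).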

Effort: 3 days.


🔴 G6: Copyright Protection — 3-Layer System

Layer 1: Automated Scan (on PR / upload)
─────────────────────────────────────────
• SHA-256 hash check against known copyrighted PDFs
• Filename pattern matching ("Laxmikanth*.pdf", etc.)
• File size threshold (>5MB PDF flagged)
• PDF metadata extraction (author/publisher fields)
• Text fingerprinting — extract 10 random pages, compute n-gram
  signatures against known textbook corpus
• Paragraph-level similarity against reference corpus (~500 paragraphs)

Layer 2: Community Review (on PR)
──────────────────────────────────
• 2 reviewer approvals required
• PR template checklist: original / public domain / PYQ / fair use

Layer 3: DMCA Process (post-publish)
─────────────────────────────────────
• DMCA.md in repo root with takedown instructions
• Email: dmca@gs360.study
• Response SLA: 48 hours
• Auto-remove on valid claim, reinstate on counter-notice
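Layer 1's hash, filename, and size checks can be sketched as below; the blocklist contents and patterns are placeholders, and the real rules would live with the CI pipeline:

```python
# Sketch of the Layer-1 scan. BLOCKED_HASHES / BLOCKED_PATTERNS are
# placeholders; real values are maintained alongside the CI pipeline.
import fnmatch
import hashlib

BLOCKED_HASHES: set[str] = set()         # SHA-256 of known copyrighted PDFs
BLOCKED_PATTERNS = ["laxmikanth*.pdf"]   # example pattern from the plan

def scan_file(path: str, data: bytes) -> list[str]:
    """Return a list of copyright flags for one uploaded file."""
    flags = []
    if hashlib.sha256(data).hexdigest() in BLOCKED_HASHES:
        flags.append("hash matches known copyrighted file")
    name = path.rsplit("/", 1)[-1].lower()
    if any(fnmatch.fnmatch(name, p) for p in BLOCKED_PATTERNS):
        flags.append("filename matches blocked pattern")
    if name.endswith(".pdf") and len(data) > 5 * 1024 * 1024:
        flags.append("large PDF (>5MB), needs manual review")
    return flags
```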

Note

Layer 1 will never be bulletproof — even YouTube can't reliably detect copyright infringement. The 3-layer approach is the industry standard used by GitHub, Wikipedia, and the Internet Archive.

Allowed: NCERT, UPSC PYQs, Constitution, PIB, Economic Survey, Budget docs, original notes. Not allowed: Full copyrighted textbooks, coaching material, scanned paid test series.

Effort: 2 days.


🔴 G7: Data Backup

Solution: Daily automated backup to Cloudflare R2 (free: 10GB, 1M ops/month).

  • Frequency: Daily at 2:00 AM IST
  • What: data/users/ + data/knowledge_bases/ (content packs already in git)
  • Retention: 7 daily + 4 weekly snapshots
  • Encryption: AES-256 at rest
  • User-side: "Export My Data" button downloads zip with all personal data

Effort: Half day.


🟡 G8: i18n → Deferred to v1.1

v1 launches English-only. next-intl framework setup in v1.1. Community translates via same PR process as content packs.


🟡 G9: Accessibility

Applied during Phase 2 design:

  • aria-label on all interactive elements
  • Keyboard navigation (Tab, Enter)
  • :focus-visible outlines
  • Color contrast β‰₯ 4.5:1 (WCAG AA)
  • Semantic HTML (<nav>, <main>, <aside>)
  • aria-live="polite" on quiz timer
  • Enforced by ESLint jsx-a11y plugin

Effort: 1 day during design phase.


🟡 G10: Analytics → Post-Launch (Week 15)

Plausible Analytics (self-hosted, free, privacy-respecting). Tracks: page views, content pack usage, quiz completion rates, geography, device type (validates desktop-first decision).

Does NOT track: personal identity, study content, AI conversations.

Effort: 2 hours.


🟡 G11: Content Deduplication

Embedding similarity check (>0.92 threshold) during CI before merge. Blocks duplicate questions.
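A minimal sketch of that CI gate, with plain cosine math; real embeddings would come from the embedding provider, and the helper names are assumptions:

```python
# Sketch of the duplicate gate: any new question whose embedding is
# >0.92 cosine-similar to an existing one blocks the PR.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def find_duplicates(new_vecs, existing_vecs, threshold=0.92):
    """Return (new_index, existing_index) pairs above the threshold."""
    return [(i, j)
            for i, a in enumerate(new_vecs)
            for j, b in enumerate(existing_vecs)
            if cosine(a, b) > threshold]
```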

Effort: Half day.


🟡 G12: Syllabus Change Handling

Syllabus version tracked in registry.json. Outdated packs flagged. Config-only.


🟡 G13: Load Testing

k6 script ships with repo. Ramp to 50 concurrent users, hold 5 min. Documented limits: free tier handles ~30–50 concurrent users.

Effort: Half day.


🟡 G14: SEO

Public pages (landing, PYQ database, CA summaries) are SSR via Next.js. Meta tags, structured data, sitemap. Monthly CA packs auto-publish as blog posts.

Effort: 1 day.


🟡 G15: Data Export

Settings β†’ "Export My Data" β†’ zip with profile.json, notes/, quiz-history.json, flashcard-progress.json, sessions/, README.md. Import also supported.

Effort: 1 day.


🟢 G16–G20: Deferred

| Gap | Solution | When |
|---|---|---|
| G16: Real-time collaboration | WebSocket rooms via Partykit | v2 |
| G17: Plagiarism detection | Embedding similarity against answer corpus | v2 |
| G18: Native mobile app | Non-goal. Desktop/web only. Validated post-launch. | v2 |
| G19: UPSC model fine-tuning | Fine-tune Qwen2.5-7B on PYQ explanations | v3 |
| G20: Admin panel | /admin with moderation queue, user stats | v1.1 |

📋 All 12 Assumptions — Mitigated

| # | Assumption | Mitigation |
|---|---|---|
| A1 | DeepTutor stays stable | Pin to v1.0.2. Soft fork (overlay, don't modify core). Apache-2.0 = we can continue independently if they pivot. |
| A2 | Gemini free tier persists | Multi-provider fallback chain. Worst case: DeepSeek at $0.14/M tokens → ~₹500/month for 10K users. |
| A3 | Students have internet | PWA with offline quiz, notes, flashcards. Core study flow works offline. |
| A4 | NCERT is redistributable | Link to official NCERT portal. Extract only summaries + key concepts under fair use. |
| A5 | Community contributes | Do NOT rely on community for v1 content. Self-author all seed content from public sources. Community contributes AFTER the platform has traction. |
| A6 | Desktop-first is OK | Defensible (Notion, Obsidian, Anki are all desktop-first). Validate post-launch with analytics. |
| A7 | Self-hosted primary | Ship both: self-hosted (default) + hosted demo at gs360.study (rate-limited). |
| A8 | LLMs won't hallucinate | RAG-grounded only. System prompt enforces "refuse if not in context." Source attribution mandatory. Validated continuously via eval harness. |
| A9 | Web Speech API for Hindi | Voice bot deferred to v1.1. Text-only for v1. |
| A10 | Vector store scales | LlamaIndex local storage. Monitor at 10K pages. Migrate to Qdrant if slow. Content packs enable sharding by subject. |
| A11 | Students can run Docker | "Deploy to Railway" one-click button. YouTube walkthrough in Hindi. Hosted version for everyone else. |
| A12 | Non-devs can write JSON | Web form at /contribute → auto-generates JSON → auto-submits PR. Zero JSON knowledge needed. |

🔑 All 6 Decisions — Resolved

| Decision | Resolution | Rationale |
|---|---|---|
| D1: Hosted vs Self-Hosted | Both. Self-hosted default + hosted demo (rate-limited). | Open-source promise + accessibility. |
| D2: Platform Priority | Desktop-first. Validated post-launch. | Deep study → large screens. Directional, not irreversible. |
| D3: Contribution Method | Web form (primary) + GitHub PRs (advanced). | /contribute → auto JSON → auto PR. |
| D4: Fork Strategy | Soft fork. Changes in gs360/ overlay. Core DeepTutor untouched. | Pull upstream cleanly. |
| D5: AI Grounding | RAG-grounded with citations. Continuously evaluated. | Prevents hallucination. Community verifies via citations. |
| D6: Exam Scope | UPSC-first, multi-exam ready. | Config-driven branding + pack system supports any exam. |

🧪 POC Tests & RAG Accuracy De-Risking Strategy

| # | Test | Method | Pass Criteria | Blocking? |
|---|---|---|---|---|
| T1 | Quiz quality | Upload 50 PYQs → generate 20 MCQs → 3 aspirants rate 1–5 | Avg ≥ 3.5/5 | Yes |
| T2 | Gemini throughput | k6: 50 users × 10 req/min for 30 min | <5% rate-limit errors with fallback active | Yes |
| T3 | RAG accuracy | Continuous harness — micro set (30 Qs) from Week 3, scaling to 200 by Week 12. Per-category scoring. | See per-category targets below | Yes (see launch thresholds) |
| T4 | Pack ingestion speed | 10K pages across 8 packs | <30 min on 4GB RAM | No |
| T5 | Hindi speech | Deferred to v1.1 (voice bot cut from v1) | — | Deferred |
| T6 | Desktop usability | Chrome, Firefox, Safari at 1920×1080 and 1366×768 | All core flows completable, premium feel | Yes |

T3: RAG Accuracy — 6-Step De-Risking Strategy

Important

RAG accuracy is the core value proposition. If the AI gives wrong answers about Article 370 or the 73rd Amendment, the platform is worse than useless. This section treats accuracy as an engineering discipline, not a checkbox.

Step 1: Move Eval Earlier (Week 3, Not Week 12)

Build the minimal eval harness alongside the quiz engine in Week 3, not after content generation in Week 12. This gives 9 weeks of tuning runway instead of discovering problems with days left.

Week 2, Day 5:  First content pack ingested + queryable via RAG  ← already in plan
Week 3, Day 1:  Hand-curate 30-question micro golden set
Week 3, Day 2:  Run baseline eval on stock pipeline β†’ FIRST ACCURACY READING
Week 3–10:      One tuning lever per week alongside feature work
Week 12:        Scale to full 200-question golden set + domain expert review

Micro golden set composition (30 questions):

  • 12 factual recall ("Which article of the Constitution deals with...")
  • 9 comprehension ("Explain the significance of...")
  • 6 analytical ("Examine the role of..." / "Critically analyze...")
  • 3 current affairs ("Discuss the implications of Budget 2026...")

Baseline test: Run stock pipeline (LlamaIndex + Gemini Flash + default 512-token chunking) on 1 sample content pack. No tuning. Just measure where we start.

Step 2: 6 Tuning Levers — Prioritized by Expected Impact

Warning

One lever at a time. Running 2+ simultaneously makes attribution impossible. Each lever gets an A/B eval run before/after. Results logged in eval/changelog.md.

| Priority | Lever | Expected Gain | What to Test | When |
|---|---|---|---|---|
| 1 | Chunking strategy | 5–15% | 4 strategies: default 512-token, semantic (split on headers), hierarchical (parent + child nodes), question-aware. Pick winner via eval delta. | Week 3–4 |
| 2 | Retrieval improvements | 5–10% | Hybrid search (BM25 + vector via QueryFusionRetriever), tune Top-K (3/5/10), add bge-reranker-base reranking top-20 → top-5. | Week 5–6 |
| 3 | Prompt engineering | 5–10% on hallucination | Force "answer ONLY from provided context", add few-shot UPSC examples, force citation format, separate prompts for factual vs analytical questions. | Week 6–7 |
| 4 | Embedding model | 3–7% | A/B test Gemini text-embedding-004 vs BAAI/bge-large-en-v1.5 vs nomic-embed-text-v1.5 on 1 pack. | Week 7–8 |
| 5 | Query rewriting | 3–5% on analytical | HyDE pattern: LLM rewrites the question into 2–3 retrieval-friendly variants, retrieve the union. Helps with "Examine the role of..." style queries. | Week 8–9 |
| 6 | Answer model | Variable | Test Gemini 2.5 Pro for synthesis (keep Flash for retrieval). Compare DeepSeek V3 on analytical questions only. | Week 9–10 |

Why this order: Chunking changes what the LLM sees — it's the highest-leverage lever. Prompt engineering changes how it reasons. Model swaps are lowest-leverage because they're expensive and the delta is often smaller than chunking.

Step 3: Per-Category Accuracy Targets

Blended accuracy is a vanity metric. A 65% blended score could hide 90% factual + 20% analytical — that's a terrible product. Score per category:

| Category | % of Golden Set | Day-1 Target | Week 12 Target | Why This Target |
|---|---|---|---|---|
| Factual recall | 40% | 75% | 90% | Direct retrieval. If chunking is right, this should be high. |
| Comprehension | 30% | 65% | 80% | Needs multi-chunk synthesis. Harder but tractable. |
| Analytical | 20% | 45% | 65% | UPSC analytical Qs are genuinely hard for RAG — retrieval finds the right topic, wrong framing. 45% Day-1 is honest. |
| Current affairs | 10% | 55% | 75% | Depends on CA content pack freshness. Floor is lower. |

Why 45% Day-1 for analytical is OK: Failing analytical doesn't tank the blended score (it's 20% of the set). And honestly, even human UPSC aspirants don't ace analytical questions — they're designed to be hard. A 45%→65% improvement arc over 9 weeks is achievable via query rewriting + prompt engineering.

Step 4: Eval Discipline

Weekly cadence:
  1. Run eval harness (automated via GitHub Actions)
  2. Review per-category scores
  3. Check for regressions (≥7% drop in any category on 30-Q set,
     ≥3% on 200-Q set — adjusted for statistical significance)
  4. Pick 1 tuning lever
  5. A/B test: run eval with and without the change
  6. If delta positive → merge. If neutral or negative → revert.
  7. Log in eval/changelog.md with before/after %
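Step 3 of the cadence, the regression check with set-size-aware thresholds, might look like this; the function name and score format are assumptions:

```python
# Sketch of the regression gate. Thresholds match the plan: a drop of
# 7+ percentage points on the 30-Q micro set, 3+ on the 200-Q full set.
def regressions(prev: dict, curr: dict, set_size: int) -> list[str]:
    """Return a report line for every category that regressed."""
    threshold = 7 if set_size <= 30 else 3
    return [f"{cat}: {prev[cat]}% -> {curr.get(cat, 0)}%"
            for cat in prev
            if prev[cat] - curr.get(cat, 0) >= threshold]
```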

eval/changelog.md format:

## Week 5 — Hybrid Search (BM25 + Vector)
- Change: Added BM25 to QueryFusionRetriever, top-K=5
- Factual: 72% → 78% (+6%) ✅
- Comprehension: 60% → 63% (+3%) ✅
- Analytical: 42% → 44% (+2%) ✅
- Hallucination: 8% → 6% (-2%) ✅
- Verdict: MERGED

Hard rule: No prompt or chunking change ships without an eval delta. No vibes-based tuning.

Note

Statistical significance on small sets: 3% on a 30-question set = 1 question flipping — that's noise. Use ≥7% (2+ questions) as the regression threshold on the micro set. When scaling to 200 questions, 3% (6 questions) becomes meaningful.

Step 5: Pre-Decided Launch Thresholds

Decide these NOW, not on launch day when motivated reasoning kicks in:

| Blended Accuracy | Action |
|---|---|
| <55% | 🛑 Block launch. Extend Phase 3. Revisit chunking + embedding fundamentally. |
| 55–65% | ⚠️ Launch with caveats: per-answer confidence indicator (green/yellow/red based on retrieval score) + disclaimer banner on AI features. Set v1.1 accuracy target. |
| ≥65% | ✅ Launch as planned. |
| ≥75% | 🚀 Pull launch forward if other gates (security, content) also pass. |

| Hallucination Rate | Action |
|---|---|
| >10% | 🛑 Block launch regardless of accuracy. A confident wrong answer about Article 370 is worse than "I don't know." |
| 5–10% | ⚠️ Launch with mandatory confidence badges + "AI-generated, verify from source" disclaimer. |
| <5% | ✅ Acceptable. |

"Launch with caveats" means concretely:

  • Per-answer retrieval confidence badge (🟒 high / 🟑 medium / πŸ”΄ low) based on top-k similarity score
  • Banner on all AI features: "AI answers are generated from content packs and may contain errors. Always verify from original sources."
  • Low-confidence answers (πŸ”΄) include a "Flag this answer" button for community review

Step 6: Eval Budget Addition

| Item | Cost | Notes |
|---|---|---|
| LLM API for eval runs (~10 weekly runs × 200 queries) | ~₹3,000–₹5,000 | Gemini free tier covers most; overflow to DeepSeek |
| Domain expert micro-review of golden set (30→200 Qs) | ~₹3,000 | Verify answer keys are correct. Bad golden data = bad eval. |
| Total eval budget | ~₹8,000 | Added to v1 one-time costs |

Eval Harness Code

# eval/eval_runner.py — Runs weekly via GitHub Actions

def run_eval():
    """Produce a per-category scorecard, not a blended pass/fail."""
    golden = load("eval/golden-dataset.json")

    results = {
        "date": now(),
        "scores": {
            "exact_match": 0,
            "partial_correct": 0,
            "hallucination": 0,
            "no_answer": 0,
            "wrong_refusal": 0,
        },
        "per_category": {
            "factual": {"correct": 0, "total": 0},
            "comprehension": {"correct": 0, "total": 0},
            "analytical": {"correct": 0, "total": 0},
            "current_affairs": {"correct": 0, "total": 0},
        },
        "low_confidence": [],
    }

    for q in golden:
        response = query_rag(q["question"])
        score = evaluate_response(response, q["verified_answer"])
        results["scores"][score.category] += 1
        results["per_category"][q["type"]]["total"] += 1
        if score.is_correct:
            results["per_category"][q["type"]]["correct"] += 1

    # Save timestamped results. Compare against previous week.
    # Alert if any category drops ≥7% (micro set) or ≥3% (full set).
    # Alert if hallucination rate exceeds 10%.

Golden Dataset (Phased Construction):

| Phase | When | Size | Source |
|---|---|---|---|
| Micro set | Week 3 | 30 questions | Hand-curated: 12 factual, 9 comprehension, 6 analytical, 3 CA |
| Full set | Week 12 | 200 questions | UPSC PYQ 2020–2025 (100) + NCERT chapter-end (50) + custom analytical (50) |
| Quarterly refresh | Post-launch | +20 questions | New PYQs, updated CA, community-flagged edge cases |

🧊 Cold-Start Content Strategy

Important

Community doesn't exist yet. Community contributes AFTER you have something worth contributing to. v1 content must be self-authored from public sources.

v1 Seed Content (Self-Authored)

| Content Pack | Source | Volume | Effort |
|---|---|---|---|
| upsc-pyq-2000-2025 | UPSC official papers (public record) | ~2,500 MCQs with explanations | 3 days |
| upsc-polity | NCERT + Constitution (public domain) | ~200 MCQs, ~150 flashcards, ~50 notes | 4 days |
| upsc-economy | NCERT + Economic Survey + Budget | ~200 MCQs, ~150 flashcards, ~50 notes | 4 days |
| upsc-history | NCERT Class 6–12 History | ~200 MCQs, ~150 flashcards, ~50 notes | 4 days |
| upsc-geography | NCERT + India Year Book | ~150 MCQs, ~100 flashcards, ~40 notes | 4 days |
| upsc-ethics | PYQ case studies + Constitution | ~100 MCQs, ~80 flashcards, ~30 notes | 2 days |
| upsc-current-affairs-2026 | PIB + Economic Survey + Budget 2026 | ~100 flashcards, ~50 MCQs | 2 days |
| upsc-science-tech | NCERT Science + PIB S&T | ~100 MCQs, ~60 flashcards, ~30 notes | 2 days |

Total seed content: ~1,000 MCQs, ~790 flashcards, ~250 study notes, and ~2,500 PYQs.

Domain Expert Review

  • Budget: β‚Ή15,000–₹20,000 (freelance, 2 weeks part-time)
  • Reviews: Factual accuracy, UPSC-relevance, note quality, copyright flags
  • Where to find: LinkedIn, UPSC Telegram groups, Internshala, Pepper Content

Post-Launch Community Sequence

v1 Launch (self-authored content)
    → Students use, find value
    → Analytics prove usage
    → v1.1: Enable /contribute form
    → Gamify: badges, leaderboard
    → Partner with UPSC Telegram groups (100K+ member groups)
    → Monthly content drives
    → Network effects kick in

πŸ›‘οΈ Integration Reality Check

Warning

Five runtimes (Python FastAPI, Node.js/Next.js, LlamaIndex, external LLM APIs, auth layer) mean integration pain is guaranteed, not merely possible.

Known Integration Pain Points

| Integration | What Will Break | Mitigation | Buffer |
|---|---|---|---|
| Next.js ↔ FastAPI | CORS, cookie/session passing, SSR vs client fetch | Shared API_URL env var. Next.js API routes as proxy. CORS middleware with explicit origin whitelist. | 2 days |
| NextAuth.js ↔ FastAPI | JWT format mismatch, session validation, token refresh | FastAPI validates NextAuth JWT with shared secret. Test: expired, malformed, missing tokens. | 1 day |
| LlamaIndex ↔ Gemini | Rate limit responses unhandled, embedding timeouts | Wrap all calls in try/except with fallback. Pin model version. Circuit breaker pattern. | 2 days |
| WebSocket ↔ Auth | WS doesn't carry cookies like HTTP. Token expiry mid-session. | Auth on WS handshake via query param token. Re-auth on reconnect. | 1 day |
| Docker Compose | Startup order, health checks, volume mounting, memory limits | depends_on with health checks. Test Windows + Linux. Minimum: 4GB RAM. | 1 day |
Total integration buffer: 7 days distributed across Phase 3.
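The NextAuth ↔ FastAPI row hinges on both sides agreeing on the token format. Below is a minimal, stdlib-only sketch of the shared-secret check, assuming NextAuth is configured to issue signed HS256 JWTs (by default NextAuth v4 encrypts tokens as JWE, which this would not handle); production code would use a maintained library such as PyJWT.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_decode(part: str) -> bytes:
    # JWT segments are base64url without padding; restore it before decoding.
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def verify_nextauth_jwt(token: str, secret: str) -> dict:
    """Validate an HS256-signed JWT using the shared NEXTAUTH_SECRET.

    Raises ValueError for malformed tokens, bad signatures, or expiry —
    covering the three test cases from the integration table.
    """
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token")
    expected = hmac.new(secret.encode(),
                        f"{header_b64}.{payload_b64}".encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    payload = json.loads(b64url_decode(payload_b64))
    # Tokens without an exp claim are rejected as a safe default.
    if payload.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return payload
```

In FastAPI this would sit behind a dependency that reads the `Authorization: Bearer` header and maps `ValueError` to a standardized 401 error response.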

Pre-Integration Checklist

Before starting any new integration:
  β–‘ Both services start independently and respond to health checks
  β–‘ Auth token format documented and agreed by both sides
  β–‘ Error response format standardized (JSON, consistent schema)
  β–‘ Timeout values set (30s for LLM, 5s for everything else)
  β–‘ One happy-path e2e test passes
  β–‘ One error-path test exists (timeout, invalid token)
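Two checklist items — the standardized error schema and the timeout policy — can be pinned down in code before any integration work starts. A sketch (field names are illustrative, not an agreed contract):

```python
import time

# Timeout policy from the checklist: 30s for LLM calls, 5s for everything else.
TIMEOUTS = {"llm": 30, "default": 5}


def error_envelope(code: str, message: str, status: int) -> dict:
    """One error shape every service returns (illustrative field names)."""
    return {
        "error": {
            "code": code,        # machine-readable, e.g. "AUTH_TOKEN_EXPIRED"
            "message": message,  # human-readable, safe to show in the UI
            "status": status,    # mirrors the HTTP status code
            "ts": int(time.time()),
        }
    }


def is_standard_error(payload: dict) -> bool:
    """Contract-test helper: does a response body match the agreed schema?"""
    err = payload.get("error")
    return isinstance(err, dict) and {"code", "message", "status"} <= set(err)
```

The error-path test from the checklist then becomes a one-liner: assert that whatever FastAPI or Next.js returns on a timeout passes `is_standard_error`.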

πŸ‘₯ Team & Capacity Model

| Role | Who | Hours/Week | Notes |
| --- | --- | --- | --- |
| Lead Developer | You (primary) | 25–30 productive hrs | 5 hrs/day Γ— 5–6 days. Includes review, debugging, deployment. |
| AI Codegen (Opus 4.6) | Assisted development | N/A | Boilerplate, tests, schemas. Does NOT: debug integration, trace upstream paths, make security decisions. |
| Domain Expert | Freelance (β‚Ή15–20K) | 10–15 hrs/week | Reviews AI-generated content. Phase 3 only. |
| DMCA Handler | You (initially) | 1 hr/week | Near-zero volume at launch. |

Capacity Math

v1 scope: ~55 working days of effort
Lead developer: 5 hrs/day Γ— 5.5 days/week = ~27.5 hrs/week
Effective dev weeks: 55 days Γ· 5.5 days/week = 10 work-weeks
Calendar weeks (with buffer): 16 weeks

AI codegen reduces boilerplate writing ~40%
AI does NOT reduce: integration debugging, security review, testing, deployment

What AI Can and Cannot Do

| AI CAN reliably generate | AI CANNOT reliably do |
| --- | --- |
| NextAuth.js config boilerplate | Debug why LlamaIndex returns the wrong user's data |
| Rate limiter middleware | Trace hardcoded paths across 16K lines of upstream code |
| Service worker skeleton | Decide if a vector index should be shared or isolated |
| CSS theme / design system | Test WebSocket auth edge cases |
| Pack schema validation scripts | Evaluate if an AI UPSC answer is factually correct |
| Backup pipeline scripts | Determine the right chunking strategy |
| CI/CD workflows | Negotiate partnerships with Telegram groups |
| Test scaffolding | Make security architecture decisions under ambiguity |

Rule: Use AI for code generation. Use humans for judgment, debugging, and integration.


πŸ“… 16-Week Timeline

Scope Discipline Rules

1. v1 has EXACTLY ONE GOAL: "A student can use AI to study UPSC content
   packs and take quizzes." Everything else is v1.1+.
2. Feature freeze at Week 10. Weeks 11–16 are testing, content, debugging,
   and launch.
3. Every feature request gets a "What breaks if we don't ship this in v1?"
   test. If "nothing critical," it's v1.1.
4. Track on public GitHub project board.
5. Cheap codegen β‰  cheap integration + testing + deployment. Resist scope creep.

v1 Feature Scope (What Ships)

| Feature | Priority | Why |
| --- | --- | --- |
| Auth (dual-mode) | P0 | Multi-user doesn't work without it |
| Multi-tenancy (security-hardened) | P0 | Data isolation is non-negotiable |
| Content pack ingestion + registry | P0 | This IS the product |
| AI Notes (RAG-powered) | P0 | Core differentiator |
| Quiz engine (content pack + AI-generated) | P0 | Most tangible student value |
| Rate limiter + LLM fallback chain | P0 | Platform dies without this |
| Daily Command Center UI | P0 | The interface students see |
| PWA + offline quiz | P1 | Tier-2/3 access |
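The rate limiter + LLM fallback chain can be sketched as an ordered provider list: exhaust retries on the free tier, then fall through to the next provider. The provider names and retry policy here are illustrative assumptions, not the shipped configuration:

```python
import time


class RateLimitError(Exception):
    """Raised by a provider wrapper when the API returns HTTP 429."""


def call_with_fallback(prompt, providers, retries=1, base_delay=1.0,
                       sleep=time.sleep):
    """Try providers in priority order; fall through on rate limits.

    `providers` is an ordered list of (name, callable) pairs — e.g. the
    Gemini free tier first, then a cheaper backup model.
    """
    last_err = None
    for name, call in providers:
        for attempt in range(retries + 1):
            try:
                return name, call(prompt)
            except RateLimitError as err:
                last_err = err
                if attempt < retries:
                    sleep(base_delay * (2 ** attempt))  # brief backoff, retry
            except TimeoutError as err:
                last_err = err
                break  # a slow provider won't get faster; move on
    raise RuntimeError("all LLM providers exhausted") from last_err
```

The same wrapper is where the circuit-breaker pattern from the integration table would plug in: skip a provider entirely after N consecutive failures.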

v1.1 Scope (Deferred β€” Ships 4–6 Weeks After v1)

| Feature | Why Cut |
| --- | --- |
| Voice bot (Siri Orb) | Complex. An entire week for a nice-to-have. |
| AI Cowork Studio | Advanced. Students need basic RAG chat first. |
| Study Mode focus timer | Pure frontend, not core. |
| i18n (Hindi, Tamil, etc.) | English-only for v1. |
| Plausible analytics | 2-hour setup. Add in Week 15 or post-launch. |
| CA engine automation | Manual publishing as content packs for v1. |
| Admin panel | Not needed at <100 users. |

Week-by-Week

╔══════════════════════════════════════════════════════════════════╗
β•‘  PHASE 0: SPIKE & FOUNDATION (Week 1–2)                        β•‘
╠══════════════════════════════════════════════════════════════════╣
β•‘                                                                  β•‘
β•‘  Week 1: Multi-Tenancy Spike (BLOCKING)                         β•‘
β•‘  ─────────────────────────────────────                          β•‘
β•‘  Day 1-2: Get DeepTutor running locally (Docker, Python env,    β•‘
β•‘           LlamaIndex setup, verify all features work)            β•‘
β•‘  Day 3:   Trace every file path reference in DeepTutor core     β•‘
β•‘           (grep + manual read). Produce PATH_AUDIT.md            β•‘
β•‘  Day 4:   Identify session-scoped vs global modules.             β•‘
β•‘           Prototype UserNamespace patching on 2 modules.         β•‘
β•‘  Day 5:   Build minimal 2-user test:                             β•‘
β•‘           User A uploads doc β†’ User B should NOT see it          β•‘
β•‘           User A chats β†’ User B's session is separate            β•‘
β•‘                                                                  β•‘
β•‘  β›” DECISION GATE: Does multi-tenancy work?                     β•‘
β•‘     YES β†’ Continue to Week 2                                     β•‘
β•‘     NO  β†’ Re-scope: self-hosted single-user only for v1.        β•‘
β•‘           Multi-tenant hosted version becomes v1.1.              β•‘
β•‘                                                                  β•‘
β•‘  Week 2: Auth + Content Pack Foundation                          β•‘
β•‘  ──────────────────────────────────────                          β•‘
β•‘  Day 1:   NextAuth.js setup (Google + GitHub OAuth)              β•‘
β•‘  Day 2:   Security middleware (path validation, audit logging)   β•‘
β•‘  Day 3-4: Content pack schema validation + ingest pipeline       β•‘
β•‘  Day 5:   First content pack ingested + queryable via RAG        β•‘
β•‘                                                                  β•‘
╠══════════════════════════════════════════════════════════════════╣
β•‘  PHASE 1: CORE ENGINE (Week 3–6)                                β•‘
╠══════════════════════════════════════════════════════════════════╣
β•‘                                                                  β•‘
β•‘  Week 3-4: AI Core Features + Eval Baseline                     β•‘
β•‘  ──────────────────────────────────────────                      β•‘
β•‘  β€’ Per-user namespace integration (all modules patched)          β•‘
β•‘  β€’ AI Notes β€” RAG-powered note generation from content packs    β•‘
β•‘  β€’ Quiz engine β€” generate from content packs + AI               β•‘
β•‘  β€’ Rate limiter + LLM fallback chain                            β•‘
β•‘  β€’ Week 3 Day 1-2: Build eval harness + curate 30-Q micro set   β•‘
β•‘  β€’ Week 3 Day 3: Run BASELINE eval on stock pipeline             β•‘
β•‘    β†’ FIRST ACCURACY READING (9 weeks of tuning runway)           β•‘
β•‘  β€’ Week 3–4: Test chunking strategies (Lever 1 β€” biggest gain)   β•‘
β•‘                                                                  β•‘
β•‘  Week 5-6: Platform Features + Retrieval Tuning                  β•‘
β•‘  ─────────────────────────────────────────────                    β•‘
β•‘  β€’ Content Pack Manager UI (browse, install, search packs)       β•‘
β•‘  β€’ Knowledge Base manager (user uploads, private vault)          β•‘
β•‘  β€’ Performance dashboard (quiz history, scores, progress)        β•‘
β•‘  β€’ PWA manifest + service worker + offline quiz                  β•‘
β•‘  β€’ Lever 2: Hybrid search + reranking (eval delta tracked)       β•‘
β•‘                                                                  β•‘
β•‘  End of Week 6: Run T1 (quiz quality) + T2 (throughput)          β•‘
β•‘  Weekly eval runs ongoing β€” accuracy tracked per category        β•‘
β•‘                                                                  β•‘
╠══════════════════════════════════════════════════════════════════╣
β•‘  PHASE 2: UX & DESIGN (Week 7–10)                              β•‘
╠══════════════════════════════════════════════════════════════════╣
β•‘                                                                  β•‘
β•‘  Week 7-8: GS360 Theme + Command Center + RAG Levers 3-4        β•‘
β•‘  ─────────────────────────────────────────────────────────        β•‘
β•‘  β€’ GS360 design system (CSS, components, dark theme)             β•‘
β•‘  β€’ Daily Command Center layout                                   β•‘
β•‘  β€’ Sidebar navigation + keyboard shortcuts                       β•‘
β•‘  β€’ Accessibility pass (WCAG AA, aria-labels, focus visible)      β•‘
β•‘  β€’ Lever 3: Prompt engineering (factual vs analytical prompts)   β•‘
β•‘  β€’ Lever 4: Embedding model A/B test (eval delta tracked)        β•‘
β•‘                                                                  β•‘
β•‘  Week 9-10: Polish + Integration Debugging + Final Levers        β•‘
β•‘  ────────────────────────────────────────────────────────         β•‘
β•‘  β€’ End-to-end integration testing (all 5 runtimes talking)       β•‘
β•‘  β€’ CORS, auth token passing, WebSocket fixes                     β•‘
β•‘  β€’ Error handling + loading states + offline UI states            β•‘
β•‘  β€’ Lever 5: Query rewriting (HyDE) + Lever 6: Answer model A/B   β•‘
β•‘  β€’ Run T6 (desktop/web usability)                                β•‘
β•‘                                                                  β•‘
β•‘  β›” FEATURE FREEZE AT END OF WEEK 10                            β•‘
β•‘     No new features after this point. Only bug fixes.            β•‘
β•‘     RAG tuning continues through Phase 3 (eval-driven only).    β•‘
β•‘                                                                  β•‘
╠══════════════════════════════════════════════════════════════════╣
β•‘  PHASE 3: CONTENT, EVAL, & HARDENING (Week 11–13)              β•‘
╠══════════════════════════════════════════════════════════════════╣
β•‘                                                                  β•‘
β•‘  Week 11: Cold-Start Content Creation                            β•‘
β•‘  ────────────────────────────────────                            β•‘
β•‘  β€’ AI-generate all 8 seed content packs                          β•‘
β•‘  β€’ Structure 2,500 PYQs (2000–2025)                             β•‘
β•‘  β€’ AI-generate NCERT summaries + MCQs + flashcards               β•‘
β•‘                                                                  β•‘
β•‘  Week 12: Domain Expert Review + Full Golden Set                 β•‘
β•‘  ────────────────────────────────────────────────                 β•‘
β•‘  β€’ Domain expert reviews AI content (β‚Ή15-20K budget)             β•‘
β•‘  β€’ Scale golden dataset: 30 β†’ 200 verified UPSC Q&A pairs       β•‘
β•‘  β€’ Domain expert micro-review of golden set answers (β‚Ή3K)        β•‘
β•‘  β€’ Run full T3 eval. Apply launch threshold decision:            β•‘
β•‘    <55% β†’ block launch | 55–65% β†’ caveats | β‰₯65% β†’ go           β•‘
β•‘  β€’ Hallucination >10% β†’ block launch regardless of accuracy      β•‘
β•‘                                                                  β•‘
β•‘  Week 13: Security + Bug Fixing Buffer                           β•‘
β•‘  ─────────────────────────────────────                           β•‘
β•‘  β€’ Pre-launch pen test (10 attack vectors)                       β•‘
β•‘  β€’ Integration bug fixing (7-day buffer)                         β•‘
β•‘  β€’ Copyright scan pipeline testing                                β•‘
β•‘  β€’ Backup pipeline verification (backup + restore test)          β•‘
β•‘  β€’ Load testing (k6 β€” T2 re-run with real content)               β•‘
β•‘                                                                  β•‘
╠══════════════════════════════════════════════════════════════════╣
β•‘  PHASE 4: LAUNCH PREP (Week 14–16)                              β•‘
╠══════════════════════════════════════════════════════════════════╣
β•‘                                                                  β•‘
β•‘  Week 14: Deployment + Infrastructure                            β•‘
β•‘  ────────────────────────────────────                            β•‘
β•‘  β€’ Deploy to Vercel (frontend) + Railway (backend)               β•‘
β•‘  β€’ DNS, SSL, domain (gs360.study)                                β•‘
β•‘  β€’ Backup pipeline live (Cloudflare R2)                          β•‘
β•‘  β€’ Monitoring: basic health check endpoint                        β•‘
β•‘  β€’ Smoke testing on production                                    β•‘
β•‘                                                                  β•‘
β•‘  Week 15: Documentation + Community Setup                        β•‘
β•‘  ────────────────────────────────────────                         β•‘
β•‘  β€’ README.md, CONTRIBUTING.md, CONTENT_GUIDE.md                  β•‘
β•‘  β€’ DMCA.md + LICENSE (Apache-2.0) + SECURITY.md                  β•‘
β•‘  β€’ Discord server + Telegram channel setup                       β•‘
β•‘  β€’ (Optional) Plausible analytics β€” 2 hours                      β•‘
β•‘                                                                  β•‘
β•‘  Week 16: Soft Launch                                             β•‘
β•‘  ──────────────────                                              β•‘
β•‘  β€’ Invite 20–30 beta users from UPSC Telegram groups             β•‘
β•‘  β€’ Monitor 5 days: crashes, data leaks, UX confusion             β•‘
β•‘  β€’ Hotfix cycle: fix critical bugs daily                          β•‘
β•‘  β€’ Day 5: Public launch β€” GitHub, Reddit, Twitter                β•‘
β•‘  β€’ Day 7: First user feedback β†’ plan v1.1 based on data          β•‘
β•‘                                                                  β•‘
β•šβ•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•β•

🧭 Post-v1 Roadmap (Phase 5): Content Bootstrap + Current Affairs Pipeline

Tracking ticket: Issue #20

Goal

Ship a compliant, local-first content bootstrap and current-affairs ingestion pipeline so a new user can install GS360 and immediately start with a high-quality baseline notebook.

Scope

| Stream | Deliverable | Notes |
| --- | --- | --- |
| Starter notebook packs | Manifest-driven auto-download | Store manifests/metadata in Git, not copyrighted files |
| Source compliance | License metadata + policy checks | Prefer public-domain/open-license and user-provided sources |
| Local ingestion | Download β†’ validate β†’ index | One-command local bootstrap |
| Current affairs | Connector framework + scheduler | Prioritize RSS/API/allowed sources first |
| Quality controls | Dedupe + tagging + source citations | UPSC-friendly summaries with traceable provenance |

Milestones

| Milestone | Outcome |
| --- | --- |
| A | Manifest schema, downloader, checksum validation |
| B | Local indexing and one-command starter setup |
| C | Current-affairs connector + normalized article schema |
| D | Summarization, tagging, scheduling, and resilience tests |
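Milestone A's checksum validation can be sketched as follows; the manifest shape shown in the docstring is an assumption, not the final schema:

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large packs don't load into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_pack(manifest_path: Path) -> list:
    """Return manifest entries whose files are missing or corrupted.

    Assumed manifest shape: {"files": [{"path": "...", "sha256": "..."}]}
    An empty return list means the downloaded pack is intact.
    """
    manifest = json.loads(manifest_path.read_text())
    root = manifest_path.parent
    bad = []
    for entry in manifest.get("files", []):
        file_path = root / entry["path"]
        if not file_path.exists() or sha256_of(file_path) != entry["sha256"]:
            bad.append(entry["path"])
    return bad
```

The one-command bootstrap of Milestone B would run this after every download and refuse to index a pack that fails.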

Acceptance Criteria

  • Fresh local install can bootstrap starter knowledge with one command.
  • Chat answers include source metadata for ingested content.
  • Daily current-affairs sync appends into knowledge bases reliably.
  • Pipeline handles source errors with retries and safe fallback behavior.

πŸ’° Honest Cost Model

Operational Costs at Scale

| Users | Monthly LLM | Infra | Total Monthly | Funding Source |
| --- | --- | --- | --- | --- |
| 1–50 | β‚Ή0 | β‚Ή0 | β‚Ή0 | Free tier |
| 50–200 | β‚Ή500–₹800 | β‚Ή0 | ~β‚Ή800/mo | Personal / bootstrap |
| 200–1,000 | β‚Ή2K–₹4K | β‚Ή500 | ~β‚Ή5K/mo | GitHub Sponsors + OpenCollective |
| 1,000–5,000 | β‚Ή15K–₹20K | β‚Ή2K | ~β‚Ή22K/mo | Institutional sponsor needed |
| 5,000+ | β‚Ή50K+ | β‚Ή5K+ | β‚Ή55K+/mo | Coaching partnership or govt grant |

Funding Strategy

| Threshold | Trigger | Action |
| --- | --- | --- |
| >200 users | LLM > β‚Ή1K/mo | GitHub Sponsors + OpenCollective |
| >1,000 users | LLM > β‚Ή5K/mo | GitHub Education grant. Google Education credits. Coaching institute sponsorship. |
| >5,000 users | LLM > β‚Ή20K/mo | Government grant (MyGov, AICTE, NITI Aayog). Infosys Foundation / Tata Trusts. Optional premium tier (β‚Ή99/mo for heavy AI users; platform stays free). |

One-Time Costs (v1 Development)

| Item | Cost | Notes |
| --- | --- | --- |
| Domain (gs360.study) | ~β‚Ή800/year | |
| Domain expert content review | β‚Ή15,000–₹20,000 | 2 weeks, part-time |
| Eval infrastructure (LLM API for eval runs) | ~β‚Ή3,000–₹5,000 | ~10 weekly eval runs Γ— 200 queries |
| Golden set expert micro-review | ~β‚Ή3,000 | Verify answer keys are correct |
| **Total v1 investment** | **~β‚Ή28,000** | |

πŸ”§ Operational Runbook

| Responsibility | Who | Time | Automation |
| --- | --- | --- | --- |
| DMCA monitoring | Lead dev | ~1 hr/week | dmca@gs360.study inbox |
| PR review (content packs) | Lead dev + 1 community maintainer | ~2 hrs/week (post-community) | CI validates schema + copyright scan |
| Backup verification | Automated | 0 (alert on failure) | GitHub Actions + Telegram alert |
| RAG eval monitoring | Lead dev | ~30 min/week | Weekly automated run, review scorecard |
| Eval review | Lead dev | ~30 min/week | Manual review of regressions/edge cases |
| LLM cost monitoring | Automated | 0 (alert on threshold) | cost-monitor.py alerts if >120% of budget |
| Community management | Lead dev | ~3 hrs/week | Discord + Telegram |
| Security incidents | Lead dev | On-call | SECURITY.md + weekly audit log review |
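The cost-monitor behavior referenced above (alert when spend passes 120% of budget) boils down to a threshold check. A sketch — the data shape is an assumption, not what `cost-monitor.py` actually reads:

```python
def budget_alerts(spend_by_provider: dict,
                  budget_by_provider: dict,
                  threshold: float = 1.20) -> list:
    """Return alert messages for providers spending past 120% of budget.

    Mirrors the runbook's cost-monitor rule; input dicts map provider
    name -> rupees for the current month (hypothetical shape).
    """
    alerts = []
    for provider, spend in spend_by_provider.items():
        budget = budget_by_provider.get(provider)
        if budget and spend > threshold * budget:
            alerts.append(
                f"{provider}: β‚Ή{spend:.0f} spent vs β‚Ή{budget:.0f} budget "
                f"({spend / budget:.0%})"
            )
    return alerts
```

A GitHub Actions cron job could run this daily and push any non-empty result to the same Telegram alert channel the backup check uses.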

"15 lakh students. Zero free AI tools. One open-source platform. The community builds it. The community owns it. The community benefits."

"But the community builds it AFTER we build something worth building on. And we build it in 16 weeks, not 10."

β€” GS360 Open Source, Final Plan