An autonomous AI agent that solves data-driven quizzes using a state machine workflow. Made as a project for the Tools in Data Science (TDS) course at IITM.

mynkpdr/yantrasolve

🧩 YantraSolve

AI-Powered Autonomous Quiz Solver

Python 3.12+ • FastAPI • LangGraph • Hugging Face Space • License: MIT

Features • Quick Start • API • Architecture • Configuration • Testing

📖 Overview

YantraSolve is an autonomous AI agent that solves data-driven quizzes using a state machine workflow. Built for the Tools in Data Science – Project 2 (IITM BS Degree Programme).

🔄 Workflow

The application uses a LangGraph state machine to orchestrate the solving process:

  1. Fetch Context: The agent visits the quiz URL using a headless browser (Playwright) to capture HTML, text, console logs, and a screenshot.
  2. Agent Reasoning: An LLM (the model set by LLM_MODEL, gpt-4.1 by default) analyzes the page context and decides the next step.
  3. Tool Execution: If the agent needs to calculate something, download a file, or analyze an image, it calls the appropriate tool.
  4. Submission: Once the answer is determined, the agent submits it to the server.
  5. Feedback Loop: The system checks the submission result.
    • Correct: The agent proceeds to the next quiz URL.
    • Incorrect: The agent retries with the error feedback (up to 10 attempts).
    • Timeout: If the quiz takes too long, it skips to the next one.
┌───────────────┐     ┌─────────────────┐     ┌───────────────┐
│ fetch_context │────▶│ agent_reasoning │◀───▶│ execute_tools │
└───────────────┘     └────────┬────────┘     └───────────────┘
                               │
                      ┌────────▼────────┐
                      │  submit_answer  │
                      └────────┬────────┘
                               │
                      ┌────────▼────────┐     ┌───────────────┐
                      │process_feedback │────▶│ next quiz/END │
                      └─────────────────┘     └───────────────┘
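
The routing after process_feedback can be sketched in plain Python. This is a minimal illustration, not the project's LangGraph code: the `Feedback` class, its field names, and `route_after_feedback` are hypothetical, and it assumes the server may return a next URL even for a wrong answer.

```python
from dataclasses import dataclass
from typing import Optional

MAX_ATTEMPTS = 10           # retry limit stated in the workflow above
QUIZ_TIMEOUT_SECONDS = 180  # per-quiz timeout (3 minutes)


@dataclass
class Feedback:
    """Hypothetical shape of a submission result; field names are assumptions."""
    correct: bool
    next_url: Optional[str] = None
    reason: str = ""


def route_after_feedback(feedback: Feedback, attempts: int, elapsed: float) -> str:
    """Return the name of the step that should run next."""
    if feedback.correct:
        # Correct answer: fetch the next quiz, or stop if there is none.
        return "fetch_context" if feedback.next_url else "END"
    if elapsed >= QUIZ_TIMEOUT_SECONDS or attempts >= MAX_ATTEMPTS:
        # Out of time or retries: skip ahead if a next URL is known.
        return "fetch_context" if feedback.next_url else "END"
    # Incorrect but still within budget: reason again with the error feedback.
    return "agent_reasoning"
```

Keeping the decision in one pure function makes the retry and timeout limits easy to unit test without a browser or an LLM.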

✨ Features

Agent Tools

| Tool | Description |
| --- | --- |
| python_tool | Execute Python with persistent session (pandas, numpy pre-loaded) |
| javascript_tool | Run JavaScript on browser pages via Playwright |
| download_file_tool | Download files (≤5 MB) with caching |
| call_llm_tool | Analyze files with Gemini 2.5 Flash Lite (images, PDFs, audio, video) |
| call_llm_with_multiple_files_tool | Multi-file analysis |
| submit_answer_tool | Submit answers to quiz endpoints |
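
A "persistent session" for python_tool means that variables defined in one call are still available in the next. The sketch below shows one way this could work with a shared namespace; the class name `PersistentPythonSession` is hypothetical, it does not pre-load pandas or numpy, and it is not a security sandbox.

```python
import contextlib
import io
import traceback


class PersistentPythonSession:
    """Hypothetical stand-in for python_tool: one namespace shared across calls."""

    def __init__(self) -> None:
        self.namespace: dict = {"__name__": "__quiz_session__"}

    def run(self, code: str) -> str:
        """Execute code and return captured stdout, or stdout plus the traceback."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)  # state persists in self.namespace
        except Exception:
            # Return the error as text so the caller can retry instead of crashing.
            return buffer.getvalue() + traceback.format_exc()
        return buffer.getvalue()


session = PersistentPythonSession()
session.run("values = [3, 5, 8]")           # first call defines a variable
print(session.run("print(sum(values))"))    # second call can still see it
```

Returning the traceback as a string rather than raising lets the reasoning step read the error and try again.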

Capabilities

| Category | What it can do |
| --- | --- |
| Web | JS-rendered pages, dynamic content, console logs, iframes |
| Files | PDF extraction, Excel/CSV, ZIP/Gzip decoding |
| Vision | OCR, QR codes, chart reading, screenshots |
| Audio | Transcription via Gemini |
| Data | Pandas operations, filtering, aggregation, statistics |
| ML | Regression, clustering, classification |
| Geo | GeoJSON/KML with networkx |

Reliability

  • ⏱️ 3-minute timeout per quiz with auto-skip
  • 🔄 10 retry attempts before moving on
  • 🔑 Round-robin API key rotation for Gemini
  • 💾 File-based caching with TTL
  • 🛡️ Graceful error handling - agent never crashes
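
Two of these mechanisms, round-robin key rotation and file-based caching with a TTL, are simple enough to sketch with the standard library. The class names `KeyRotator` and `FileCache`, the JSON record layout, and the hashed file names are assumptions made for illustration, not the project's actual implementation.

```python
import hashlib
import itertools
import json
import tempfile
import time
from pathlib import Path


class KeyRotator:
    """Hand out API keys in round-robin order."""

    def __init__(self, keys: list[str]) -> None:
        if not keys:
            raise ValueError("at least one key is required")
        self._cycle = itertools.cycle(keys)

    def next_key(self) -> str:
        return next(self._cycle)


class FileCache:
    """Store JSON-serializable values on disk and expire them after ttl seconds."""

    def __init__(self, directory: Path, ttl: float) -> None:
        self.directory = directory
        self.ttl = ttl
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so URLs and other long strings become safe file names.
        return self.directory / (hashlib.sha256(key.encode()).hexdigest() + ".json")

    def set(self, key: str, value) -> None:
        record = {"stored_at": time.time(), "value": value}
        self._path(key).write_text(json.dumps(record))

    def get(self, key: str):
        path = self._path(key)
        if not path.exists():
            return None
        record = json.loads(path.read_text())
        if time.time() - record["stored_at"] > self.ttl:
            path.unlink()  # expired: drop the stale entry
            return None
        return record["value"]


rotator = KeyRotator(["key-a", "key-b"])                # placeholder keys
cache = FileCache(Path(tempfile.mkdtemp()), ttl=60.0)   # temporary directory for the demo
```

Rotating keys spreads requests across quotas, and the TTL keeps cached downloads from going stale between quiz runs.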

🚀 Quick Start

Prerequisites

  • Python 3.12+
  • uv (recommended) or pip
  • Docker (optional, for containerized run)

Installation (Local)

# Clone repository
git clone https://github.com/mynkpdr/yantrasolve.git
cd yantrasolve

# Install dependencies
uv sync  # or: pip install -e .

# Install browser
playwright install chromium --with-deps

Installation (Docker)

# Build image
docker build -t yantrasolve .

# Run container
docker run --env-file .env -p 8000:8000 yantrasolve

Run

# Development
uv run python main.py

# Production
uv run uvicorn main:app --host 0.0.0.0 --port 8000

📡 API Reference

Health Check

GET /
GET /health

Response: 200 OK

{"status": "ok", "message": "Quiz Solver is running"}

Submit Quiz

POST /quiz
Content-Type: application/json

Request:

{
  "email": "student@example.com",
  "secret": "your-secret-key",
  "url": "https://example.com/quiz/1"
}

Response:

| Status | Description |
| --- | --- |
| 200 | Quiz solving started (background) |
| 400 | Invalid JSON payload |
| 403 | Invalid secret or email |
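
A client call to this endpoint can be sketched with the standard library. The helper name `build_quiz_request` and the `http://localhost:8000` address are assumptions (the port comes from the Run section); the payload fields and status meanings are copied from the request and response descriptions above.

```python
import json
import urllib.request

# Status meanings copied from the response table above.
STATUS_MEANINGS = {
    200: "Quiz solving started (background)",
    400: "Invalid JSON payload",
    403: "Invalid secret or email",
}


def build_quiz_request(base_url: str, email: str, secret: str, quiz_url: str) -> urllib.request.Request:
    """Build (but do not send) a POST /quiz request."""
    body = json.dumps({"email": email, "secret": secret, "url": quiz_url}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/quiz",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_quiz_request(
    "http://localhost:8000",       # assumed local server address
    "student@example.com",
    "your-secret-key",
    "https://example.com/quiz/1",
)
```

Sending the request with `urllib.request.urlopen(request)` returns a response on 200; for 400 and 403 it raises `urllib.error.HTTPError`, whose `code` attribute can be looked up in `STATUS_MEANINGS`.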

🏭 Architecture

yantrasolve/
├── main.py                 # FastAPI application
├── app/
│   ├── config/
│   │   └── settings.py     # Pydantic settings
│   ├── graph/
│   │   ├── graph.py        # LangGraph workflow
│   │   ├── state.py        # QuizState TypedDict
│   │   └── resources.py    # Global resources
│   ├── nodes/
│   │   ├── fetch.py        # Page fetching
│   │   ├── agent.py        # AI reasoning
│   │   ├── tools.py        # Tool execution
│   │   ├── submit.py       # Answer submission
│   │   └── feedback.py     # Response handling
│   ├── tools/
│   │   ├── python.py       # Python sandbox
│   │   ├── javascript.py   # Browser JS
│   │   ├── download.py     # File downloader
│   │   ├── call_llm.py     # Gemini multimodal
│   │   └── submit_answer.py
│   ├── resources/
│   │   ├── llm.py          # Multi-provider LLM
│   │   ├── browser.py      # Playwright wrapper
│   │   └── api.py          # HTTP client
│   └── utils/
│       ├── answers.py      # Save correct answers
│       ├── cache.py        # File-based caching
│       ├── gemini.py       # Gemini utilities
│       ├── helpers.py      # Temp file management
│       └── logging.py      # Loguru setup
├── tests/                  # Pytest suite
├── Dockerfile
└── pyproject.toml

🧰 Configuration

| Variable | Default | Description |
| --- | --- | --- |
| SECRET_KEY | required | Authentication secret |
| STUDENT_EMAIL | required | Student email |
| LLM_API_KEY | required | Primary LLM API key |
| LLM_PROVIDER | openai | openai or google |
| LLM_MODEL | gpt-4.1 | Reasoning model |
| LLM_TEMPERATURE | 0.1 | Sampling temperature |
| GEMINI_API_KEYS | (none) | Comma-separated Gemini keys |
| GEMINI_BASE_URL | https://aipipe.org/openrouter/v1 | Gemini API endpoint (OpenRouter-compatible) |
| GEMINI_MODEL | google/gemini-2.5-flash-lite | Gemini model for file analysis |
| TEMP_DIR | /tmp/quiz_files | Temp file storage |
| CACHE_DIR | /tmp/quiz_cache | Cache storage |
| BROWSER_PAGE_TIMEOUT | 10000 | Playwright timeout (ms) |
| QUIZ_TIMEOUT_SECONDS | 180 | Per-quiz timeout (seconds) |
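
To show how these variables and defaults fit together, here is a minimal loader built on the standard library. The project itself uses Pydantic settings (see app/config/settings.py); this `Settings` dataclass and `load_settings` function are a hypothetical stand-in that covers only a subset of the variables.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    secret_key: str
    student_email: str
    llm_api_key: str
    llm_provider: str
    llm_model: str
    llm_temperature: float
    gemini_api_keys: tuple
    browser_page_timeout_ms: int
    quiz_timeout_seconds: int


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying table defaults."""
    env = os.environ if env is None else env
    required = ("SECRET_KEY", "STUDENT_EMAIL", "LLM_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("missing required variables: " + ", ".join(missing))
    # GEMINI_API_KEYS is comma-separated; ignore empty entries and stray spaces.
    keys = tuple(k.strip() for k in env.get("GEMINI_API_KEYS", "").split(",") if k.strip())
    return Settings(
        secret_key=env["SECRET_KEY"],
        student_email=env["STUDENT_EMAIL"],
        llm_api_key=env["LLM_API_KEY"],
        llm_provider=env.get("LLM_PROVIDER", "openai"),
        llm_model=env.get("LLM_MODEL", "gpt-4.1"),
        llm_temperature=float(env.get("LLM_TEMPERATURE", "0.1")),
        gemini_api_keys=keys,
        browser_page_timeout_ms=int(env.get("BROWSER_PAGE_TIMEOUT", "10000")),
        quiz_timeout_seconds=int(env.get("QUIZ_TIMEOUT_SECONDS", "180")),
    )
```

Failing fast on the three required variables surfaces a misconfigured `.env` file at startup rather than on the first quiz request.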

🐳 Docker

# Build
docker build -t yantrasolve .

# Run
docker run -p 8000:8000 \
  -e SECRET_KEY=xxx \
  -e STUDENT_EMAIL=xxx \
  -e LLM_API_KEY=xxx \
  -e GEMINI_API_KEYS=xxx \
  yantrasolve

Hugging Face Spaces

  1. Create a new Space with Docker SDK
  2. Push this repository
  3. Add secrets in Space settings
  4. Access via https://your-space.hf.space/quiz

🧪 Testing

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=app

# Run specific module
uv run pytest tests/test_tools/ -v

Test coverage: 225 tests covering all modules.


🗺️ Roadmap

  • Dynamic model selection per quiz type
  • Parallel quiz processing
  • Web UI for monitoring progress
  • Performance metrics dashboard
  • Enhanced geo-spatial analysis

📜 License

This project is licensed under the MIT License - see the LICENSE file.


👤 Author

Mayank Kumar Poddar


Built with ☕ and determination
