
Python >=3.11 | License: MIT

Study Agent - Complete Documentation

πŸ“‹ Project Overview

The Study Agent is an automated system that analyzes exam question images (from any certification) and generates detailed explanations using Google's Gemini AI model. The system processes images in batch, extracts the questions, and produces markdown-formatted study materials.

Key Features:

  • 🎯 Automated batch processing of exam question images
  • πŸ€– Intelligent analysis using Gemini 2.5 Flash model
  • πŸ“ Markdown-formatted output with explanations
  • ☁️ Cloud-native architecture on Google Cloud Platform
  • πŸ”„ Duplicate prevention (skips already processed images)

πŸ—οΈ Architecture

System Components

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Google Cloud Platform                   β”‚
β”‚                                                             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”                  β”‚
β”‚  β”‚  GCS INPUT   β”‚         β”‚ GCS OUTPUT   β”‚                  β”‚
β”‚  β”‚   BUCKET     β”‚         β”‚   BUCKET     β”‚                  β”‚
β”‚  β”‚ (Images)     β”‚         β”‚ (Markdown)   β”‚                  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β–²β”€β”€β”€β”€β”€β”€β”€β”˜                  β”‚
β”‚         β”‚                        β”‚                          β”‚
β”‚         β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚                          β”‚
β”‚         └──>β”‚  Cloud Run Job   β”‚β”€β”˜                          β”‚
β”‚             β”‚  (Docker Image)  β”‚                            β”‚
β”‚             β”‚                  β”‚                            β”‚
β”‚             β”‚  Process Images  β”‚                            β”‚
β”‚             β”‚  + Gemini API    β”‚                            β”‚
β”‚             β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜                            β”‚
β”‚                     β”‚                                       β”‚
β”‚                     β–Ό                                       β”‚
β”‚             Vertex AI / Gemini                              β”‚
β”‚             (LLM Analysis)                                  β”‚
β”‚                                                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Data Flow

  1. Input Phase: Exam question images are uploaded to the INPUT_BUCKET
  2. Processing Phase: Cloud Run Job starts the containerized application
  3. Analysis Phase: For each image:
    • Load Gemini model with system prompt (tutor instructions)
    • Send image + prompt to Gemini 2.5 Flash model
    • Model returns markdown-formatted analysis
  4. Output Phase: Results saved as result_*.md files to OUTPUT_BUCKET
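The four phases above reduce to a single orchestration loop. The sketch below makes that control flow explicit by injecting the bucket and model operations as callables (`list_images`, `already_processed`, `analyze`, and `save` are hypothetical names, not functions from main.py), so the logic can be read and tested without GCP credentials:

```python
from typing import Callable, Iterable


def run_batch(
    list_images: Callable[[], Iterable[str]],   # lists blobs in INPUT_BUCKET
    already_processed: Callable[[str], bool],   # True if the result_*.md output exists
    analyze: Callable[[str], str],              # image URI -> markdown (the Gemini call)
    save: Callable[[str, str], None],           # writes markdown to OUTPUT_BUCKET
) -> int:
    """Process every unprocessed image; return the number analyzed."""
    count = 0
    for uri in list_images():
        if already_processed(uri):
            continue  # duplicate prevention: skip images with existing outputs
        save(uri, analyze(uri))
        count += 1
    return count
```

In the real job, `analyze` wraps the Gemini 2.5 Flash request and `save` writes to the output bucket; the loop itself stays this simple.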

πŸš€ Quick Start

Prerequisites

  • Google Cloud Project with:

    • Cloud Run API enabled
    • Artifact Registry API enabled
    • Vertex AI API enabled
    • Two GCS buckets (input and output)
    • Service Account with appropriate permissions
  • Local tools:

    • gcloud CLI installed and configured
    • Docker installed (for building images locally)
    • Python 3.11+ (for local testing)

Environment Setup

  1. Create a .env file in the project root:

# GCP Configuration
PROJECT_ID="your-gcp-project-id"
REGION="us-central1"
REPOSITORY_NAME="study-agent"
IMAGE_NAME="process-images"
JOB_NAME="study-agent-job"
SERVICE_ACCOUNT_EMAIL="your-sa@your-project.iam.gserviceaccount.com"

# GCS Buckets
INPUT_BUCKET="input-exam-images"
OUTPUT_BUCKET="output-study-materials"

# Model Variables
MODEL_NAME="gemini-2.5-flash"

  2. Load the environment variables:

export REPO_FOLDER=${PWD}
set -o allexport && source .env && set +o allexport
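For local testing without a shell, the same KEY="value" pairs can be loaded into the process environment with a few lines of stdlib Python. This is a minimal sketch, not part of the project: it handles only simple quoted assignments and comments, unlike a full dotenv parser.

```python
import os


def load_env(path: str = ".env") -> None:
    """Load simple KEY="value" lines from a .env file into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blank lines, comments, and malformed lines
            key, _, value = line.partition("=")
            os.environ[key.strip()] = value.strip().strip('"')
```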

🐳 Docker & Deployment

Docker Image Build & Push

The Dockerfile creates a lightweight containerized application using Python 3.11-slim:

FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
COPY main.py .
RUN pip install --no-cache-dir -r requirements.txt
ENTRYPOINT ["python", "main.py"]

Build and push to Artifact Registry:

gcloud builds submit --tag ${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPOSITORY_NAME}/${IMAGE_NAME}

Cloud Run Job Deployment

First-time deployment (create new job):

gcloud run jobs create ${JOB_NAME} \
    --image ${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPOSITORY_NAME}/${IMAGE_NAME} \
    --region ${REGION} \
    --service-account ${SERVICE_ACCOUNT_EMAIL} \
    --set-env-vars \
      GCP_PROJECT=${PROJECT_ID},\
      GCP_REGION=${REGION},\
      INPUT_BUCKET_NAME=${INPUT_BUCKET},\
      OUTPUT_BUCKET_NAME=${OUTPUT_BUCKET} \
    --max-retries 0

Update environment variables (if job already exists):

gcloud run jobs update ${JOB_NAME} \
    --region ${REGION} \
    --service-account ${SERVICE_ACCOUNT_EMAIL} \
    --update-env-vars \
      INPUT_BUCKET_NAME=${INPUT_BUCKET},\
      OUTPUT_BUCKET_NAME=${OUTPUT_BUCKET}

Execute the job:

gcloud run jobs execute ${JOB_NAME} --region ${REGION} --task-timeout 1200s

πŸ“ Code Structure

main.py

Purpose: Main processing script that orchestrates image analysis

Key Functions:

load_prompt()

  • Loads the system instruction from system_prompt.txt
  • Falls back to system_prompt.txt.example if original doesn't exist
  • Returns the tutor agent instructions as a string
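The fallback behavior described above can be sketched as follows. The file names come from the source; the function body is an assumption about how main.py implements it, not a copy of it:

```python
from pathlib import Path


def load_prompt(
    primary: str = "system_prompt.txt",
    fallback: str = "system_prompt.txt.example",
) -> str:
    """Return the tutor instructions, preferring the real prompt file."""
    for candidate in (primary, fallback):
        path = Path(candidate)
        if path.exists():
            return path.read_text(encoding="utf-8")
    raise FileNotFoundError(f"Neither {primary} nor {fallback} was found")
```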

process_images_from_gcs_batch()

  • Client Initialization: Creates GCS and Gemini API clients
  • Configuration: Sets up model parameters:
    • Model: gemini-2.5-flash
    • Temperature: 0.2 (low randomness for consistent answers)
    • Max tokens: 2048
    • Safety settings: Set to BLOCK_NONE for educational content
  • Image Processing Loop:
    • Lists all objects in INPUT_BUCKET
    • Filters for image files (.png, .jpg, .jpeg, .webp)
    • Checks if output already exists (prevents reprocessing)
    • Sends image + system prompt to Gemini API
    • Saves markdown results to OUTPUT_BUCKET
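The filtering and duplicate-prevention steps of the loop reduce to two checks that can run against plain blob names. A sketch under assumptions: the extension list is from the source, while the helper name and the injected `to_output_name` callable are illustrative, not the actual main.py code:

```python
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".webp")


def select_unprocessed(blob_names, existing_outputs, to_output_name):
    """Yield image blobs whose markdown output does not exist yet."""
    for name in blob_names:
        if not name.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip non-image objects in the bucket
        if to_output_name(name) in existing_outputs:
            continue  # duplicate prevention: output already exists
        yield name
```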

system_prompt.txt

Contains the tutor agent instructions, defining:

  • Response format (Markdown)
  • Analysis structure (Transcription, Correct Answer, Detailed Explanation)
  • References to documentation
  • Output labeling (REFERENCIA_PDF tag)

requirements.txt

Dependencies:

  • google-cloud-storage: For GCS bucket operations
  • google-genai: Gemini API client (includes Vertex AI support)

Install Dependencies Locally

pip install -r requirements.txt

πŸ” Environment Variables

Variable             Description                     Example
GCP_PROJECT          Google Cloud Project ID         my-project-123
GCP_REGION           GCP region for deployment       us-central1
INPUT_BUCKET_NAME    GCS bucket for input images     gs://exam-questions
OUTPUT_BUCKET_NAME   GCS bucket for markdown output  gs://study-materials
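At startup the container must see all four variables, so a fail-fast check is useful. The variable names are from the table above; the helper itself is an assumption, shown as a sketch:

```python
import os

REQUIRED_VARS = ("GCP_PROJECT", "GCP_REGION", "INPUT_BUCKET_NAME", "OUTPUT_BUCKET_NAME")


def read_config() -> dict:
    """Read the required environment variables, failing fast if any is unset."""
    missing = [v for v in REQUIRED_VARS if not os.environ.get(v)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {v: os.environ[v] for v in REQUIRED_VARS}
```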

πŸ“Š Processing Workflow

Single Image Processing Flow

1. List blobs in INPUT_BUCKET
   ↓
2. For each blob:
   β”œβ”€ Is it an image file? (png, jpg, jpeg, webp)
   β”‚  └─ If No: Skip and continue
   β”‚
   β”œβ”€ Does output file exist?
   β”‚  └─ If Yes: Skip and continue
   β”‚
   β”œβ”€ Load image from GCS URI
   β”‚  └─ gs://input-bucket/image-name.jpg
   β”‚
   β”œβ”€ Create Gemini request with:
   β”‚  β”œβ”€ System prompt (tutor instructions)
   β”‚  β”œβ”€ Image content
   β”‚  └─ Analysis request text
   β”‚
   β”œβ”€ Call Gemini model
   β”‚  └─ Returns markdown analysis
   β”‚
   └─ Upload result to OUTPUT_BUCKET
      └─ result_sanitized-name.md

Output File Naming

Input:  question_001.jpg
Output: result_question_001.md

The system sanitizes filenames by:

  • Removing the file extension
  • Converting slashes to underscores
  • Converting any remaining dots to underscores
  • Prefixing with result_
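The sanitization rules above can be expressed in a few lines. This is a sketch of the described behavior, not the actual main.py implementation, and the order of operations (extension first, then slashes and dots) is an assumption:

```python
def sanitize_output_name(blob_name: str) -> str:
    """Map an input image name to its result_*.md output name."""
    stem = blob_name.rsplit(".", 1)[0]                # drop the extension
    stem = stem.replace("/", "_").replace(".", "_")   # slashes and dots -> underscores
    return f"result_{stem}.md"
```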


πŸ“š Output Format

Each generated markdown file follows this structure:

# [Question Number/Title]

## 1. Transcription
[Original question text and options]

## 2. Correct Answer
**[Correct option letter and text]**

## 3. Detailed Explanation (Tutor)
[Comprehensive explanation with reference to different docs]

REFERENCIA_PDF: [Topic like x1,x2, x3]

πŸ”„ Monitoring & Troubleshooting

Check Job Status

gcloud run jobs describe ${JOB_NAME} --region ${REGION}

View Job Execution Logs

gcloud run jobs logs read ${JOB_NAME} --region ${REGION} --limit 50

Common Issues

Issue                      Solution
INPUT_BUCKET not found     Verify the bucket name and service account permissions
No images processed        Check image formats and bucket contents
Output files not created   Verify OUTPUT_BUCKET exists and is writable
Gemini API errors          Ensure the Vertex AI API is enabled and quota is available

πŸ“Œ Notes

The google-genai library automatically includes Vertex AI dependencies. It is designed specifically for interacting with the Gemini API within the Google Cloud environment.

Cloud Run Jobs are ideal for batch processing tasks. Unlike Cloud Run services, jobs automatically terminate after completion, reducing unnecessary costs.

The 1200-second timeout (20 minutes) should be sufficient for processing 50-100 images depending on model response times. Adjust as needed based on volume.


πŸ“„ License

This project is licensed under the MIT License.

πŸ’‘ For commercial inquiries or specific licensing questions, feel free to contact me.

πŸ‘€ Author

Jorge Aguirre


Last Updated: February 5, 2026
