An autonomous Python agent that monitors high-value GitHub repositories, uses Google Gemini AI to classify issues by technical complexity, and delivers real-time Telegram notifications for expert-level opportunities.
AI-opensource/
├── scout.py # Main orchestrator — the brain of the system
├── ai_engine.py # Gemini AI integration via REST API
├── github_client.py # GitHub issue fetcher using PyGithub
├── storage.py # SQLite memory to avoid duplicate processing
├── notifier.py # Telegram notification sender
├── models.py # Pydantic data models (type safety)
├── requirements.txt # Python dependencies
├── run.bat # One-click execution script (Windows)
├── .env # API keys and secrets (not committed)
├── .env.example # Template for .env
└── scout.db # Auto-generated SQLite database (runtime)
Purpose: Defines the structured data types used across the entire project. Every other module imports from here.
Why it exists: Without strict types, data would be passed around as raw dictionaries, leading to typos, missing fields, and hard-to-debug errors. Pydantic enforces type safety at runtime.
```python
from pydantic import BaseModel, Field
from typing import List, Optional
```

| Class | Fields | Purpose |
|---|---|---|
| `GitHubIssue` | `id`, `number`, `title`, `body`, `html_url`, `repo_name` | Represents a single GitHub issue fetched from the API |
| `IssueAnalysis` | `fit_score`, `is_expert_level`, `implementation_strategy`, `reasoning` | Represents the AI's verdict on an issue |

- `fit_score: int = Field(ge=1, le=10)`: Pydantic's `Field` with `ge` (greater/equal) and `le` (less/equal) constraints ensures the AI never returns a score outside 1-10.
- `body: Optional[str] = ""`: some GitHub issues have no body. This default prevents `NoneType` errors downstream.
- `BaseModel`: Pydantic's base class. Any class inheriting from it gets automatic JSON serialization, validation, and type coercion.
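Based on the field lists above, the two models might be sketched as follows. This is a reconstruction, not the project's verbatim source; the field types are inferred from the surrounding notes.

```python
from typing import Optional

from pydantic import BaseModel, Field


class GitHubIssue(BaseModel):
    """A single GitHub issue as fetched from the API."""
    id: int
    number: int
    title: str
    body: Optional[str] = ""  # some issues have no body; avoid NoneType downstream
    html_url: str
    repo_name: str


class IssueAnalysis(BaseModel):
    """The AI's verdict on an issue."""
    fit_score: int = Field(ge=1, le=10)  # constrained to 1-10 at validation time
    is_expert_level: bool
    implementation_strategy: str
    reasoning: str
```

Constructing `IssueAnalysis(**data)` from Gemini's parsed JSON raises a `ValidationError` if any field is missing, mistyped, or out of range.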
| Library | Import | Use Case |
|---|---|---|
| `pydantic` | `BaseModel`, `Field` | Runtime data validation. When we do `IssueAnalysis(**data)`, Pydantic validates every field type and constraint automatically. |
| `typing` | `List`, `Optional` | Python type hints for better IDE support and code clarity. |
Purpose: Connects to the GitHub API and fetches the most recent open issues from any repository, filtering out Pull Requests.
Why it exists: GitHub's API is complex (pagination, rate limits, authentication). PyGithub abstracts all of this into simple Python objects.
```python
import os
from github import Github
from models import GitHubIssue
from typing import List
from dotenv import load_dotenv
```

- `__init__`: reads `GITHUB_TOKEN` from `.env` and creates an authenticated GitHub client.
- `fetch_recent_issues(repo_full_name, limit=10)`:
  - Calls `repo.get_issues(state='open', sort='created', direction='desc')` to get open issues sorted newest first.
  - Skips Pull Requests: GitHub's API returns PRs mixed with issues. The `issue.pull_request` check filters them out.
  - Respects the `limit`: stops after collecting `limit` real issues (not PRs).
  - Returns a list of `GitHubIssue` Pydantic models.

Design notes:

- Lazy iteration: PyGithub returns a `PaginatedList` (a lazy iterator). We iterate and break early, which avoids fetching all 10,000+ issues from large repos like `pytorch`.
- No `len()` on `PaginatedList`: calling `len()` would force-fetch ALL pages. This was a bug I fixed during development.
| Library | Import | Use Case |
|---|---|---|
| `PyGithub` | `Github` | Full-featured GitHub API wrapper. Handles OAuth, pagination, rate limiting, and object mapping. |
| `python-dotenv` | `load_dotenv` | Loads the `.env` file into `os.environ` so we can read `GITHUB_TOKEN`. |
| `os` | `os.getenv` | Reads environment variables. |
Purpose: Takes a GitHub issue and uses Google's Gemini 2.5 Flash model to determine if it's "expert-level" — scoring it 1-10 and providing an implementation strategy.
Why it exists: The core intelligence of the system. Without AI, we'd need manual rules that can't understand the nuance of technical issues.
```python
import os
import requests
import json
from models import GitHubIssue, IssueAnalysis
from dotenv import load_dotenv
import re
```

- `__init__`: reads `GEMINI_API_KEY` from `.env` and constructs the REST API URL for `gemini-2.5-flash`.
- `analyze_issue(issue)`:
  - Builds a detailed prompt that instructs Gemini to act as a "Lead AI Engineer".
  - Sends a POST request to `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent`.
  - Uses JSON mode (`response_mime_type: "application/json"`) to force structured output.
  - Parses the response and returns an `IssueAnalysis` Pydantic model.
  - Fallback: if JSON parsing fails, uses regex (`re.search`) to extract JSON from freeform text.
  - Error handling: returns a safe default (`fit_score=1`, `is_expert_level=False`) with "AI Error" in the reasoning.
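The parsing path, with its regex fallback and safe default, could look roughly like this. `parse_analysis` and `FALLBACK` are illustrative names, not necessarily the project's own:

```python
import json
import re

# Safe default returned when the model's reply can't be parsed;
# "AI Error" in the reasoning tells the orchestrator to retry later.
FALLBACK = {"fit_score": 1, "is_expert_level": False,
            "implementation_strategy": "", "reasoning": "AI Error"}


def parse_analysis(raw_text: str) -> dict:
    """Parse Gemini's reply, tolerating prose wrapped around the JSON."""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        # JSON mode usually yields clean JSON; if not, pull out the
        # first-to-last brace span with a greedy DOTALL regex.
        match = re.search(r"\{.*\}", raw_text, re.DOTALL)
        if match:
            try:
                return json.loads(match.group(0))
            except json.JSONDecodeError:
                pass
    return dict(FALLBACK)
```

The result dict can then be handed to `IssueAnalysis(**parse_analysis(text))`, letting Pydantic enforce the score range.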
The google-generativeai Python SDK caused issues:

- Interactive prompts: the SDK's gRPC transport triggered `(Y/N)?` prompts that blocked autonomous execution.
- Model naming inconsistencies: the SDK couldn't resolve `gemini-1.5-flash` vs `models/gemini-1.5-flash`.
- Direct REST is simpler, more predictable, and has zero dependencies beyond `requests`.
The prompt uses a structured approach:

- Role definition: "You are a Lead AI Engineer"
- Positive criteria: RAG, multi-agent, PyTorch/TensorFlow, complex React state
- Negative criteria: documentation typos, CSS fixes, basic feature requests
- Output schema: exact JSON format with field descriptions
- Body truncation: `issue.body[:3000]` prevents exceeding Gemini's context window
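Putting those elements together, a prompt builder might be sketched as follows. The wording is illustrative; only the `[:3000]` truncation, the criteria lists, and the JSON schema fields come from the source.

```python
from typing import Optional


def build_prompt(title: str, body: Optional[str]) -> str:
    """Assemble the classification prompt: role, criteria, schema, truncated body."""
    body = (body or "")[:3000]  # truncate to stay inside the context window
    return (
        "You are a Lead AI Engineer evaluating a GitHub issue.\n"
        "Score high: RAG, multi-agent systems, PyTorch/TensorFlow, "
        "complex React state.\n"
        "Score low: documentation typos, CSS fixes, basic feature requests.\n"
        'Reply with JSON only: {"fit_score": 1-10, "is_expert_level": bool, '
        '"implementation_strategy": str, "reasoning": str}\n\n'
        f"Title: {title}\nBody: {body}"
    )
```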
| Library | Import | Use Case |
|---|---|---|
| `requests` | `requests.post` | HTTP client for making REST API calls to Gemini. Lightweight, no gRPC dependency. |
| `json` | `json.loads` | Parses the JSON response from Gemini into a Python dictionary. |
| `re` | `re.search` | Regex fallback to extract JSON from Gemini responses that aren't perfectly formatted. |
| `python-dotenv` | `load_dotenv` | Loads `GEMINI_API_KEY` from `.env`. |
Purpose: Tracks which issues have already been processed so the scout never analyzes the same issue twice across runs.
Why it exists: Without persistence, every hourly cycle would re-analyze all 140+ issues (20 per repo × 7 repos), wasting API calls and sending duplicate notifications.
```python
import sqlite3
import os
```

- `__init__`: creates/opens `scout.db` and ensures the `processed_issues` table exists.
- `is_processed(issue_id)`: checks whether an issue ID exists in the database.
- `mark_processed(issue_id)`: inserts the issue ID. Uses `INSERT OR IGNORE` to handle duplicates gracefully.

```sql
CREATE TABLE IF NOT EXISTS processed_issues (
    issue_id INTEGER PRIMARY KEY
);
```

- SQLite: zero-configuration, serverless, file-based. Perfect for a single-user tool; no PostgreSQL/MySQL setup needed.
- Connection-per-call: each method opens and closes its own connection. This avoids stale connections and is thread-safe.
- `INSERT OR IGNORE`: if somehow the same issue ID is inserted twice, SQLite silently ignores it instead of crashing.
- Critical flaw fix: issues are only marked as processed when the AI analysis succeeds. If Gemini returns an error, the issue stays unprocessed and gets retried next cycle.
| Library | Import | Use Case |
|---|---|---|
| `sqlite3` | Built-in | Python's built-in SQLite interface. No installation needed. Creates a `scout.db` file in the project root. |
Purpose: Sends beautifully formatted notifications to a Telegram chat when an expert-level issue is found.
Why it exists: The whole point of the scout is to alert you in real-time. Telegram is instant, free, and works on mobile.
```python
import os
import requests
from models import GitHubIssue, IssueAnalysis
from dotenv import load_dotenv
```

- `__init__`: reads `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHAT_ID` from `.env`.
- `notify(issue, analysis)`:
  - Constructs an HTML-formatted message with emojis, bold text, and a clickable link to the issue.
  - If Telegram credentials are missing, falls back to console printing (for testing without a bot).
  - Sends via `POST` to `https://api.telegram.org/bot{token}/sendMessage` with `parse_mode: "HTML"`.
```text
🚀 New Expert Issue Found!
📍 Repo: langchain-ai/langgraph
🔗 Issue: [Bug] RAG context lost in multi-agent streaming
📊 Fit Score: 9/10
💡 Strategy: Fix context serialization in agent handoff protocol
🧐 Reasoning: RAG + multi-agent coordination = core expert domains
```
Telegram's Markdown parser is strict: characters like `_`, `*`, `[`, `]` in issue titles would break the message. HTML mode (`<b>`, `<a href>`) is far more forgiving with special characters in titles.
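The HTML-mode message body could be built like this sketch. `format_message` is a hypothetical helper name, and `html.escape` is added here as a precaution (angle brackets and ampersands still need escaping even in Telegram's HTML mode):

```python
import html


def format_message(repo: str, title: str, url: str,
                   score: int, strategy: str, reasoning: str) -> str:
    """Build the HTML notification body for Telegram's sendMessage.

    html.escape neutralizes <, > and & in user-controlled text; the
    remaining tags (<b>, <a href>) are ones Telegram's HTML parser accepts.
    """
    return (
        "🚀 <b>New Expert Issue Found!</b>\n\n"
        f"📍 <b>Repo:</b> {html.escape(repo)}\n"
        f'🔗 <b>Issue:</b> <a href="{url}">{html.escape(title)}</a>\n'
        f"📊 <b>Fit Score:</b> {score}/10\n"
        f"💡 <b>Strategy:</b> {html.escape(strategy)}\n"
        f"🧐 <b>Reasoning:</b> {html.escape(reasoning)}"
    )
```

The returned string goes in the `text` field of the POST payload, alongside `parse_mode: "HTML"` and the chat ID.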
| Library | Import | Use Case |
|---|---|---|
| `requests` | `requests.post` | Sends HTTP POST to the Telegram Bot API. |
| `python-dotenv` | `load_dotenv` | Loads the bot token and chat ID from `.env`. |
Purpose: The entry point and brain of the entire system. Ties all modules together into a continuous monitoring loop.
Why it exists: Each module is independent. The orchestrator defines the workflow: fetch → analyze → decide → notify → sleep → repeat.
```python
import os, sys, time, warnings
from dotenv import load_dotenv
from github_client import GitHubClient
from ai_engine import AIEngine
from storage import Storage
from notifier import TelegramNotifier
```

- Initialization: creates instances of all four modules.
- Repository loop: iterates over the `REPOS` list.
- Issue loop: for each repo, fetches 20 recent issues.
- Processing pipeline for each issue:
  1. Skip if already in `scout.db`
  2. Send to Gemini for analysis
  3. If `is_expert_level == True`, send a Telegram notification
  4. If analysis succeeded, mark as processed
  5. If analysis failed ("AI Error"), skip (will retry next cycle)
  6. Sleep 1 second between API calls (rate limiting)
- Continuous mode: when run without `--test`, loops forever with a 1-hour sleep between cycles.
| Repository | Domain |
|---|---|
| `langchain-ai/langgraph` | Multi-agent AI orchestration |
| `crewAIInc/crewAI` | AI agent frameworks |
| `pytorch/pytorch` | Deep learning framework |
| `google-gemini/cookbook` | Gemini AI examples |
| `n8n-io/n8n` | Workflow automation |
| `aden-hive/hive` | Collaborative platform |
| `calcom/cal.com` | Scheduling infrastructure |
- `warnings.filterwarnings("ignore")`: suppresses deprecation warnings from libraries for clean output.
- `--test` flag: runs a single cycle on `langchain-ai/langchain` only. Used for debugging.
- Graceful error handling: if one repo fails (404, network error), the scout continues to the next repo.
- 1-second delay: prevents hitting Gemini's rate limit (especially on the free tier).
| Library | Import | Use Case |
|---|---|---|
| `sys` | `sys.argv` | Command-line argument parsing (the `--test` flag). |
| `time` | `time.sleep` | Rate limiting between API calls and the 1-hour cycle sleep. |
| `warnings` | `warnings.filterwarnings` | Suppresses noisy library warnings. |
Purpose: Windows batch script that handles virtual environment activation and runs the scout in one click.
```bat
@echo off
if not exist venv (
    python -m venv venv
    .\venv\Scripts\python -m pip install -r requirements.txt
)
echo Y | .\venv\Scripts\python scout.py %*
pause
```

- Creates the `venv` if it doesn't exist.
- Installs all dependencies from `requirements.txt`.
- Runs `scout.py` with any arguments passed (e.g., `--test`).
- `echo Y |` pipes "Y" to stdin to bypass any interactive prompts from libraries.
- `pause` keeps the window open so you can read the output.
Dependencies (`requirements.txt`):

```text
PyGithub
google-generativeai
python-dotenv
pydantic
requests
tqdm
```
Environment variables (`.env.example`):

```text
GITHUB_TOKEN=ghp_xxxx            # GitHub Personal Access Token
GEMINI_API_KEY=AIzaSyXxxxx       # Google AI Studio API Key
TELEGRAM_BOT_TOKEN=123456:ABCx   # Telegram Bot Token from @BotFather
TELEGRAM_CHAT_ID=7069295447      # Your Telegram numeric chat ID
```

| Technology | Version | Purpose |
|---|---|---|
| Python 3.x | 3.10+ | Primary language. Chosen for its rich ecosystem of AI/API libraries. |
| Library | Version | Used In | Purpose |
|---|---|---|---|
| PyGithub | Latest | `github_client.py` | Full GitHub API v3 wrapper. Handles authentication, pagination, rate limiting, and maps API responses to Python objects. |
| requests | Latest | `ai_engine.py`, `notifier.py` | HTTP client for REST API calls (Gemini AI, Telegram Bot API). Lightweight alternative to aiohttp/httpx. |
| pydantic | v2 | `models.py` | Data validation and serialization. Ensures all data flowing through the system is correctly typed and constrained. |
| python-dotenv | Latest | All modules | Loads the `.env` file into environment variables. Keeps secrets out of source code. |
| sqlite3 | Built-in | `storage.py` | Embedded SQL database. Zero-config, serverless, file-based. Ships with Python. |
| google-generativeai | Latest | `requirements.txt` | Installed but not actively used; we switched to direct REST calls for reliability. Kept in requirements for potential future use. |
| tqdm | Latest | `requirements.txt` | Progress bar library. Available for future improvements (e.g., showing progress during batch analysis). |
| API | Endpoint | Auth Method | Purpose |
|---|---|---|---|
| GitHub REST API v3 | `api.github.com` | Personal Access Token (Bearer) | Fetching repository issues, filtering PRs, reading issue metadata. |
| Google Gemini API | `generativelanguage.googleapis.com/v1beta` | API key (query param) | AI-powered issue classification using the `gemini-2.5-flash` model. |
| Telegram Bot API | `api.telegram.org` | Bot token (URL path) | Sending formatted HTML notifications to a Telegram chat. |
| Component | Technology | Purpose |
|---|---|---|
| Database | SQLite (`scout.db`) | Persistence layer to track processed issues across restarts. |
| Environment | Python venv | Isolated dependency management. |
| Execution | Windows Batch (`run.bat`) | One-click startup with auto-setup. |
```text
┌─────────────┐     ┌──────────────────┐     ┌─────────────┐
│   GitHub    │────►│  github_client   │────►│   models    │
│    API      │     │   (PyGithub)     │     │ (Pydantic)  │
└─────────────┘     └──────────────────┘     └──────┬──────┘
                                                    │
                                                    ▼
┌─────────────┐     ┌──────────────────┐     ┌──────────────┐
│   SQLite    │◄───│     storage       │◄───│    scout      │
│ (scout.db)  │     │    (sqlite3)     │     │(orchestrator)│
└─────────────┘     └──────────────────┘     └──────┬───────┘
                                                    │
                                                    ▼
┌─────────────┐     ┌──────────────────┐     ┌──────────────┐
│   Gemini    │◄───│    ai_engine      │◄───│ IssueAnalysis │
│    API      │     │   (REST/HTTP)    │────►│  (Pydantic)  │
└─────────────┘     └──────────────────┘     └──────┬───────┘
                                                    │
                                                    ▼
┌─────────────┐     ┌──────────────────┐            │
│  Telegram   │◄───│     notifier      │◄───────────┘
│  Bot API    │     │    (requests)    │   (if expert-level)
└─────────────┘     └──────────────────┘
```
```powershell
# Test mode (single repo, one cycle)
.\run.bat --test

# Production mode (all repos, continuous loop every 1 hour)
.\run.bat
```