Automated reproducibility analysis for machine learning research repositories (supporting Python, R, Julia, and Jupyter Notebooks) using AST-based static analysis and LLM-powered semantic auditing.
Live: repo-audit.vercel.app · API: repoaudit-api.onrender.com
Watch the Demo video
- What It Does
- What's New in v2.0
- Tech Stack
- Project Structure
- Getting Started
- Deployment
- API Endpoints
- Features
- GitHub Action
- Documentation
## What It Does

RepoAudit scans public GitHub ML repositories and produces a reproducibility score (0–100) across six categories:
| Category | Weight | Checks |
|---|---|---|
| Environment | 15% | Pinned dependencies, Dockerfile, Reproducibility Decay Tracking (Yanked pkgs, CVEs, shelf-life) |
| Determinism | 20% | AST-verified seeding, Non-deterministic shuffling detection, Notebook out-of-order execution, cell mutation |
| Datasets | 15% | No hardcoded paths, Data Provenance (URL liveness, gated datasets) |
| Semantic | 20% | AI-verified alignment between README and repo structure |
| Execution | 20% | L0–L3 Replay Verification via Bubblewrap, Pipeline Graph Reconstruction (PGR), Presence of standard entry points (train.py, Makefile, etc.) |
| Documentation | 10% | README sections for Installation, Usage, Datasets |
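Roughly, the overall score is the weighted sum of the six category scores above. A minimal sketch of that idea (illustrative only; the actual logic lives in `backend/engine/scoring.py` and may differ in detail):

```python
# Rough illustration of the weighted scoring scheme described in the table above.
WEIGHTS = {
    "environment": 0.15,
    "determinism": 0.20,
    "datasets": 0.15,
    "semantic": 0.20,
    "execution": 0.20,
    "documentation": 0.10,
}

def overall_score(category_scores: dict[str, float]) -> int:
    """Combine per-category scores (each 0-100) into a single 0-100 score."""
    return round(sum(WEIGHTS[c] * category_scores.get(c, 0.0) for c in WEIGHTS))

# Example: strong docs and environment, weak execution replay
print(overall_score({
    "environment": 80, "determinism": 60, "datasets": 70,
    "semantic": 75, "execution": 40, "documentation": 85,
}))  # -> 66
```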
## Tech Stack

| Layer | Technology |
|---|---|
| Frontend | Next.js 15, Tailwind CSS, Recharts, Lucide React |
| Backend API | FastAPI (Python 3.11+) |
| Task Queue | Celery + Valkey (local) / Upstash Redis (production) |
| Analysis | Python ast module, libcst (for deterministic rewriting), Tree-sitter (R, Julia), Jupyter parsing, cross-file import graph |
| AI Layer | Hugging Face API (Llama-3.3-70B) |
| Remediation | Native Python libcst & difflib |
| Database | PostgreSQL via Supabase |
| Cache | Valkey protocol cache (local Valkey, Upstash Redis in production) |
| Deployment | Render (backend), Vercel (frontend) |
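For the task-queue row above, the same Celery app can point at a local Valkey broker during development and an Upstash `rediss://` URL in production. A rough sketch assuming environment-variable configuration (illustrative only; the project's real settings live in `backend/worker.py`):

```python
# Sketch of a Celery app that accepts either a local Valkey broker (redis://)
# or a TLS Upstash URL (rediss://). Illustrative only.
import os
import ssl
from celery import Celery

broker_url = os.getenv("CELERY_BROKER_URL", "redis://localhost:6379/0")
result_backend = os.getenv("CELERY_RESULT_BACKEND", broker_url)

app = Celery("repoaudit", broker=broker_url, backend=result_backend)

# rediss:// (Upstash) needs explicit TLS options; plain redis:// (Valkey) does not.
if broker_url.startswith("rediss://"):
    app.conf.broker_use_ssl = {"ssl_cert_reqs": ssl.CERT_REQUIRED}
    app.conf.redis_backend_use_ssl = {"ssl_cert_reqs": ssl.CERT_REQUIRED}
```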
## Project Structure

```
RepoAudit/
├── backend/
│ ├── Dockerfile
│ ├── requirements.txt
│ ├── .env.example
│ ├── pyproject.toml
│ ├── main.py # FastAPI entry point
│ ├── config.py # Pydantic settings
│ ├── db.py # Supabase client
│ ├── models.py # Pydantic schemas
│ ├── worker.py # Celery config (Upstash TLS)
│ ├── tasks.py # Async audit task
│ ├── routers/
│ │ └── audit.py # /api/v1/audit endpoints
│ ├── engine/
│ │ ├── cloner.py # Git clone + cleanup
│ │ ├── setup_parsers.py # AOT Tree-sitter parser builder
│ │ ├── parsers.py # Multi-language AST loaders
│ │ ├── ast_auditor.py # Determinism checks (Python, R, Julia, .ipynb)
│ │ ├── notebook_analyzer.py # Deep analysis for Jupyter Notebooks
│ │ ├── path_auditor.py # Hardcoded path detection
│ │ ├── dependency_auditor.py # Dependency analysis (Python, R, Julia)
│ │ ├── semantic_auditor.py # LLM README audit
│ │ ├── import_graph.py # Cross-file import graph, cycle detection, flow tracing
│ │ ├── data_provenance_auditor.py # Data loading, URL liveness, gated datasets
│ │ ├── hardware_fingerprinting_auditor.py # Anti-sandbox / Hardware identification
│ │ ├── configuration_drift_auditor.py # Hyperparameter discrepancy detection
│ │ ├── sandbox.py # Bubblewrap orchestration
│ │ ├── replay_auditor.py # Dynamic L0–L3 execution verification
│ │ ├── decay_auditor.py # Reproducibility decay (bit rot) tracking
│ │ ├── pipeline_auditor.py # Pipeline Graph Reconstruction (PGR)
│ │ ├── auto_remediator.py # AST-powered deterministic code-mod engine
│ │ └── scoring.py # Weighted score computation
│ └── tests/
│ ├── test_ast_auditor.py
│ ├── test_path_auditor.py
│ ├── test_dependency_auditor.py
│ ├── test_import_graph.py
│ ├── test_replay_auditor.py
│ ├── test_auto_remediator.py
│ └── test_scoring.py
├── frontend/
│ ├── Dockerfile
│ ├── package.json
│ ├── .env.local.example
│ ├── app/
│ │ ├── layout.tsx
│ │ ├── page.tsx # Main audit page
│ │ └── audit/[id]/page.tsx # Result permalink
│ ├── components/
│ │ ├── AuditForm.tsx
│ │ ├── ScoreCard.tsx # Circular gauge
│ │ ├── RadarChart.tsx # 6-axis category chart
│ │ ├── ScoreHistory.tsx # Score trend line chart
│ │ ├── FixFeed.tsx # Prioritized issue list
│ │ ├── DecayCard.tsx # Decay curve & shelf-life visualization
│ │ ├── PipelineGraph.tsx # Interactive DAG visualization (ReactFlow)
│ │ └── StatusIndicator.tsx # Progress stepper
│ └── lib/
│ └── api.ts # Typed API client
├── render.yaml # Render Blueprint
├── action.yml # GitHub Action metadata
├── action/
│ └── audit.py # GitHub Action script (stdlib only)
├── .github/
│ └── workflows/
│ └── repoaudit.yml # CI workflow for this repo
├── docker-compose.yml # Local development
└── LICENSE
```
## Getting Started

### Prerequisites

- Python 3.11+
- Node.js 20+
- Valkey (local dev) or Upstash account (free tier, production)
- Supabase account (free tier)
- Hugging Face API key (free tier)
Run this in your Supabase SQL editor:

```sql
CREATE TABLE IF NOT EXISTS repositories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
url TEXT UNIQUE NOT NULL,
owner TEXT,
name TEXT,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE TABLE IF NOT EXISTS audits (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
repo_id UUID REFERENCES repositories(id) ON DELETE SET NULL,
commit_hash TEXT NOT NULL,
score INTEGER CHECK (score >= 0 AND score <= 100),
report_json JSONB NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);
CREATE INDEX IF NOT EXISTS idx_audit_commit ON audits(commit_hash);
CREATE INDEX IF NOT EXISTS idx_audit_repo ON audits(repo_id);
```
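With this schema in place, the backend's Supabase client (`backend/db.py`) can look up a previously stored report by commit hash. A minimal sketch using supabase-py, with a hypothetical helper name (illustrative only, not the actual client code):

```python
# Minimal sketch of reading the audits table defined above with supabase-py.
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def cached_audit(commit_hash: str) -> dict | None:
    """Return the stored report for a commit if one exists (the L2 cache)."""
    rows = (
        supabase.table("audits")
        .select("score, report_json")
        .eq("commit_hash", commit_hash)
        .limit(1)
        .execute()
        .data
    )
    return rows[0] if rows else None
```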
Quick start with Docker Compose:

```bash
# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your Supabase + Hugging Face keys

# 2. Start everything
docker compose up --build
```

Frontend at http://localhost:3000, API at http://localhost:7860.
To run locally without Docker:

```bash
# Backend
cd backend
cp .env.example .env
# Edit .env with your keys
pip install -r requirements.txt
valkey-server &
celery -A worker worker --loglevel=info &
uvicorn main:app --reload --port 7860
# Frontend (separate terminal)
cd frontend
cp .env.local.example .env.local
npm install
npm run dev
```

To run the backend test suite:

```bash
cd backend
pytest
```

## Deployment

RepoAudit runs on an entirely free stack:
| Service | Platform | Cost |
|---|---|---|
| Backend (API + Celery) | Render | $0 |
| Frontend | Vercel | $0 |
| Redis (cache + broker) | Upstash | $0 |
| Database | Supabase | $0 |
| LLM | Hugging Face | $0 |
1. Upstash — Create a Redis database at console.upstash.com. Copy the `rediss://` URL.
2. Render — Go to render.com → New → Blueprint → connect this repo. Render auto-detects `render.yaml` and prompts for env vars:

   | Key | Value |
   |---|---|
   | `SUPABASE_URL` | Your Supabase project URL |
   | `SUPABASE_KEY` | Your Supabase anon key |
   | `HF_API_KEY` | Your Hugging Face API key |
   | `REDIS_URL` | `rediss://default:...@....upstash.io:6379` |
   | `CELERY_BROKER_URL` | Same Upstash URL |
   | `CELERY_RESULT_BACKEND` | Same Upstash URL |
   | `ALLOWED_ORIGINS` | `https://your-app.vercel.app,http://localhost:3000` |

3. Vercel — Import this repo → set Root Directory to `frontend` → add env var:

   | Key | Value |
   |---|---|
   | `NEXT_PUBLIC_API_URL` | `https://your-app.onrender.com` |

4. Update `ALLOWED_ORIGINS` on Render to include your Vercel URL.
- Vercel: If the project is connected to GitHub, Vercel automatically deploys on every push to `main` by default.
- Render: Make sure Auto-Deploy is enabled for your Render service. If it isn't auto-deploying reliably, add the GitHub Actions fallback below.

GitHub Actions fallback (recommended for Render):

1. In Render, open your service → Settings → Deploy Hook → copy the hook URL.
2. In GitHub, add a repo secret named `RENDER_DEPLOY_HOOK_URL` with that value.
3. Pushing to `main` will now trigger Render via `.github/workflows/deploy-render.yml`.
## API Endpoints

| Method | Path | Description |
|---|---|---|
| POST | `/api/v1/audit` | Submit a repo URL for analysis |
| GET | `/api/v1/audit/{id}` | Get full audit result |
| GET | `/api/v1/audit/{id}/status` | Poll task progress |
| GET | `/api/v1/audit/history/{owner}/{repo}` | Score history across audits |
| POST | `/api/v1/compare` | Compare up to 5 repo URLs |
| GET | `/health` | Health check |
Example:

```bash
curl -X POST https://repoaudit-api.onrender.com/api/v1/audit \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com/owner/repo"}'
```

Note: You can also submit research paper URLs (e.g., from arXiv, Papers With Code, NeurIPS) and RepoAudit will automatically resolve them to their corresponding GitHub repository.
How an audit request is processed:

- On submission, the API resolves the repo's latest `commit_hash` via `git ls-remote` (see the sketch below)
- If that hash exists in Upstash Redis (L1) or Supabase Postgres (L2), the cached report is returned instantly
- Otherwise, a Celery task clones the repo (depth=1) and runs the full analysis pipeline
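A minimal sketch of the commit-resolution step (the helper name is hypothetical, not the actual backend code):

```python
# Resolve the latest commit hash of a remote repository without cloning it.
import subprocess

def latest_commit_hash(repo_url: str) -> str:
    """Return the SHA that `git ls-remote <url> HEAD` reports for the repo."""
    out = subprocess.run(
        ["git", "ls-remote", repo_url, "HEAD"],
        capture_output=True, text=True, check=True, timeout=30,
    )
    return out.stdout.split()[0]  # first column of the first line is the hash

# The hash then acts as the cache key: check Redis (L1), then Postgres (L2),
# and only fall back to a depth=1 clone plus the full Celery pipeline on a miss.
```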
## Features

- Deterministic Auto-Remediation Engine: Powered by offline AST manipulation, it instantly fixes high-confidence reproducibility blockers by injecting missing seeds, pinning unpinned dependencies, and rewriting hardcoded paths, outputting a concrete `.patch` file natively, without LLMs (a minimal sketch follows this list).
- Data Provenance Auditing: Detects how data is loaded, checks URL liveness, flags gated datasets, and identifies non-deterministic preprocessing.
- Configuration Drift Detection: Catches discrepancies between hyperparameters claimed in the README and the actual values in config files or code defaults.
- Notebook-Specific Deep Analysis: Goes beyond basic extraction to detect out-of-order cell execution (a variable used before its definition in an earlier cell), identifies global state mutations (top-level assignments/imports), verifies "Restart and Run All" compatibility, and flags non-reproducible runtime dependency installations (e.g., `!pip install`).
- Execution Replay Verification (Lightweight): Goes beyond static analysis by performing a four-tier reproduction check in a Bubblewrap sandbox (L0: dependencies install, L1: imports succeed, L2: entry point runs for >5 s, L3: outputs are produced), providing a pass/fail signal for the actual reproducibility of the claimed workflow.
- Reproducibility Decay Tracking: Quantifies "bit rot" through temporal analysis of pinned dependencies. It integrates with the PyPI/CRAN/Pkg registries to detect yanked distributions and known CVEs, generating a predicted "shelf-life" score and a decay-curve visualization for long-term auditability.
- Reproducibility Scoring: A weighted 0–100 score based on environment, determinism, datasets, and semantic alignment.
- Pipeline Graph Reconstruction (PGR): Infers end-to-end ML workflows by statically analyzing source code and import relationships. It identifies stages (Dataset, Preprocessing, Training, Eval, Artifact), traces variable propagation across files, calculates a pipeline completeness score, and visualizes the resulting DAG interactively with ReactFlow.
- Multi-Repository Comparative Analysis: Compares reproducibility across related repositories (e.g., competing implementations of the same paper). It features an overlaid radar chart for the six-axis category breakdown, identifies the "Golden Standard" implementation against a 0–100 benchmark, and provides a unified comparison dashboard for research reviewers and reproducibility chairs.
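To make the remediation idea concrete, here is a heavily simplified seed-injection sketch using libcst and difflib. It is illustrative only, not the project's actual `backend/engine/auto_remediator.py`, and the seed rule it applies is an assumption:

```python
# Simplified sketch: inject a missing seed call and emit the change as a patch.
import difflib
import libcst as cst

SEED_NAMES = {"seed", "manual_seed"}  # e.g. random.seed, np.random.seed, torch.manual_seed

def has_seed_call(module: cst.Module) -> bool:
    """Return True if any call in the module looks like a seeding call."""
    found = False

    class SeedVisitor(cst.CSTVisitor):
        def visit_Call(self, node: cst.Call) -> None:
            nonlocal found
            func = node.func
            if isinstance(func, cst.Attribute) and func.attr.value in SEED_NAMES:
                found = True
            if isinstance(func, cst.Name) and func.value in SEED_NAMES:
                found = True

    module.visit(SeedVisitor())
    return found

def inject_seed(source: str, seed: int = 42) -> str:
    """Prepend `import random; random.seed(seed)` if no seeding call exists."""
    module = cst.parse_module(source)
    if has_seed_call(module):
        return source
    header = [cst.parse_statement("import random"),
              cst.parse_statement(f"random.seed({seed})")]
    return module.with_changes(body=[*header, *module.body]).code

def make_patch(path: str, original: str, fixed: str) -> str:
    """Render the change as a unified diff, ready to save as a .patch file."""
    return "".join(difflib.unified_diff(
        original.splitlines(keepends=True), fixed.splitlines(keepends=True),
        fromfile=f"a/{path}", tofile=f"b/{path}",
    ))

src = "import torch\nmodel = torch.nn.Linear(4, 2)\n"
print(make_patch("train.py", src, inject_seed(src)))
```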
## GitHub Action

RepoAudit can be integrated into your CI/CD pipeline to automatically audit PRs.

```yaml
- uses: sadhumitha-s/RepoAudit@v2.0.0
  with:
    api-url: https://repoaudit-api.onrender.com
    threshold: "70"
```
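Conceptually, the action's stdlib-only script submits the repository to the API, polls for completion, and fails the job when the score falls below the threshold. A rough, illustrative sketch (response field names and status values are assumptions; see `action/audit.py` for the real script):

```python
# Illustrative stdlib-only audit gate: submit, poll, fail CI below threshold.
import json
import sys
import time
import urllib.request

API = "https://repoaudit-api.onrender.com"

def post_json(url: str, payload: dict) -> dict:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def get_json(url: str) -> dict:
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def main(repo_url: str, threshold: int) -> None:
    audit_id = post_json(f"{API}/api/v1/audit", {"url": repo_url})["id"]  # assumed field
    for _ in range(180):  # poll up to ~30 minutes
        if get_json(f"{API}/api/v1/audit/{audit_id}/status").get("status") == "completed":
            break
        time.sleep(10)
    score = get_json(f"{API}/api/v1/audit/{audit_id}")["score"]
    print(f"Reproducibility score: {score}")
    if score < threshold:
        sys.exit(1)  # fail the CI job when the score is below the threshold

if __name__ == "__main__":
    main(sys.argv[1], int(sys.argv[2]))
```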
## Documentation

- System Architecture - Detailed architecture and component interaction.
- Scoring Methodology - Deep dive into how reproducibility scores are calculated.
- API Reference - Comprehensive guide for all API endpoints.
- Development Guide - Setup for local development and contributing.
- GitHub Action Usage - Advanced configuration for the CI action.
- Comparison Guide - Comparative analysis of multiple repositories.
- Contribution Guide - Guidelines for contributing to the project.