
RepoAudit

Automated reproducibility analysis for machine learning research repositories (supporting Python, R, Julia, and Jupyter Notebooks) using AST-based static analysis and LLM-powered semantic auditing.


Live: repo-audit.vercel.app · API: repoaudit-api.onrender.com

Watch the Demo video


What It Does

RepoAudit scans public GitHub ML repositories and produces a reproducibility score (0–100) across six categories:

| Category      | Weight | Checks |
|---------------|--------|--------|
| Environment   | 15%    | Pinned dependencies, Dockerfile, reproducibility decay tracking (yanked packages, CVEs, shelf-life) |
| Determinism   | 20%    | AST-verified seeding, non-deterministic shuffling detection, notebook out-of-order execution and cell mutation |
| Datasets      | 15%    | No hardcoded paths, data provenance (URL liveness, gated datasets) |
| Semantic      | 20%    | AI-verified alignment between README and repo structure |
| Execution     | 20%    | L0–L3 replay verification via Bubblewrap, Pipeline Graph Reconstruction (PGR), presence of standard entry points (train.py, Makefile, etc.) |
| Documentation | 10%    | README sections for Installation, Usage, Datasets |
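
The weights above feed a straightforward weighted sum. As a minimal sketch of what scoring.py might compute, assuming each category auditor emits a 0–100 subscore (names and structure are illustrative, not the actual implementation):

```python
# Category weights from the table above (illustrative sketch; not the
# actual scoring.py implementation).
WEIGHTS = {
    "environment": 0.15,
    "determinism": 0.20,
    "datasets": 0.15,
    "semantic": 0.20,
    "execution": 0.20,
    "documentation": 0.10,
}

def weighted_score(subscores: dict[str, float]) -> int:
    """Collapse per-category 0-100 subscores into the overall 0-100 score."""
    return round(sum(WEIGHTS[cat] * subscores.get(cat, 0.0) for cat in WEIGHTS))
```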

Tech Stack

| Layer       | Technology |
|-------------|------------|
| Frontend    | Next.js 15, Tailwind CSS, Recharts, Lucide React |
| Backend API | FastAPI (Python 3.11+) |
| Task Queue  | Celery + Valkey (local) / Upstash Redis (production) |
| Analysis    | Python ast module, libcst (for deterministic rewriting), Tree-sitter (R, Julia), Jupyter parsing, cross-file import graph |
| AI Layer    | Hugging Face API (Llama-3.3-70B) |
| Remediation | Native Python libcst & difflib |
| Database    | PostgreSQL via Supabase |
| Cache       | Valkey protocol cache (local Valkey, Upstash Redis in production) |
| Deployment  | Render (backend), Vercel (frontend) |
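
To illustrate how the AI layer fits in, here is a minimal sketch of a semantic check against the Hugging Face Inference API via huggingface_hub; the actual prompt, parameters, and response parsing in semantic_auditor.py will differ:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_API_KEY"])

def semantic_alignment(readme: str, file_tree: str) -> str:
    """Ask the model whether the README's claims match the repo layout."""
    prompt = (
        "Compare this README against the repository file tree and list any "
        f"claims the code structure does not support.\n\nREADME:\n{readme}"
        f"\n\nFILE TREE:\n{file_tree}"
    )
    resp = client.chat_completion(
        messages=[{"role": "user", "content": prompt}],
        model="meta-llama/Llama-3.3-70B-Instruct",
        max_tokens=512,
    )
    return resp.choices[0].message.content
```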

Project Structure

RepoAudit/
├── backend/
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── .env.example
│   ├── pyproject.toml
│   ├── main.py              # FastAPI entry point
│   ├── config.py             # Pydantic settings
│   ├── db.py                 # Supabase client
│   ├── models.py             # Pydantic schemas
│   ├── worker.py             # Celery config (Upstash TLS)
│   ├── tasks.py              # Async audit task
│   ├── routers/
│   │   └── audit.py          # /api/v1/audit endpoints
│   ├── engine/
│   │   ├── cloner.py         # Git clone + cleanup
│   │   ├── setup_parsers.py  # AOT Tree-sitter parser builder
│   │   ├── parsers.py        # Multi-language AST loaders
│   │   ├── ast_auditor.py    # Determinism checks (Python, R, Julia, .ipynb)
│   │   ├── notebook_analyzer.py   # Deep analysis for Jupyter Notebooks
│   │   ├── path_auditor.py   # Hardcoded path detection
│   │   ├── dependency_auditor.py              # Dependency analysis (Python, R, Julia)
│   │   ├── semantic_auditor.py                # LLM README audit
│   │   ├── import_graph.py                    # Cross-file import graph, cycle detection, flow tracing
│   │   ├── data_provenance_auditor.py         # Data loading, URL liveness, gated datasets
│   │   ├── hardware_fingerprinting_auditor.py # Anti-sandbox / Hardware identification
│   │   ├── configuration_drift_auditor.py     # Hyperparameter discrepancy detection
│   │   ├── sandbox.py                         # Bubblewrap orchestration
│   │   ├── replay_auditor.py                  # Dynamic L0–L3 execution verification
│   │   ├── decay_auditor.py                   # Reproducibility decay (bit rot) tracking
│   │   ├── pipeline_auditor.py                # Pipeline Graph Reconstruction (PGR)
│   │   ├── auto_remediator.py                 # AST-powered deterministic code-mod engine
│   │   └── scoring.py                         # Weighted score computation
│   └── tests/
│       ├── test_ast_auditor.py
│       ├── test_path_auditor.py
│       ├── test_dependency_auditor.py
│       ├── test_import_graph.py
│       ├── test_replay_auditor.py
│       ├── test_auto_remediator.py 
│       └── test_scoring.py
├── frontend/
│   ├── Dockerfile
│   ├── package.json
│   ├── .env.local.example
│   ├── app/
│   │   ├── layout.tsx
│   │   ├── page.tsx          # Main audit page
│   │   └── audit/[id]/page.tsx   # Result permalink
│   ├── components/
│   │   ├── AuditForm.tsx
│   │   ├── ScoreCard.tsx     # Circular gauge
│   │   ├── RadarChart.tsx    # 6-axis category chart
│   │   ├── ScoreHistory.tsx  # Score trend line chart
│   │   ├── FixFeed.tsx       # Prioritized issue list
│   │   ├── DecayCard.tsx     # Decay curve & shelf-life visualization
│   │   ├── PipelineGraph.tsx # Interactive DAG visualization (ReactFlow)
│   │   └── StatusIndicator.tsx   # Progress stepper
│   └── lib/
│       └── api.ts            # Typed API client
├── render.yaml               # Render Blueprint
├── action.yml                # GitHub Action metadata
├── action/
│   └── audit.py              # GitHub Action script (stdlib only)
├── .github/
│   └── workflows/
│       └── repoaudit.yml     # CI workflow for this repo
├── docker-compose.yml        # Local development
└── LICENSE

Getting Started

Prerequisites

  • Python 3.11+
  • Node.js 20+
  • Valkey (local dev) or Upstash account (free tier, production)
  • Supabase account (free tier)
  • Hugging Face API key (free tier)

Database Setup

Run this in your Supabase SQL editor:

CREATE TABLE IF NOT EXISTS repositories (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    url TEXT UNIQUE NOT NULL,
    owner TEXT,
    name TEXT,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE TABLE IF NOT EXISTS audits (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    repo_id UUID REFERENCES repositories(id) ON DELETE SET NULL,
    commit_hash TEXT NOT NULL,
    score INTEGER CHECK (score >= 0 AND score <= 100),
    report_json JSONB NOT NULL,
    created_at TIMESTAMP WITH TIME ZONE DEFAULT NOW()
);

CREATE INDEX IF NOT EXISTS idx_audit_commit ON audits(commit_hash);
CREATE INDEX IF NOT EXISTS idx_audit_repo ON audits(repo_id);
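
With these tables in place, the backend can answer the L2 cache lookup described under "How Caching Works" below. A minimal sketch using the supabase-py client; the helper name and selected columns are illustrative:

```python
import os
from supabase import create_client

supabase = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])

def cached_report(commit_hash: str):
    """Return the newest cached audit for a commit hash, or None on a miss."""
    res = (
        supabase.table("audits")
        .select("score, report_json")
        .eq("commit_hash", commit_hash)
        .order("created_at", desc=True)
        .limit(1)
        .execute()
    )
    return res.data[0] if res.data else None
```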

Option A: Docker (local development)

# 1. Configure environment
cp backend/.env.example backend/.env
# Edit backend/.env with your Supabase + Hugging Face keys

# 2. Start everything
docker compose up --build

Frontend at http://localhost:3000, API at http://localhost:7860.

Option B: Local without Docker

# Backend
cd backend
cp .env.example .env
# Edit .env with your keys
pip install -r requirements.txt
valkey-server &
celery -A worker worker --loglevel=info &
uvicorn main:app --reload --port 7860

# Frontend (separate terminal)
cd frontend
cp .env.local.example .env.local
npm install
npm run dev

Running Tests

cd backend
pytest

Deployment

RepoAudit runs on an entirely free stack:

| Service                | Platform     | Cost |
|------------------------|--------------|------|
| Backend (API + Celery) | Render       | $0   |
| Frontend               | Vercel       | $0   |
| Redis (cache + broker) | Upstash      | $0   |
| Database               | Supabase     | $0   |
| LLM                    | Hugging Face | $0   |

Deploy with Render Blueprint

  1. Upstash — Create a Redis database at console.upstash.com. Copy the rediss:// URL.

  2. Render — Go to render.com → New → Blueprint → connect this repo. Render auto-detects render.yaml and prompts for env vars:

     | Key                   | Value |
     |-----------------------|-------|
     | SUPABASE_URL          | Your Supabase project URL |
     | SUPABASE_KEY          | Your Supabase anon key |
     | HF_API_KEY            | Your Hugging Face API key |
     | REDIS_URL             | rediss://default:...@....upstash.io:6379 |
     | CELERY_BROKER_URL     | Same Upstash URL |
     | CELERY_RESULT_BACKEND | Same Upstash URL |
     | ALLOWED_ORIGINS       | https://your-app.vercel.app,http://localhost:3000 |
  3. Vercel — Import this repo → set Root Directory to frontend → add env var:

     | Key                 | Value |
     |---------------------|-------|
     | NEXT_PUBLIC_API_URL | https://your-app.onrender.com |
  4. Update ALLOWED_ORIGINS on Render to include your Vercel URL.

Auto-deploy on every push (Render + Vercel)

  • Vercel: If the project is connected to GitHub, Vercel automatically deploys on every push to main by default.
  • Render: Make sure Auto-Deploy is enabled for your Render service. If your Render service isn’t auto-deploying reliably, add the GitHub Actions fallback below.

GitHub Actions fallback (recommended for Render):

  1. In Render, open your service → Settings → Deploy Hook → copy the hook URL.
  2. In GitHub, add a repo secret named RENDER_DEPLOY_HOOK_URL with that value.
  3. Pushing to main will now trigger Render via .github/workflows/deploy-render.yml.

API Endpoints

| Method | Path                                 | Description |
|--------|--------------------------------------|-------------|
| POST   | /api/v1/audit                        | Submit a repo URL for analysis |
| GET    | /api/v1/audit/{id}                   | Get full audit result |
| GET    | /api/v1/audit/{id}/status            | Poll task progress |
| GET    | /api/v1/audit/history/{owner}/{repo} | Score history across audits |
| POST   | /api/v1/compare                      | Compare up to 5 repo URLs |
| GET    | /health                              | Health check |

Example:

curl -X POST https://repoaudit-api.onrender.com/api/v1/audit \
  -H "Content-Type: application/json" \
  -d '{"url": "https://github.com/owner/repo"}'

Note: You can also submit research paper URLs (e.g., from arXiv, Papers With Code, NeurIPS) and RepoAudit will automatically resolve them to their corresponding GitHub repository.
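
Because audits run asynchronously, a typical client submits the URL and then polls the status endpoint until the task finishes. A minimal Python sketch; the response field names (id, state) are assumptions about the payload shape:

```python
import time
import requests

API = "https://repoaudit-api.onrender.com"

# Submit the repo for analysis (response field names below are assumed).
audit = requests.post(f"{API}/api/v1/audit",
                      json={"url": "https://github.com/owner/repo"}).json()
audit_id = audit["id"]

# Poll until the Celery task settles, then fetch the full report.
while True:
    status = requests.get(f"{API}/api/v1/audit/{audit_id}/status").json()
    if status.get("state") in ("SUCCESS", "FAILURE"):
        break
    time.sleep(5)

report = requests.get(f"{API}/api/v1/audit/{audit_id}").json()
print(report.get("score"))
```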

How Caching Works

  1. On submission, the API resolves the repo's latest commit_hash via git ls-remote (sketched below).
  2. If that hash exists in Upstash Redis (L1) or Supabase Postgres (L2), the cached report is returned instantly.
  3. Otherwise, a Celery task clones the repo (depth=1) and runs the full analysis pipeline.
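
Step 1 requires no clone at all. A minimal sketch of resolving the default branch's head commit with git ls-remote:

```python
import subprocess

def resolve_head_commit(repo_url: str) -> str:
    """Resolve a repo's latest default-branch commit hash without cloning."""
    out = subprocess.run(
        ["git", "ls-remote", repo_url, "HEAD"],
        capture_output=True, text=True, check=True, timeout=30,
    ).stdout
    return out.split()[0]  # output is "<hash>\tHEAD"
```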

Features

  • Deterministic Auto-Remediation Engine: Fixes high-confidence reproducibility blockers through offline AST manipulation: it injects missing seeds, pins unpinned dependencies, and rewrites hardcoded paths, then emits a concrete .patch file, all without LLMs (a libcst/difflib sketch follows this list).
  • Data Provenance Auditing: Detects how data is loaded, checks URL liveness, flags gated datasets, and identifies non-deterministic preprocessing.
  • Configuration Drift Detection: Catches discrepancies between claimed hyperparameters in README and actual values in config files or code defaults.
  • Notebook-Specific Deep Analysis: Goes beyond basic extraction to detect out-of-order cell execution (a variable used in one cell but defined only in a later cell), identifies global state mutations (top-level assignments/imports), verifies "Restart and Run All" compatibility, and flags non-reproducible runtime dependency installations (e.g., !pip install). A simple execution-count heuristic is sketched after this list.
  • Execution Replay Verification (Lightweight): Goes beyond static analysis by performing a four-tier reproduction check in a Bubblewrap sandbox (L0: dependencies install, L1: imports succeed, L2: the entry point runs for >5 s, L3: output is produced), providing a pass/fail signal for the actual reproducibility of the claimed workflow; a sandbox-invocation sketch also follows this list.
  • Reproducibility Decay Tracking: Quantifies "bit rot" through temporal analysis of pinned dependencies. It integrates with the PyPI/CRAN/Pkg registries to detect yanked distributions and known CVEs, generating a predicted "shelf-life" score and decay-curve visualization for long-term auditability (registry lookup sketched below).
  • Reproducibility Scoring: A weighted 0–100 score across six categories: environment, determinism, datasets, semantic alignment, execution, and documentation.
  • Pipeline Graph Reconstruction (PGR): Infers end-to-end ML workflows by statically analyzing source code and import relationships. Identifies stages (Dataset, Preprocessing, Training, Eval, Artifact), traces variable propagation across files, and calculates a pipeline completeness score. Visualizes the resulting DAG interactively using ReactFlow.
  • Multi-Repository Comparative Analysis: Compare reproducibility across related repositories (e.g., competing implementations of the same paper). Features an overlaid radar chart for 6-axis category breakdown, identifies the "Golden Standard" implementation with a 0-100 benchmark, and provides a unified comparison dashboard for research reviewers and reproducibility chairs.
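
Below is a minimal sketch of the remediation pattern named in the first bullet: rewrite the tree with libcst, then express the change as a unified diff via difflib. The transformer only prepends a seed call; the real auto_remediator.py would also have to ensure the relevant import exists and avoid re-applying fixes:

```python
import difflib
import libcst as cst

class SeedInjector(cst.CSTTransformer):
    """Prepend a deterministic seed call to the module body (illustrative)."""
    def leave_Module(self, original_node: cst.Module, updated_node: cst.Module) -> cst.Module:
        # Assumes `import random` already exists; a real remediator checks this.
        seed = cst.parse_statement("random.seed(42)\n")
        return updated_node.with_changes(body=[seed, *updated_node.body])

def seed_patch(path: str, source: str) -> str:
    """Return a unified diff that injects a seed at the top of the module."""
    fixed = cst.parse_module(source).visit(SeedInjector()).code
    return "".join(difflib.unified_diff(
        source.splitlines(keepends=True),
        fixed.splitlines(keepends=True),
        fromfile=f"a/{path}", tofile=f"b/{path}",
    ))
```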
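
The execution-count heuristic mentioned in the notebook bullet is cheap to check, since .ipynb files are plain JSON; the variable-used-before-definition analysis in notebook_analyzer.py is necessarily more involved:

```python
import json

def out_of_order(ipynb_path: str) -> bool:
    """Flag notebooks whose saved execution counts are not strictly increasing,
    a sign the committed outputs did not come from a clean top-to-bottom run."""
    with open(ipynb_path, encoding="utf-8") as f:
        nb = json.load(f)
    counts = [
        c["execution_count"]
        for c in nb.get("cells", [])
        if c.get("cell_type") == "code" and c.get("execution_count") is not None
    ]
    return any(a >= b for a, b in zip(counts, counts[1:]))
```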
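
For the replay tiers, a sandboxed launch might look roughly like the sketch below; the real mount layout, per-tier network policy (the L0 dependency-install tier presumably needs network access), and timeouts in sandbox.py are project-specific:

```python
import subprocess

def run_sandboxed(repo_dir: str, cmd: list[str], timeout: int = 300):
    """Run a command inside Bubblewrap with read-only system dirs and no network."""
    bwrap = [
        "bwrap",
        "--ro-bind", "/usr", "/usr",       # read-only system directories
        "--ro-bind", "/lib", "/lib",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", "/tmp",
        "--bind", repo_dir, "/workspace",  # writable repo checkout
        "--chdir", "/workspace",
        "--unshare-net",                   # no network inside the sandbox
        "--die-with-parent",
        *cmd,
    ]
    return subprocess.run(bwrap, capture_output=True, text=True, timeout=timeout)
```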
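
And the yanked-release half of decay tracking maps onto a single registry call per pinned dependency. A minimal sketch against PyPI's JSON API (CVE lookups and the shelf-life model are separate concerns):

```python
import requests

def yanked_files(package: str, version: str) -> list[str]:
    """List distribution files of a pinned release that PyPI has marked yanked."""
    url = f"https://pypi.org/pypi/{package}/{version}/json"
    data = requests.get(url, timeout=10).json()
    return [f["filename"] for f in data.get("urls", []) if f.get("yanked")]
```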

GitHub Action

RepoAudit can be integrated into your CI/CD pipeline to automatically audit PRs.

      - uses: sadhumitha-s/RepoAudit@v2.0.0
        with:
          api-url: https://repoaudit-api.onrender.com
          threshold: "70"
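
action/audit.py is listed in the project tree as stdlib-only. A minimal sketch of the threshold gate such a script might implement once an audit has completed; the score field and CLI arguments here are assumptions, not the actual script:

```python
import json
import sys
import urllib.request

def fetch_score(api_url: str, audit_id: str) -> int:
    """Fetch a finished audit and return its overall score (assumed field name)."""
    with urllib.request.urlopen(f"{api_url}/api/v1/audit/{audit_id}") as resp:
        return json.load(resp)["score"]

if __name__ == "__main__":
    api_url, audit_id, threshold = sys.argv[1], sys.argv[2], int(sys.argv[3])
    score = fetch_score(api_url, audit_id)
    print(f"Reproducibility score: {score} (threshold {threshold})")
    sys.exit(0 if score >= threshold else 1)
```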

