
rohchav/ball_predictor


NBA Pregame Win Predictor

Full-stack, reproducible pipeline to estimate NBA win probabilities before tipoff. Python/FastAPI backend, Next.js frontend, LightGBM + NEAT models, PostgreSQL for persistence.

Features

  • Automated ingestion from NBA.com stats via nba_api (primary source), Balldontlie (schedule/results), Kaggle CSVs (odds, schedules, box scores), and, optionally, live odds from The Odds API.
  • Feature builders for schedule context, roster health/continuity, priors, recency, and odds; synthetic features allow predictions for any matchup or date, even outside the ingested data window.
  • LightGBM pipeline with permutation pruning, calibration, SHAP, and time-aware splits; NEAT-based neuroevolution for feature discovery.
  • FastAPI service exposing /predict, /games, /teams, /elo, /feature-usage, and /model-metadata, backed by Postgres with a parquet fallback.
  • Next.js + Tailwind UI for today’s slate, arbitrary head-to-head predictions, model snapshot, and diagnostics.
  • Cron-friendly daily refresh scripts to append new games and retrain artifacts.
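Calibrated win probabilities are usually judged with proper scoring rules. The following is a minimal sketch, not the project's evaluation code (the tech stack lists scikit-learn for metrics). It computes the Brier score and log loss in plain Python. It assumes each probability is for the home team winning and each label is 1 for a home win, 0 otherwise. The sample numbers are made up.

```python
import math
from typing import Sequence


def brier_score(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Mean squared error between predicted home-win probability and the 0/1 outcome."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and the same length")
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def log_loss(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood; probabilities are clipped to avoid log(0)."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and the same length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


if __name__ == "__main__":
    # Hypothetical predictions: probability that the home team wins.
    probs = [0.72, 0.55, 0.30, 0.81]
    labels = [1, 0, 0, 1]  # 1 = home win, 0 = home loss
    print(f"Brier:    {brier_score(probs, labels):.4f}")
    print(f"Log loss: {log_loss(probs, labels):.4f}")
```

Lower is better for both scores. A model that always predicts 0.5 scores 0.25 on Brier and about 0.693 on log loss, which gives a useful floor to beat.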

Tech Stack

  • Backend: Python 3.11, FastAPI, Uvicorn, Pydantic v2, httpx/tenacity, structlog, pandas/pyarrow/numpy, SQLAlchemy + psycopg, cachetools.
  • Modeling: LightGBM, scikit-learn (calibration/metrics), SHAP, deap + custom NEAT, joblib, mlflow hooks.
  • Data sources: nba_api (scoreboard/stats), Balldontlie, Kaggle NBA datasets, optional The Odds API.
  • Frontend: Next.js 14 (App Router), React 18, TypeScript, TailwindCSS, axios; Jest/RTL + Playwright for tests.
  • Infra: PostgreSQL primary store; artifacts in artifacts/ (ignored); cron scripts in CRON_SETUP_*.

Getting Started

  1. Install Python 3.11 and Node.js 18+.
  2. Create a virtual environment and install dependencies:
    python -m venv .venv
    source .venv/bin/activate
    pip install -e '.[dev]'
  3. Copy the example environment file and add secrets:
    cp .env.example .env
  4. Install Node dependencies for the web app:
    cd src/web/next-app
    npm install
  5. During development, run both services together from the same directory:
    npm run dev
    This starts Next.js on port 3000 and FastAPI (Uvicorn) on port 8000.

Required Secrets

  • BALLDONTLIE_API_KEY: auxiliary schedule/results source; free-tier limits apply.
  • THEODDS_API_KEY (optional): live odds from The Odds API.
  • KAGGLE_USERNAME, KAGGLE_KEY (optional): Kaggle CLI credentials for automated dataset downloads.
  • DATABASE_URL: PostgreSQL connection string (e.g., postgresql://user:pass@127.0.0.1:5433/nba).

Never commit .env or other secret files; use the system keyring if available.
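A pre-flight check can fail fast when secrets are missing. The following is a hypothetical helper that is not part of the repository. It reads a simple KEY=VALUE .env file and reports which of the variables listed above are unset. It also checks that DATABASE_URL parses as a PostgreSQL URL with a database name. It treats the variables not marked optional as required, which is an assumption.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Assumption: variables not marked optional in the secrets list are treated as required.
REQUIRED = ("BALLDONTLIE_API_KEY", "DATABASE_URL")
OPTIONAL = ("THEODDS_API_KEY", "KAGGLE_USERNAME", "KAGGLE_KEY")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks and # comments (not a full dotenv parser)."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = [f"missing required {name}" for name in REQUIRED if not values.get(name)]
    url = values.get("DATABASE_URL")
    if url:
        parts = urlsplit(url)
        # Accepts postgresql:// and driver-qualified forms such as postgresql+psycopg://
        if not parts.scheme.startswith("postgresql"):
            problems.append(f"DATABASE_URL scheme {parts.scheme!r} is not postgresql")
        if not parts.path.strip("/"):
            problems.append("DATABASE_URL has no database name")
    return problems


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        env = parse_env_text(env_path.read_text())
        for name in OPTIONAL:
            if not env.get(name):
                print(f"note: optional {name} not set")
        issues = check_env(env)
        print("\n".join(issues) if issues else "all required secrets present")
    else:
        print("no .env found; run: cp .env.example .env")
```

Run it from the repository root. A dedicated dotenv library handles quoting, escapes, and multiline values more robustly than this minimal parser.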

Configuration

Defaults live in configs/default.yml and can be overridden via CLI flags or alternate config files. Key sections: data (season range/windows), sources (provider toggles), model (LightGBM/odds/calibrator), pruning, recency, and web (odds toggle defaults).
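The exact override semantics are not spelled out here. A common pattern is a recursive merge, where an alternate config replaces only the keys it sets and inherits everything else from the defaults. The sketch below shows that pattern on plain dicts, as if the YAML had already been loaded (for example with PyYAML's yaml.safe_load, an assumption). The key names under data, sources, and model are hypothetical, and the project's actual loader may behave differently.

```python
import copy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict: nested dicts merge recursively, other values in `override` win."""
    merged = copy.deepcopy(base)  # never mutate the defaults
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical stand-ins for configs/default.yml and an alternate config file.
defaults = {
    "data": {"season_start": 2015, "season_end": 2025},
    "sources": {"nba_api": True, "balldontlie": True},
    "model": {"use_odds": True, "calibrator": "isotonic"},
}
override = {"data": {"season_start": 2020}, "model": {"use_odds": False}}

config = deep_merge(defaults, override)
print(config["data"])   # season_start from the override, season_end from the defaults
print(config["model"])  # use_odds overridden, calibrator inherited
```

A recursive merge keeps alternate configs short, because they only need to list what differs from the defaults.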

Data Sources & Links

Anti-Leakage Policy

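All feature builders honor cutoff_ts (a pre-tipoff timestamp), so features only use data available before it. Splits are chronological only; shuffled splits are never used. Tests check that the maximum feature timestamp precedes the label's game time.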
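The three rules can be illustrated with the standard library. In the sketch below, a recency feature uses only games that tipped off strictly before cutoff_ts. The train/test split is done by date rather than shuffled. An assertion mirrors the kind of timestamp guard the tests apply. The Game record, the points field, and both helper names are hypothetical; the project's builders use their own data structures.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Game:
    """Hypothetical minimal game record: one row per team per game."""
    game_id: str
    team: str
    tipoff: datetime
    points: int  # outcome data, only known after the game is played


def recent_points_avg(history: list[Game], team: str, cutoff_ts: datetime, n: int = 3) -> Optional[float]:
    """Mean points over the team's last n games that tipped off strictly before cutoff_ts."""
    prior = sorted(
        (g for g in history if g.team == team and g.tipoff < cutoff_ts),
        key=lambda g: g.tipoff,
    )
    window = prior[-n:]
    if not window:
        return None
    # Leakage guard: every game feeding the feature predates the cutoff.
    assert max(g.tipoff for g in window) < cutoff_ts
    return sum(g.points for g in window) / len(window)


def chronological_split(games: list[Game], split_ts: datetime) -> tuple[list[Game], list[Game]]:
    """Train on games before split_ts and test on games at or after it; no shuffling."""
    ordered = sorted(games, key=lambda g: g.tipoff)
    train = [g for g in ordered if g.tipoff < split_ts]
    test = [g for g in ordered if g.tipoff >= split_ts]
    return train, test


if __name__ == "__main__":
    start = datetime(2024, 11, 1, 19, 30)
    history = [Game(f"g{i}", "BOS", start + timedelta(days=2 * i), 100 + i) for i in range(5)]
    cutoff = history[3].tipoff  # predicting game g3: only g0..g2 may contribute
    print(recent_points_avg(history, "BOS", cutoff))
    train, test = chronological_split(history, cutoff)
    # No training game may tip off after any test game.
    assert max(g.tipoff for g in train) < min(g.tipoff for g in test)
```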

CLI Entry Points (selected)

  • python scripts/validate_data.py --output reports/data_check.json
  • python scripts/fetch_games.py --season-start 2024 --season-end 2025 --plan free
  • python scripts/fetch_recent_games.py --days-back 3 --write-to-db
  • python scripts/build_dataset.py --config-path configs/default.yml
  • python scripts/train_gbm.py --config configs/default.yml
  • python scripts/train_gbm_timeseries.py --config-path configs/default.yml
  • python scripts/tune_gbm.py --max-trials 30
  • python scripts/feature_prune.py --config configs/default.yml --grouped true
  • python scripts/evaluate.py --dataset-path data/processed/pregame_dataset_latest.parquet
  • python scripts/neuroevo_run.py --dataset-path data/processed/pregame_dataset.parquet --generations 5
  • uvicorn src.api.service:app --reload

Postgres Mode

  • Spin up Postgres (example): docker run --rm -p 5432:5432 -e POSTGRES_PASSWORD=postgres -e POSTGRES_USER=postgres -e POSTGRES_DB=nba postgres:16
  • Set DATABASE_URL to match the container (for the example above: postgresql://postgres:postgres@127.0.0.1:5432/nba).
  • Init tables: python scripts/setup_db.py.
  • Ingest: python scripts/fetch_games.py --season-start 2024 --season-end 2025 --write-to-db and python scripts/build_dataset.py --config-path configs/default.yml --write-to-db --db-chunk-size 500.
  • Push artifacts: python scripts/push_artifacts_to_db.py.
  • Verify: python scripts/check_db.py.
  • Daily refresh/retrain: python scripts/fetch_recent_games.py --days-back 3 --write-to-db, then python scripts/auto_update_model.py (nba_api is the primary source, with Balldontlie as fallback).
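For cron, the two daily commands can be wrapped so that retraining only runs if the fetch succeeds. This is a hypothetical wrapper, not one of the repository's scripts, and the CRON_SETUP_* files may already do something equivalent. It uses only the commands and flags listed above. It dry-runs by default and executes only when passed --execute.

```python
import shlex
import subprocess
import sys

# The two daily steps, in order; retraining depends on freshly fetched games.
DAILY_STEPS: list[list[str]] = [
    [sys.executable, "scripts/fetch_recent_games.py", "--days-back", "3", "--write-to-db"],
    [sys.executable, "scripts/auto_update_model.py"],
]


def run_daily_refresh(dry_run: bool = False) -> list[str]:
    """Run each step in order; check=True raises on a non-zero exit, so later steps are skipped."""
    executed: list[str] = []
    for argv in DAILY_STEPS:
        command = shlex.join(argv)
        executed.append(command)
        if dry_run:
            print(f"[dry-run] {command}")
            continue
        subprocess.run(argv, check=True)
    return executed


if __name__ == "__main__":
    # Dry run by default; pass --execute to actually run the steps (e.g. from cron).
    run_daily_refresh(dry_run="--execute" not in sys.argv[1:])
```

The script paths are relative, so a cron entry would need to cd into the repository root (with the virtual environment's Python) before invoking it.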

Development

  • Run ruff check . and pytest before submitting changes.
  • Frontend lint: cd src/web/next-app && npm run lint; tests: npm test; e2e: npm run e2e.
  • Keep data directories empty in git; artifacts live in artifacts/ and reports/ (ignored).
  • See docs/monitoring.md for performance/data-quality checks.

Feature Flags

  • Env-based: set FEATURE_<NAME> to on, true, or 1 to enable a flag; the helper lives at src/utils/feature_flags.py. No flags are currently active.
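A flag check of this kind might look like the following hypothetical stand-in for src/utils/feature_flags.py, whose actual function names are not documented here. It treats on, true, and 1 as enabled. Case-insensitive matching and ignoring surrounding whitespace are assumptions. Any other value, including an unset variable, counts as disabled. The flag name in the example is made up.

```python
import os
from collections.abc import Mapping

ENABLED_VALUES = frozenset({"on", "true", "1"})


def is_enabled(name: str, environ: Mapping[str, str] = os.environ) -> bool:
    """True when FEATURE_<NAME> is on/true/1 (case-insensitive, whitespace ignored)."""
    raw = environ.get(f"FEATURE_{name.upper()}", "")
    return raw.strip().lower() in ENABLED_VALUES


if __name__ == "__main__":
    # Hypothetical flag name; no flags are active in the project today.
    print(is_enabled("live_odds_panel"))
```

Accepting the environment as a parameter keeps the check testable without mutating os.environ.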

API Contract

  • Request/response shapes and authentication are documented in docs/api_contract.md.

Security

  • Secrets stay out of git; logs mask secret values, showing only the last four characters.
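Masking of this sort fits naturally into a structlog processor, a callable taking (logger, method_name, event_dict) and returning the event dict. The sketch below is hypothetical, and the project's actual logging setup is not shown here. The set of secret key names is an assumption. So is masking very short values entirely, so that a four-character secret is not printed in full.

```python
from typing import Any

# Hypothetical: event-dict keys whose values should never be logged in full.
SECRET_KEYS = frozenset({"balldontlie_api_key", "theodds_api_key", "kaggle_key", "database_url"})


def mask_secret(value: str, visible: int = 4) -> str:
    """Replace all but the last `visible` characters with '*'; fully mask short values."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def mask_secrets_processor(logger: Any, method_name: str, event_dict: dict[str, Any]) -> dict[str, Any]:
    """structlog-style processor: mask values of known secret keys before rendering."""
    for key in list(event_dict):
        if key.lower() in SECRET_KEYS and isinstance(event_dict[key], str):
            event_dict[key] = mask_secret(event_dict[key])
    return event_dict
```

With structlog, a processor like this would be placed ahead of the renderer in the processors list passed to structlog.configure.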
