Retail Demand Allocator

End-to-end ML platform for retail demand forecasting and marketing budget optimization, built on the UCI Online Retail dataset.

What This Does

Predicts daily retail revenue and optimally allocates marketing budget across channels. Given historical sales data and marketing spend, the platform returns:

Revenue Forecast -- predicted daily revenue using LightGBM
Budget Allocation -- optimal spend distribution across search, email, social, and affiliate channels
Model Comparison -- LightGBM baseline benchmarked against H2O AutoML

Trained on the UCI Online Retail dataset (~541K transactions, ~4,300 customers, Dec 2010 -- Dec 2011).

Architecture

                    +------------------+
                    |   Dashboard UI   |
                    | (localhost:8000) |
                    +--------+---------+
                             |
                    +--------v---------+
                    |     FastAPI      |
                    | Predict / Train  |
                    | Optimize Budget  |
                    +--+-----+-----+---+
                       |     |     |
            +----------+  +--+--+  +----------+
            |             |     |             |
    +-------v---+  +------v-+  +v--------+  +v-----------+
    | LightGBM  |  |Budget  |  | Metrics |  |  Storage   |
    | Demand    |  |Optimizer| | Prom.   |  | DuckDB     |
    | Forecast  |  +--------+  +---------+  | MinIO      |
    +-----------+                           +------------+

    Training Pipeline (Prefect):
    Ingest -> Validate -> Aggregate -> Synthetic -> Features -> Train -> Evaluate -> Register

Tech Stack

Layer	Technology
API	FastAPI, Uvicorn
Models	LightGBM (baseline), H2O AutoML (benchmark)
Features	Pandas, NumPy (80 features: lag, rolling, adstock, momentum, channel interactions)
Pipeline Orchestration	Prefect (DAG visualization, task state tracking)
Experiment Tracking	MLflow + MinIO (S3-compatible artifact storage)
Monitoring	Prometheus (metrics collection) + Grafana (dashboards)
Storage	DuckDB (features/analytics), MinIO (model artifacts), SQLite (MLflow/Prefect metadata)
Validation	Pandera (schema validation), Pydantic (API schemas, config)
Deployment	Docker, Docker Compose (profiles), GCP free-tier VM

Quick Start

git clone https://github.com/sherozshaikh/retail-demand-allocator.git
cd retail-demand-allocator

Option A: Docker (recommended)

Start Docker (Docker Desktop, or Colima on macOS: colima start --memory 4 --cpu 2).

make up

Open http://localhost:8000 -- a pre-trained model is included. Select a scenario and click Predict Revenue, or run a Budget Optimization.

Option B: Local (no Docker)

uv venv .venv --python 3.11 && source .venv/bin/activate
uv pip install -e ".[dev]"
PYTHONPATH=. uvicorn services.api.app.main:app --host 0.0.0.0 --port 8000

Service Dashboard

When running with make up, all services are available:

URL	Service
http://localhost:8000	Platform Dashboard
http://localhost:8000/docs	Swagger API Docs
http://localhost:4200	Prefect Pipeline UI
http://localhost:5000	MLflow Experiment Tracking
http://localhost:9001	MinIO Console (minioadmin / minioadmin)
http://localhost:9090	Prometheus
http://localhost:3000	Grafana (admin / admin)

Models

LightGBM Demand Forecast (Baseline)

Predicts daily revenue from 78 engineered features (lag, rolling mean/std, adstock-transformed marketing spend, promo intensity, channel interactions) across 305 trading days.

Metric	Value
RMSE	15,613.84
MAE	7,828.72
R2	0.669
MAPE	11.18%
Training Time	0.28s
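The feature families named above (lag, rolling mean, adstock-transformed spend) can be illustrated with a minimal sketch. It uses plain Python lists rather than the Pandas/NumPy code the project is built on, and the function names, the window handling (previous days only, to avoid leaking the current day), and the decay rate are all assumptions made for illustration, not the project's actual implementation.

```python
from typing import List, Optional


def lag(series: List[float], k: int) -> List[Optional[float]]:
    """Value from k days earlier; None where no history exists yet."""
    return ([None] * k + list(series))[: len(series)]


def rolling_mean(series: List[float], window: int) -> List[Optional[float]]:
    """Mean of the previous `window` values, excluding the current day."""
    out: List[Optional[float]] = []
    for i in range(len(series)):
        if i < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(series[i - window:i]) / window)
    return out


def adstock(spend: List[float], decay: float) -> List[float]:
    """Geometric adstock: a_t = spend_t + decay * a_(t-1)."""
    out: List[float] = []
    carry = 0.0
    for s in spend:
        carry = s + decay * carry  # earlier spend keeps a decaying effect
        out.append(carry)
    return out
```

For example, adstock([100.0, 0.0, 0.0], 0.5) carries a single day's spend forward as 100, 50, 25, which is how a one-off campaign can still influence revenue on later days.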

Benchmark: LightGBM vs H2O AutoML

H2O AutoML was benchmarked against LightGBM on the same hold-out test set (last 20%, time-ordered). H2O trains GBM, XGBoost, DRF, Deep Learning, and Stacked Ensembles, then picks the best model.

Metric	LightGBM	H2O Best (Stacked Ensemble)	Winner
RMSE	15,613.84	12,951.29	H2O
MAE	7,828.72	7,449.79	H2O
R2	0.669	0.773	H2O
MAPE	11.18%	11.85%	LightGBM
Training Time	0.28s	121.28s	LightGBM (433x faster)
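As a rough illustration of how a time-ordered hold-out and the four reported metrics fit together, here is a minimal stdlib sketch. The function names are hypothetical, and the rounding used to size the test set is an assumption; the project's own evaluation code may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def time_ordered_split(rows: Sequence, test_fraction: float = 0.2) -> Tuple[list, list]:
    """Earliest rows train, latest rows test; no shuffling."""
    n_test = int(round(len(rows) * test_fraction))
    cut = len(rows) - n_test
    return list(rows[:cut]), list(rows[cut:])


def regression_metrics(y_true: List[float], y_pred: List[float]) -> Dict[str, float]:
    """RMSE, MAE, R2, and MAPE (in percent). Assumes no zero in y_true."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }
```

Because RMSE squares errors while MAPE scales them by the actual value, one model can win on RMSE and lose on MAPE, as in the table above.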

Decision: LightGBM is the production model. H2O's stacked ensemble wins on RMSE, MAE, and R2 (LightGBM is slightly ahead on MAPE), but it takes 433x longer to train, requires a JVM, and adds ~500MB to the image. LightGBM's sub-second training enables rapid iteration and fits on a free-tier VM.

Ablation Study

A random hyperparameter search (make ablation) over 60 LightGBM configurations selected the baseline. The best configuration was applied automatically to configs/base.yaml, and the results are saved to data/artifacts/ablation_lightgbm.json.

Run the benchmark locally: make benchmark (requires uv pip install -e ".[dev,benchmark]")
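A random search of this kind can be sketched in a few lines. The parameter names below are real LightGBM parameters, but the ranges, the sampling distributions, and the toy_score stand-in are assumptions; the README does not state the search space that scripts/ablation_lightgbm.py actually uses. In practice the score function would train a model and return a validation error.

```python
import math
import random
from typing import Callable, Dict, Tuple

# Hypothetical search space, for illustration only.
SEARCH_SPACE = {
    "num_leaves": (8, 64),            # integer range
    "learning_rate": (0.01, 0.3),     # sampled log-uniformly
    "min_child_samples": (5, 50),     # integer range
    "feature_fraction": (0.5, 1.0),   # sampled uniformly
}


def sample_config(rng: random.Random) -> Dict[str, float]:
    """Draw one configuration from SEARCH_SPACE."""
    lr_lo, lr_hi = SEARCH_SPACE["learning_rate"]
    return {
        "num_leaves": rng.randint(*SEARCH_SPACE["num_leaves"]),
        "learning_rate": math.exp(rng.uniform(math.log(lr_lo), math.log(lr_hi))),
        "min_child_samples": rng.randint(*SEARCH_SPACE["min_child_samples"]),
        "feature_fraction": rng.uniform(*SEARCH_SPACE["feature_fraction"]),
    }


def random_search(score: Callable[[Dict[str, float]], float],
                  n_trials: int = 60, seed: int = 0) -> Tuple[Dict[str, float], float]:
    """Return (best_config, best_score); lower score is better (e.g. RMSE)."""
    rng = random.Random(seed)  # fixed seed makes the search repeatable
    best_cfg, best_score = None, float("inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        s = score(cfg)
        if s < best_score:
            best_cfg, best_score = cfg, s
    return best_cfg, best_score


def toy_score(cfg: Dict[str, float]) -> float:
    """Stand-in objective; a real run would train and validate a model here."""
    return abs(cfg["num_leaves"] - 31) + abs(cfg["learning_rate"] - 0.05)
```

Calling random_search(toy_score) evaluates 60 sampled configurations and returns the one with the lowest score.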

Budget Optimizer

Three allocation strategies for distributing marketing budget across channels:

Method	Description
Equal Split	Divide budget equally across channels (baseline)
Heuristic	Weight channels by historical ROI coefficients
Evolutionary Search	Dirichlet-initialized population with mutation annealing; searches for the allocation that maximizes predicted revenue
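The three strategies in the table can be sketched as follows. This is a simplified illustration, not the code in ml/optimization/: the function names, population size, elite fraction, annealing schedule, and the toy_revenue stand-in (a diminishing-returns curve with made-up coefficients) are all assumptions. In the platform, the revenue function would be the trained forecast model.

```python
import math
import random
from typing import Callable, Dict, List

Allocation = Dict[str, float]


def equal_split(total: float, channels: List[str]) -> Allocation:
    share = total / len(channels)
    return {c: share for c in channels}


def heuristic_split(total: float, roi: Dict[str, float]) -> Allocation:
    """Weight each channel by its (non-negative) ROI coefficient."""
    weight_sum = sum(roi.values())
    return {c: total * w / weight_sum for c, w in roi.items()}


def dirichlet(rng: random.Random, n: int, alpha: float = 1.0) -> List[float]:
    """Sample budget shares on the simplex by normalizing Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(n)]
    s = sum(draws)
    return [d / s for d in draws]


def evolutionary_search(total: float, channels: List[str],
                        predict_revenue: Callable[[Allocation], float],
                        pop_size: int = 30, generations: int = 40,
                        sigma0: float = 0.2, seed: int = 0) -> Allocation:
    rng = random.Random(seed)
    n = len(channels)

    def fitness(shares: List[float]) -> float:
        return predict_revenue({c: total * x for c, x in zip(channels, shares)})

    population = [dirichlet(rng, n) for _ in range(pop_size)]
    for g in range(generations):
        sigma = sigma0 * (1.0 - g / generations)  # annealed mutation scale
        population.sort(key=fitness, reverse=True)
        elite = population[: max(1, pop_size // 4)]  # survivors, kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            parent = rng.choice(elite)
            child = [max(1e-9, x + rng.gauss(0.0, sigma)) for x in parent]
            s = sum(child)
            children.append([x / s for x in child])  # renormalize to sum to 1
        population = elite + children
    best = max(population, key=fitness)
    return {c: total * x for c, x in zip(channels, best)}


# Hypothetical response curve standing in for the forecast model.
TOY_COEF = {"search": 4.0, "email": 3.0, "social": 2.0, "affiliate": 1.0}


def toy_revenue(alloc: Allocation) -> float:
    return sum(TOY_COEF[c] * math.sqrt(spend) for c, spend in alloc.items())
```

Every candidate is a vector of budget shares that sums to 1, so each allocation spends exactly the total budget. Keeping the elite unchanged means the best predicted revenue never decreases from one generation to the next, while the shrinking mutation scale moves the search from exploration toward fine-tuning.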

curl -X POST http://localhost:8000/v1/optimize-budget \
  -H "Content-Type: application/json" \
  -d '{
    "total_budget": 10000,
    "channels": ["search", "email", "social", "affiliate"],
    "method": "search"
  }'

API Endpoints

Method	Endpoint	Description
POST	`/v1/predict`	Predict daily revenue from feature inputs
POST	`/v1/optimize-budget`	Optimize marketing budget allocation
POST	`/v1/train`	Trigger model retraining in background
GET	`/v1/runs/{run_id}`	Check training run status
GET	`/v1/health`	Health check (model load status)
GET	`/v1/metrics`	Prometheus metrics

Example: Predict Revenue

curl -s -X POST http://localhost:8000/v1/predict \
  -H "Content-Type: application/json" \
  -d '{"features": {"day_of_week": 3, "month": 6, "total_revenue_lag_1": 12500, "total_revenue_roll_mean_7": 11800, "total_spend": 1200}}' | python3 -m json.tool

Example: Train Model

curl -X POST http://localhost:8000/v1/train \
  -H "Content-Type: application/json" \
  -d '{"config_path": "configs/base.yaml", "skip_h2o": true}'
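The same requests can be issued from Python. The sketch below uses only the standard library and mirrors the payloads from the curl examples; the helper names build_post and send are hypothetical, and send assumes the endpoints respond with JSON, which this README does not show.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # the local address used throughout this README


def build_post(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request without sending it."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the body, assuming a JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


predict_request = build_post("/v1/predict", {
    "features": {
        "day_of_week": 3,
        "month": 6,
        "total_revenue_lag_1": 12500,
        "total_revenue_roll_mean_7": 11800,
        "total_spend": 1200,
    }
})

train_request = build_post("/v1/train", {
    "config_path": "configs/base.yaml",
    "skip_h2o": True,
})

# With the API running (make up), send a request like this:
# print(send(predict_request))
```

Building the request separately from sending it makes the payload easy to inspect or log before anything reaches the API.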

Training Pipeline

The training pipeline is orchestrated by Prefect with @flow and @task decorators:

Ingest Data -> Validate Schema -> Daily Aggregation -> Generate Synthetic Marketing
    -> Build Features -> Train LightGBM -> [Train H2O] -> Evaluate Models -> Register to MLflow

Train via Docker (with Prefect UI + MLflow)

make up      # start API + all infra
make train   # run one-shot training pipeline

View the DAG and task states at http://localhost:4200. Model metrics and artifacts at http://localhost:5000.

Train Locally (no Docker needed)

make train-local    # LightGBM only, fast (~6s)
make ablation       # hyperparameter search (60 trials, ~1min)
make benchmark      # LightGBM vs H2O comparison

Deploy to Cloud (Free Tier)

The API Docker image includes a pre-trained LightGBM model. Deploy in under 5 minutes:

# On any Ubuntu 22.04 VM (GCP e2-micro or AWS t2.micro):
sudo apt-get update && sudo apt-get install -y docker.io
sudo fallocate -l 1G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile
sudo docker pull sherozshaikh/retail-demand-allocator-api:v1.0.0
sudo docker run -d --name rda-api --restart unless-stopped -p 8000:8000 -e PYTHONPATH=/app -e RDA_CONFIG_PATH=configs/base.yaml sherozshaikh/retail-demand-allocator-api:v1.0.0

Open http://<VM_EXTERNAL_IP>:8000 (ensure port 8000 is open in the firewall).

Full step-by-step guides:

GCP: docs/DEPLOY_GCP.md (e2-micro, free tier)
AWS: docs/DEPLOY_AWS.md (t2.micro, free tier)

Run Tests

make test-local   # 69 tests, local
make test         # 69 tests, inside Docker container

Project Structure

retail-demand-allocator/
├── configs/
│   ├── base.yaml                  # Production config
│   ├── dev.yaml                   # Development overrides
│   └── test.yaml                  # Test overrides (in-memory DuckDB)
├── data/
│   ├── raw/                       # UCI Online Retail dataset (.xlsx/.csv)
│   ├── processed/                 # Daily aggregates, feature datasets
│   ├── synthetic/                 # Generated marketing spend & promo calendar
│   └── artifacts/                 # Trained models, benchmark results
│       ├── lightgbm/              # LightGBM model + feature importance
│       ├── ablation_lightgbm.json # Hyperparameter search results
│       └── benchmark_results.json # LightGBM vs H2O comparison
├── ml/
│   ├── contracts/                 # Pydantic config, structured logging
│   ├── data_access/               # DuckDB store, CSV/XLSX ingestion
│   ├── validation/                # Pandera schemas + validator
│   ├── transforms/                # Daily sales aggregation
│   ├── synthetic/                 # Deterministic marketing data generator
│   ├── features/                  # Feature engineering (80 features)
│   ├── models/                    # LightGBM, H2O AutoML, evaluation, registry
│   ├── optimization/              # Budget allocation (equal, heuristic, evolutionary)
│   ├── tracking/                  # MLflow wrapper
│   └── metrics/                   # Prometheus metrics
├── services/
│   ├── api/                       # FastAPI service + dashboard UI
│   │   ├── Dockerfile
│   │   └── app/
│   │       ├── main.py            # App entry point
│   │       ├── routes/            # API endpoints
│   │       ├── schemas/           # Request/response models
│   │       ├── static/index.html  # Dashboard UI
│   │       └── telemetry/         # Prometheus middleware
│   ├── pipeline/                  # Prefect training pipeline
│   │   ├── Dockerfile
│   │   └── app/
│   │       ├── flows/             # @flow definition
│   │       ├── tasks/             # @task definitions (data, features, model)
│   │       └── orchestration/     # Pipeline runner
│   └── scheduler/                 # Prefect scheduler
├── scripts/
│   ├── run_flow.py                # CLI for local training
│   ├── ablation_lightgbm.py       # LightGBM hyperparameter search
│   └── benchmark_h2o.py           # LightGBM vs H2O AutoML benchmark
├── notebooks/
│   └── eda.ipynb                  # Exploratory data analysis (7 sections)
├── tests/                         # 69 tests (unit, integration, smoke)
├── infra/
│   ├── mlflow/Dockerfile          # MLflow server image
│   └── monitoring/                # Prometheus + Grafana configs
├── docs/
│   ├── DEPLOY_GCP.md              # GCP free-tier deployment guide
│   └── screenshots/               # Dashboard, Prefect, MLflow, Grafana screenshots
├── docker-compose.yml             # Full stack with profiles (infra, train)
├── pyproject.toml                 # Dependencies and project metadata
└── Makefile                       # Development and deployment commands

Makefile Targets

make up            Start everything (API + MLflow + MinIO + Prefect + Prometheus + Grafana)
make down          Stop all containers
make down-clean    Stop all containers and remove volumes

make test          Run tests inside Docker container
make test-local    Run tests locally

make train         Retrain models via Docker pipeline (with Prefect DAG + MLflow logging)
make train-local   Train locally (LightGBM only, ~6s)
make ablation      Ablation study: find best LightGBM hyperparameters (~1min)
make benchmark     Benchmark LightGBM vs H2O AutoML (~2min)

make build         Build Docker images for Docker Hub (linux/amd64)
make push          Build and push images to Docker Hub
make pull          Pull pre-built images from Docker Hub

make verify        Check all service health endpoints
make format        Format code (isort, black, ruff)
make lint          Lint code (ruff)
make clean         Remove generated files

Data Notes

The base retail transaction data comes from the UCI Machine Learning Repository (a publicly available research dataset).
Marketing spend and promotion calendar tables are synthetic but deterministic: they are generated programmatically with fixed seeds for reproducibility.
All synthetic generation logic is transparent and versioned in ml/synthetic/.
Raw data is cached as CSV after the first .xlsx read, so subsequent loads take under 1s instead of ~80s.
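To show what "synthetic but deterministic" means in practice, here is a minimal sketch of a seeded generator. Everything in it is hypothetical (the function name, the base spend, the weekend factor, the noise range); the real logic lives in ml/synthetic/. The point is only that a local, seeded random generator yields the same table on every run.

```python
import random
from datetime import date, timedelta
from typing import Dict, List

CHANNELS = ["search", "email", "social", "affiliate"]


def synthetic_spend(start: date, days: int, seed: int = 42,
                    base: float = 300.0) -> List[Dict[str, object]]:
    """Generate one row of per-channel spend per day, reproducibly."""
    rng = random.Random(seed)  # local generator, independent of global state
    rows: List[Dict[str, object]] = []
    for i in range(days):
        day = start + timedelta(days=i)
        weekend = 0.8 if day.weekday() >= 5 else 1.0  # assumed weekend dip
        row: Dict[str, object] = {"date": day.isoformat()}
        for channel in CHANNELS:
            noise = rng.uniform(0.7, 1.3)  # assumed +/-30% day-to-day variation
            row[channel] = round(base * weekend * noise, 2)
        rows.append(row)
    return rows
```

Two calls with the same seed return identical rows, so a model trained on the synthetic tables can be reproduced exactly.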

Screenshots

Dashboard -- Demand Prediction (Normal Scenario)
Dashboard -- Budget Optimizer & Training History
Dashboard -- Infrastructure Services & Dataset Info
Prefect -- Training Pipeline DAG
MLflow -- Experiment Tracking
Grafana -- Monitoring Dashboard
Prometheus -- API Metrics
MinIO -- Model Artifact Storage
FastAPI -- Interactive API Docs
