MLOps-focused recommendation pipeline: distributed training, experiment tracking, containerized serving.
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
pip install -e .

Download the Amazon Reviews dataset (JSONL or CSV format) and place it in the data/ directory:
data/
├── All_Beauty.jsonl # Small dataset for testing
└── Electronics.csv.gz # Large dataset for benchmarking
Supports JSONL, CSV, and CSV.GZ formats. Two split strategies: temporal (default) or leave-one-out.
# Basic (temporal split)
python data/preprocess.py --input data/All_Beauty.jsonl --output-dir data/processed
# With custom thresholds
python data/preprocess.py --input data/All_Beauty.jsonl --output-dir data/processed --min-user 3 --min-item 3
# Leave-one-out split
python data/preprocess.py --input data/All_Beauty.jsonl --output-dir data/processed --split-strategy leave-one-out
# Large CSV dataset
python data/preprocess.py --input data/Electronics.csv.gz --output-dir data/processed_electronics --min-user 3 --min-item 3

Preprocessing runs are tracked in MLflow under the preprocessing experiment.
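The two split strategies differ in which interactions are held out. A minimal illustration of each (not the actual preprocess.py code; the user_id and timestamp column names are assumptions):

```python
import pandas as pd

def temporal_split(df: pd.DataFrame, test_frac: float = 0.2):
    """Hold out the most recent fraction of interactions overall."""
    df = df.sort_values("timestamp")
    cutoff = int(len(df) * (1 - test_frac))
    return df.iloc[:cutoff], df.iloc[cutoff:]

def leave_one_out_split(df: pd.DataFrame):
    """Hold out each user's single most recent interaction."""
    df = df.sort_values("timestamp")
    test = df.groupby("user_id").tail(1)   # last interaction per user
    train = df.drop(test.index)
    return train, test
```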
Train on a single process:

python training/train_single.py --data-dir data/processed --epochs 10

Or launch distributed training with PyTorch DDP (two processes):

torchrun --nproc_per_node=2 training/train_ddp.py --data-dir data/processed --epochs 10

For a realistic CPU-bound workload, the NCF model is upgraded to a NeuMF-style architecture (user/item embeddings + MLP head; sketched after the comparison below). On the preprocessed data/processed_electronics_10m subset:
- Compared to single-process NeuMF training, 2-process DDP (gloo, CPU) delivers roughly 1.2–1.3× higher throughput.
- Both runs are tracked in the ncf-training MLflow experiment, with training_mode=single vs. training_mode=ddp and world_size=2.
- A small helper script prints the comparison table:
python training/compare_runs.py --experiment ncf-training --last 2 --markdown

The output can be pasted directly into this README to document the exact numbers you measured on your machine.
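For reference, a minimal sketch of a NeuMF-style scorer as described above (user/item embeddings feeding an MLP head); layer sizes are illustrative and this is not the code in models/ncf.py:

```python
import torch
import torch.nn as nn

class NeuMFSketch(nn.Module):
    """Illustrative NeuMF-style scorer: concatenated user/item embeddings -> MLP -> score."""

    def __init__(self, n_users: int, n_items: int, emb_dim: int = 32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, emb_dim)
        self.item_emb = nn.Embedding(n_items, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, user_ids: torch.Tensor, item_ids: torch.Tensor) -> torch.Tensor:
        x = torch.cat([self.user_emb(user_ids), self.item_emb(item_ids)], dim=-1)
        return self.mlp(x).squeeze(-1)  # one score per (user, item) pair
```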
Run the baselines for comparison:

python models/baselines.py --data-dir data/processed --baseline all

Evaluates popularity, random, and matrix factorization baselines. Results are logged to the MLflow baselines experiment.
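The popularity baseline is simple enough to sketch in a few lines (illustrative only, not the baselines.py implementation; the item_id column name is an assumption):

```python
import pandas as pd

def popularity_recommend(train: pd.DataFrame, top_k: int = 5) -> list:
    """Recommend the globally most-interacted items to every user."""
    return train["item_id"].value_counts().head(top_k).index.tolist()
```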
Evaluate a trained model:

python evaluation/evaluate.py --model-path outputs/model.pt --data-dir data/processed

Evaluation reports Recall@K and NDCG@K (see evaluation/metrics.py).
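For reference, minimal per-user definitions of these metrics (a sketch, not evaluation/metrics.py itself; ranked_items is the model's top-K list and relevant is the held-out ground truth):

```python
import math

def recall_at_k(ranked_items: list, relevant: set, k: int) -> float:
    """Fraction of the user's relevant items that appear in the top-k recommendations."""
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / max(len(relevant), 1)

def ndcg_at_k(ranked_items: list, relevant: set, k: int) -> float:
    """Discounted gain of hits in the top-k list, normalized by the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked_items[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0
```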
Run hyperparameter search across preprocessing and training configs:

python experiments/sweep.py --input data/All_Beauty.jsonl --config experiments/config.yaml

Features:
- YAML-based config for preprocessing and training hyperparameters
- Uses DDP by default for faster iteration
- Resource monitoring (auto-adjusts based on CPU/memory; see the sketch after this list)
- Cleans up processed data after each config (reproducible from params)
- All runs tagged with config_id for MLflow lineage tracking
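A minimal sketch of the kind of CPU/memory check the resource monitor can perform with psutil (illustrative thresholds, not the actual experiments/resource_monitor.py logic):

```python
import psutil

def system_overloaded(cpu_limit: float = 90.0, mem_limit: float = 85.0) -> bool:
    """Return True if CPU or memory utilization exceeds the given percentage thresholds."""
    cpu = psutil.cpu_percent(interval=1.0)   # average CPU utilization over 1 second
    mem = psutil.virtual_memory().percent    # current memory utilization
    return cpu > cpu_limit or mem > mem_limit
```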
Modes:
# Full sweep (preprocess + train for each config)
python experiments/sweep.py --input data/All_Beauty.jsonl --config experiments/config.yaml
# Ablation only (on existing preprocessed data)
python experiments/sweep.py --data-dir data/processed --config experiments/config.yaml

Browse tracked runs in the MLflow UI:

mlflow ui # http://127.0.0.1:5000

Start the FastAPI serving app and query it:

uvicorn serving.app:app --port 8000
curl http://localhost:8000/health
curl -X POST http://localhost:8000/recommend -H "Content-Type: application/json" -d '{"user_id": "USER_ID", "top_k": 5}'
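The request and response handled above are implemented in serving/app.py. A stripped-down sketch of such a FastAPI endpoint; model loading and real item scoring are omitted, so the placeholder response is illustrative only:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class RecommendRequest(BaseModel):
    user_id: str
    top_k: int = 5

@app.get("/health")
def health() -> dict:
    return {"status": "ok"}

@app.post("/recommend")
def recommend(req: RecommendRequest) -> dict:
    # The real service scores candidate items for req.user_id with the trained model;
    # here we just return a placeholder list of the requested length.
    return {"user_id": req.user_id, "items": ["item"] * req.top_k}
```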
Build and run the serving image with Docker:

# Build
docker build -f docker/Dockerfile.serving -t recommendation-api .
# Run (mount data and model)
docker run -p 8000:8000 \
-v $(pwd)/data/processed:/app/data/processed \
-v $(pwd)/outputs:/app/outputs \
recommendation-api

Build and run the training image:

docker build -f docker/Dockerfile.training -t recommendation-training .
docker run \
-v $(pwd)/data/processed:/app/data/processed \
-v $(pwd)/outputs:/app/outputs \
recommendation-training

Prerequisites for the Minikube deployment: minikube and kubectl installed, Docker Desktop running, and an existing outputs/model.pt.
# 1. Start cluster
minikube start --driver=docker
# 2. Load image into Minikube (no registry needed)
minikube image load recommendation-api:latest
# 3. Mount local data into the Minikube VM — run each in a separate terminal, keep them running
minikube mount $(pwd)/outputs:/mnt/outputs
minikube mount $(pwd)/data/processed:/mnt/data/processed
# 4. Deploy
kubectl apply -f k8s/deployment.yaml
kubectl apply -f k8s/service.yaml
# 5. Wait for pod to be ready
kubectl get pods -w # wait for READY 1/1
# 6. Get service URL and test
minikube service recommendation-api --url
curl <URL>/health
curl -X POST <URL>/recommend \
-H "Content-Type: application/json" \
-d '{"user_id": "USER_ID", "top_k": 5}'# Enable metrics-server for HPA
minikube addons enable metrics-server
kubectl apply -f k8s/hpa.yaml
kubectl get hpa
# Manual scaling demo (scales 1 → 3 → 1 and verifies each pod)
bash k8s/test_scaling.sh
# Full smoke test (pods, /health, /recommend, HPA)
bash k8s/smoke_test.sh

Project structure:

├── data/
│ ├── preprocess.py # Raw data → parquet splits
│ └── dataset.py # PyTorch Dataset with negative sampling
├── models/
│ ├── ncf.py # Neural Collaborative Filtering
│ └── baselines.py # Popularity, Random, MF baselines
├── training/
│ ├── train_single.py # Single-node + MLflow
│ └── train_ddp.py # Distributed (PyTorch DDP)
├── evaluation/
│ ├── metrics.py # Recall@K, NDCG@K
│ └── evaluate.py # Model evaluation
├── experiments/
│ ├── sweep.py # Automated hyperparameter sweep
│ ├── resource_monitor.py # CPU/memory monitoring
│ └── config.yaml # Experiment configurations
├── serving/
│ └── app.py # FastAPI inference
├── docker/
│ ├── Dockerfile.serving
│ └── Dockerfile.training
└── k8s/
├── deployment.yaml # Pod spec, resource limits, health probes
├── service.yaml # NodePort service (port 30800)
├── hpa.yaml # Auto-scale 1–5 replicas at 70% CPU
├── test_scaling.sh # Manual scale-up/down demo
└── smoke_test.sh # Full deployment validation
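data/dataset.py (see the tree above) pairs each observed interaction with sampled negatives for pairwise training. A minimal sketch of that idea, assuming integer-encoded user and item IDs; not the actual implementation:

```python
import random
import torch
from torch.utils.data import Dataset

class InteractionsWithNegatives(Dataset):
    """Yields (user, positive_item, negative_item) triples for pairwise ranking losses."""

    def __init__(self, interactions: list[tuple[int, int]], n_items: int):
        self.interactions = interactions
        self.n_items = n_items
        self.seen = {}  # user -> set of items the user interacted with
        for user, item in interactions:
            self.seen.setdefault(user, set()).add(item)

    def __len__(self) -> int:
        return len(self.interactions)

    def __getitem__(self, idx: int):
        user, pos = self.interactions[idx]
        neg = random.randrange(self.n_items)
        while neg in self.seen[user]:          # resample until the item is unseen for this user
            neg = random.randrange(self.n_items)
        return torch.tensor(user), torch.tensor(pos), torch.tensor(neg)
```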
Design decisions:

| Decision | Why |
|---|---|
| Temporal split | Prevents leakage from future interactions into training |
| Leave-one-out split | Alternative for per-user evaluation |
| Configurable filtering | Balance data size vs. quality |
| Baselines (Pop, MF) | Benchmark to validate NCF improvement |
| BPR loss | Optimizes ranking rather than rating prediction (see the sketch below) |
| Volume mounts | Data stays external to images |
| MLflow | Tracks preprocessing, baselines, and training experiments |
| DDP with gloo | CPU-based distributed training |
| YAML experiment config | Research-grade, not hardcoded presets |
| DDP by default in sweep | Faster hyperparameter search |
| Separate DDP benchmark | One-time comparison, not every sweep |
| Resource monitoring | Prevents system overload during sweeps |
| Kubernetes NodePort | Exposes service on Minikube without ingress setup |
| hostPath volumes | Mirrors S3/NFS mount pattern — data stays outside the image |
| HPA on CPU | Auto-scales replicas; production would add custom latency metrics |
| imagePullPolicy: Never | Uses locally loaded image — no registry needed for local dev |
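For reference, the BPR loss mentioned in the table is a pairwise ranking objective; a minimal sketch given the model's scores for a positive and a sampled negative item:

```python
import torch
import torch.nn.functional as F

def bpr_loss(pos_scores: torch.Tensor, neg_scores: torch.Tensor) -> torch.Tensor:
    """Bayesian Personalized Ranking: push each positive item's score above the sampled negative's."""
    return -F.logsigmoid(pos_scores - neg_scores).mean()
```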