A production-ready MLOps pipeline for Premier League match prediction with automated monitoring, orchestration, and betting simulation.
This project is my final capstone for the MLOps Zoomcamp course by DataTalks.Club. The course provided comprehensive training in machine learning operations, covering everything from experiment tracking to production deployment.
As a football enthusiast and ML practitioner, I wanted to create a project that combines my passion for the Premier League with the MLOps skills learned throughout the course. This system demonstrates:
- Real-world application of MLOps principles to sports analytics
- End-to-end pipeline from data ingestion to production deployment
- Production-ready practices including monitoring, orchestration, and automated workflows
- Practical value through betting simulation and match prediction
The goal was to build not just a model, but a complete MLOps system that could realistically be deployed and maintained in production, showcasing all the key concepts from the course.
- 🎯 Overview
- ✨ Features
- 🏗️ Architecture
- 🚀 Quick Start
- 🧪 Testing
- 📊 Monitoring & Orchestration
- 🚀 Cloud Deployment
- 🛠️ Development
- 📡 API Endpoints
- 📚 Documentation
This MLOps system provides a complete end-to-end pipeline for predicting Premier League match outcomes. Built with modern MLOps practices, it includes automated training, real-time predictions, comprehensive monitoring, and orchestrated workflows.
- 🤖 61.84% Model Accuracy - Random Forest classifier with 15 engineered features
- ⚡ Real-time Predictions - FastAPI-powered REST API with sub-second response times
- 📊 Comprehensive Monitoring - Grafana dashboards with PostgreSQL metrics storage
- 🔄 Automated Orchestration - Prefect workflows for training, monitoring, and alerts
- 💰 Betting Simulation - Automated betting strategy testing and validation
- 🐳 Containerized Deployment - Docker Compose for easy deployment and scaling
- 🧪 Full Test Coverage - Unit and integration tests for all components
- 🚀 Cloud Deployment - Ready-to-deploy configurations for Railway, Render, and Fly.io
- 🔧 CI/CD Pipeline - GitHub Actions for automated testing, linting, and deployment
- 📝 Code Quality - Ruff linting and formatting with pre-commit hooks
- Premier League Match Prediction - Predict match outcomes (Home/Draw/Away)
- Feature Engineering - 15 carefully crafted features including team form, head-to-head records
- Model Versioning - MLflow integration for experiment tracking and model registry
- Automated Retraining - Scheduled model updates based on performance thresholds
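The 15 engineered features themselves live in the training pipeline; as an illustration of the form-style features described above, here is a minimal stdlib sketch (function and feature names are hypothetical, not the pipeline's actual interface) that computes each team's points from its last five matches:

```python
from collections import defaultdict, deque

def rolling_form(results, window=5):
    """Compute each team's form (points from the last `window` matches)
    before every match. `results` is a list of (home, away, outcome)
    tuples with outcome in {"H", "D", "A"}. Feature names are illustrative."""
    history = defaultdict(lambda: deque(maxlen=window))
    features = []
    for home, away, outcome in results:
        # Form is computed from matches *before* this one (no leakage).
        features.append({
            "home_form": sum(history[home]),
            "away_form": sum(history[away]),
        })
        home_pts = {"H": 3, "D": 1, "A": 0}[outcome]
        away_pts = {"H": 0, "D": 1, "A": 3}[outcome]
        history[home].append(home_pts)
        history[away].append(away_pts)
    return features
```

Computing form strictly from earlier matches avoids target leakage, which is the main pitfall in features like these.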
- FastAPI REST API - Comprehensive endpoints for predictions, model info, and betting
- Real-time Data Integration - Automated data fetching and processing
- Betting Simulation - Strategy testing with configurable parameters
- Health Monitoring - Comprehensive health checks and status endpoints
- Grafana Dashboards - Real-time visualization of model performance and system metrics
- Performance Tracking - Model accuracy, drift detection, and prediction confidence
- Alert System - Automated notifications for performance degradation
- Comprehensive Logging - Structured logging across all components
- Prefect Workflows - 4 automated workflows for different operational needs:
- Hourly Monitoring - Model performance and drift detection
- Daily Predictions - Generate predictions for upcoming matches
- Weekly Retraining - Automated model retraining evaluation
- Emergency Retraining - Manual trigger for immediate model updates
- Python 3.10+ with the `uv` package manager
- PostgreSQL (local or Docker)
- Grafana server (for monitoring dashboards)
- 15 minutes setup time
# 1. Clone and setup
git clone https://github.com/your-username/mlops-2025-final_project.git
cd mlops-2025-final_project
# 2. One-command setup and start
make setup
make start
# 3. Test system
make test

# 1. Clone and install
git clone https://github.com/your-username/mlops-2025-final_project.git
cd mlops-2025-final_project
uv sync
# 2. Start Docker services (PostgreSQL)
docker-compose up -d
# 3. Initialize system
cp config.env.example .env
uv run python scripts/setup_database.py
# 4. Start core services (5 terminals)
uv run mlflow server --host 127.0.0.1 --port 5000 # Terminal 1
uv run python -m src.pipelines.training_pipeline # Terminal 2 (once)
cd src/api && uv run uvicorn main:app --host 0.0.0.0 --port 8000 --reload # Terminal 3
uv run prefect server start --host 0.0.0.0 --port 4200 # Terminal 4
sudo systemctl start grafana-server # Terminal 5
# 5. Setup Grafana dashboard
uv run python scripts/setup_grafana.py
# 6. Test complete system
uv run python scripts/test_simple_integration.py
uv run python scripts/test_simple_monitoring.py

# Service Management
make start # Start all services
make stop # Stop all services
make restart # Restart all services
make status # Check service status
# Development
make setup # Complete setup
make test # Run integration tests
make train # Run training pipeline
make clean # Clean up resources
# Individual Services
make start-docker # Start Docker only
make start-mlflow # Start MLflow only
make start-api # Start API only
make start-prefect # Start Prefect only
make start-grafana # Start Grafana only
# View all commands
make help

- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- MLflow: http://127.0.0.1:5000
- Prefect UI: http://localhost:4200
- Grafana: http://localhost:3000 (admin/admin)
- Health Check: http://localhost:8000/health
- Random Forest Model - 61.84% accuracy on 3,040 Premier League matches
- REST API - FastAPI with comprehensive endpoints
- Real-time Predictions - Premier League match outcomes
- Betting Simulation - Automated betting strategy testing
- MLflow Integration - Model tracking and versioning
- PostgreSQL Database - Complete data persistence
- Prefect Workflows - 4 automated flows for operational needs
- Grafana Dashboards - Real-time monitoring with PostgreSQL data source
- Performance Tracking - Model drift detection and accuracy monitoring
- Alert System - Automated notifications for performance degradation
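To make the betting simulation concrete, here is a toy flat-stake sketch (field names and strategy are illustrative; the real simulator in src/betting_simulator/ is configurable and may use a different interface):

```python
def simulate_flat_stake(predictions, stake=10.0, bankroll=1000.0):
    """Toy flat-stake simulation: bet `stake` on the model's pick for each
    match; a winning bet returns stake * decimal odds. Each prediction holds
    the model's pick, the odds for that pick, and the actual result.
    Field names are hypothetical."""
    for p in predictions:
        bankroll -= stake                    # place the bet
        if p["pick"] == p["result"]:
            bankroll += stake * p["odds"]    # winning bet pays out
    return bankroll

bets = [
    {"pick": "H", "odds": 2.1, "result": "H"},  # win: -10 then +21
    {"pick": "A", "odds": 3.2, "result": "D"},  # loss: -10
]
final_balance = simulate_flat_stake(bets)  # 1000 - 10 + 21 - 10 = 1001.0
```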
graph TB
subgraph "Data Layer"
PG[(PostgreSQL<br/>Database)]
DATA[Premier League<br/>Match Data]
end
subgraph "ML Pipeline"
TRAIN[Training Pipeline]
MLF[MLflow<br/>Tracking]
MODEL[Model Registry]
PRED[Prediction Pipeline]
end
subgraph "API Layer"
API[FastAPI<br/>REST API]
BET[Betting<br/>Simulator]
end
subgraph "Monitoring & Orchestration"
PREF[Prefect<br/>Workflows]
GRAF[Grafana<br/>Dashboards]
ALERT[Alert System]
end
subgraph "Infrastructure"
DOCKER[Docker<br/>Compose]
MAKE[Makefile<br/>Commands]
end
DATA --> TRAIN
TRAIN --> MLF
MLF --> MODEL
MODEL --> PRED
PRED --> API
API --> BET
BET --> PG
PRED --> PG
PREF --> TRAIN
PREF --> PRED
PREF --> GRAF
GRAF --> ALERT
PG --> GRAF
DOCKER --> PG
DOCKER --> MLF
MAKE --> DOCKER
MAKE --> API
MAKE --> PREF
| Component | Technology | Purpose | Port |
|---|---|---|---|
| API Server | FastAPI + Uvicorn | REST API endpoints | 8000 |
| ML Tracking | MLflow | Experiment tracking & model registry | 5000 |
| Database | PostgreSQL | Data persistence & metrics | 5432 |
| Orchestration | Prefect | Workflow automation | 4200 |
| Monitoring | Grafana | Dashboards & visualization | 3000 |
| Containerization | Docker Compose | Service orchestration | - |
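The repository's own docker-compose.yml is authoritative; a sketch of the PostgreSQL service it might wire up (image tag, credentials, and volume name here are illustrative) looks like:

```yaml
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: mlops_db        # illustrative credentials; use .env in practice
      POSTGRES_USER: mlops
      POSTGRES_PASSWORD: mlops
    ports:
      - "5432:5432"                # matches the port table above
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```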
- Training Flow: Data → Feature Engineering → Model Training → MLflow → Model Registry
- Prediction Flow: API Request → Model Loading → Feature Processing → Prediction → Response
- Monitoring Flow: Metrics Collection → PostgreSQL → Grafana → Alerts
- Orchestration Flow: Prefect Scheduler → Workflows → Monitoring → Notifications
mlops-2025-final_project/
├── 📁 src/
│ ├── 📁 api/ # FastAPI application
│ ├── 📁 pipelines/ # ML training & prediction pipelines
│ ├── 📁 betting_simulator/ # Betting strategy simulation
│ ├── 📁 monitoring/ # Metrics collection & storage
│ ├── 📁 orchestration/ # Prefect workflows
│ ├── 📁 data_integration/ # Data fetching & processing
│ └── 📁 retraining/ # Automated model retraining
├── 📁 tests/ # Test suites
│ ├── 📁 unit/ # Unit tests
│ └── 📁 integration/ # Integration tests
├── 📁 scripts/ # Setup & testing scripts
├── 📁 data/ # Training data & datasets
├── 📁 grafana/ # Grafana dashboards & config
├── 📁 alerts/ # Alert configurations
├── 📁 deployment/ # Cloud deployment documentation
├── 📁 .github/workflows/ # CI/CD pipelines
├── 📄 docker-compose.yml # Container orchestration
├── 📄 Dockerfile # Production container image
├── 📄 railway.toml # Railway deployment config
├── 📄 render.yaml # Render deployment config
├── 📄 .pre-commit-config.yaml # Pre-commit hooks
├── 📄 Makefile # Development commands
├── 📄 pyproject.toml # Python dependencies & config
└── 📄 README.md # This file
make test # Run all integration tests
make test-orch # Test orchestration components
make health # Health check all services

- ✅ Unit Tests - Core business logic and utilities
- ✅ Integration Tests - End-to-end workflow testing
- ✅ API Endpoints - All REST endpoints tested
- ✅ ML Pipeline - Training and prediction workflows
- ✅ Database - Schema and data integrity
- ✅ Monitoring - Metrics collection and alerts
- ✅ Orchestration - Prefect workflow execution
- ✅ CI/CD Pipeline - Automated testing on pull requests
# Run unit tests
uv run pytest tests/unit/ -v
# Run integration tests
uv run pytest tests/integration/ -v
# Test API integration
uv run python scripts/test_simple_integration.py
# Test monitoring workflows
uv run python scripts/test_simple_monitoring.py
# Test end-to-end orchestration
uv run python scripts/test_end_to_end_monitoring.py

# Run linting
uv run ruff check src/ tests/
# Run formatting
uv run ruff format src/ tests/
# Run pre-commit hooks
uv run pre-commit run --all-files

The system includes 4 automated workflows:
# Start Prefect server
uv run prefect server start --host 0.0.0.0 --port 4200
# Deploy and run workflows
uv run python -m src.orchestration.scheduler
# Manual workflow triggers
uv run python scripts/test_simple_orchestration.py

Available Flows:
- Hourly Monitoring - Model performance & drift detection
- Daily Predictions - Generate predictions for upcoming matches
- Weekly Retraining - Automated model retraining evaluation
- Emergency Retraining - Manual retraining trigger
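The actual schedules are defined in src.orchestration and run via Prefect; as a plain-Python sketch of the same idea (names and intervals here are illustrative), a dispatcher can map each flow to an interval and decide which flows are due:

```python
from datetime import datetime, timedelta

# Hypothetical schedule table mirroring the flows above; emergency
# retraining is excluded because it is triggered manually.
SCHEDULES = {
    "hourly_monitoring": timedelta(hours=1),
    "daily_predictions": timedelta(days=1),
    "weekly_retraining": timedelta(weeks=1),
}

def due_flows(last_run, now):
    """Return the flows whose interval has elapsed since their last run.
    Flows that have never run (missing from `last_run`) are always due."""
    return [name for name, interval in SCHEDULES.items()
            if now - last_run.get(name, datetime.min) >= interval]
```

In production, Prefect's own deployment schedules replace hand-rolled logic like this; the sketch only illustrates the dispatch decision.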
Real-time monitoring with PostgreSQL data source:
# Start Grafana
sudo systemctl start grafana-server
# Setup dashboard (automated)
uv run python scripts/setup_grafana.py
# Manual setup:
# 1. Go to http://localhost:3000 (admin/admin)
# 2. Add PostgreSQL data source (localhost:5432/mlops_db)
# 3. Import: grafana/dashboards/corrected_mlops_dashboard.json

Dashboard Features:
- Model performance metrics over time
- Prediction accuracy tracking
- System health indicators
- Real-time alerts and notifications
Alerts fire when:

- Model accuracy drops below 55%
- API response time exceeds 1 second
- Database connection failures
- Service downtime detection
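The thresholds above reduce to a simple check over a metrics snapshot. A stdlib sketch (the metric keys are hypothetical; the real collector's schema may differ):

```python
def check_alerts(metrics):
    """Evaluate the alert conditions listed above against a metrics
    snapshot dict. Key names are illustrative."""
    alerts = []
    if metrics.get("accuracy", 1.0) < 0.55:
        alerts.append("model accuracy below 55%")
    if metrics.get("api_latency_s", 0.0) > 1.0:
        alerts.append("API response time above 1s")
    if not metrics.get("db_connected", True):
        alerts.append("database connection failure")
    if not metrics.get("services_up", True):
        alerts.append("service downtime detected")
    return alerts
```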
The system is ready for cloud deployment with multiple platform options:
# 1. Connect your GitHub repository to Railway
# 2. Add PostgreSQL service
# 3. Deploy automatically using railway.toml configuration

# 1. Connect repository to Render
# 2. Use render.yaml blueprint for automatic setup
# 3. Add PostgreSQL database service

# Install Fly CLI and deploy
fly launch
fly deploy

# Build production image
docker build -t premier-league-mlops .
# Run with environment variables
docker run -p 8000:8000 \
-e POSTGRES_HOST=your-db-host \
-e POSTGRES_USER=your-db-user \
-e POSTGRES_PASSWORD=your-db-password \
  premier-league-mlops

Required for cloud deployment:
- `POSTGRES_HOST`, `POSTGRES_PORT`, `POSTGRES_DB`
- `POSTGRES_USER`, `POSTGRES_PASSWORD`
- `MLFLOW_TRACKING_URI` (optional, defaults to SQLite)
- `MODEL_REGISTRATION_THRESHOLD` (optional, default: 0.6)
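Reading these variables with local-development fallbacks can be sketched as follows (the default values here are illustrative; the repository's config.env.example is authoritative):

```python
import os

def load_config():
    """Read the deployment variables listed above, falling back to
    local-development defaults. Defaults shown here are illustrative."""
    return {
        "host": os.environ.get("POSTGRES_HOST", "localhost"),
        "port": int(os.environ.get("POSTGRES_PORT", "5432")),
        "db": os.environ.get("POSTGRES_DB", "mlops_db"),
        "user": os.environ.get("POSTGRES_USER", "postgres"),
        "password": os.environ.get("POSTGRES_PASSWORD", ""),
        "registration_threshold": float(
            os.environ.get("MODEL_REGISTRATION_THRESHOLD", "0.6")),
    }
```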
- Endpoint: `/health`
- Timeout: 30 seconds
- Auto-restart: on failure
See deployment/cloud-deployment.md for detailed instructions.
# API development
cd src/api && uv run uvicorn main:app --reload
# Run training pipeline
uv run python -m src.pipelines.training_pipeline
# Run prediction pipeline
uv run python -m src.pipelines.prediction_pipeline
# Database utilities
uv run python scripts/check_db_tables.py
uv run python scripts/clean_postgres.py
uv run python scripts/setup_database.py
# Monitoring and orchestration
uv run python scripts/test_simple_monitoring.py
uv run python scripts/test_simple_orchestration.py

# Install pre-commit hooks
uv run pre-commit install
# Run all pre-commit hooks
uv run pre-commit run --all-files
# Run specific tests
uv run pytest tests/unit/ -v
uv run pytest tests/integration/ -v
# Check code coverage
uv run pytest --cov=src tests/

- Retry Logic - Automatic retry on failures
- Notifications - Slack/email alerts on completion
- Logging - Comprehensive workflow execution logs
- Monitoring - Real-time workflow status tracking
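Prefect provides retries natively through task and flow retry settings; the core idea behind the retry logic above can be sketched in plain Python:

```python
import functools
import time

def with_retries(attempts=3, delay_s=0.0):
    """Minimal retry decorator: re-run a failing step up to `attempts`
    times, sleeping `delay_s` between tries, before re-raising."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise          # out of attempts: propagate
                    time.sleep(delay_s)
        return wrapper
    return decorate
```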
# Check service status
make status
# View logs
docker-compose logs -f
# Reset database
uv run python scripts/clean_postgres.py
uv run python scripts/setup_database.py
# Check API health
curl http://localhost:8000/health

- GET `/health` - System health check
- GET `/` - API information and status
- POST `/predict` - Single match prediction
- GET `/predictions/today` - Today's match predictions
- GET `/model/info` - Current model information
- GET `/model/performance` - Model performance metrics
- POST `/betting/simulate` - Run betting simulation
- GET `/betting/stats` - Betting statistics
- GET `/betting/balance` - Current betting balance
- GET `/monitoring/metrics` - System metrics
- POST `/monitoring/alert` - Trigger alerts
- GET `/monitoring/drift` - Model drift analysis
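The `/predict` payload carries team names and decimal odds, as shown in this README's curl examples. A stdlib sketch of building and validating that payload (the validation rule and method names are illustrative, not the API's actual schema):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class PredictionRequest:
    """Payload for POST /predict; field names mirror the curl example
    in this README. The validation below is an illustrative sketch."""
    home_team: str
    away_team: str
    home_odds: float
    draw_odds: float
    away_odds: float

    def to_json(self):
        # Decimal odds below 1.0 would imply a guaranteed loss on a win.
        if min(self.home_odds, self.draw_odds, self.away_odds) <= 1.0:
            raise ValueError("decimal odds must be greater than 1.0")
        return json.dumps(asdict(self))

body = PredictionRequest("Arsenal", "Chelsea", 2.1, 3.5, 3.2).to_json()
```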
# Get system health
curl http://localhost:8000/health
# Make a prediction
curl -X POST http://localhost:8000/predict \
-H "Content-Type: application/json" \
-d '{"home_team": "Arsenal", "away_team": "Chelsea", "home_odds": 2.1, "away_odds": 3.2, "draw_odds": 3.5}'
# Get today's predictions
curl http://localhost:8000/predictions/today
# Check model performance
curl http://localhost:8000/model/performance

- API Documentation - Complete API reference
- Cloud Deployment Guide - Detailed deployment instructions
This project is licensed under the MIT License - see the LICENSE file for details.
This project was made possible thanks to:
- DataTalks.Club for the excellent MLOps Zoomcamp course
- Alexey Grigorev and the course instructors for their comprehensive MLOps training
- MLOps Zoomcamp community for support, discussions, and shared learning experiences
MLOps Zoomcamp Final Project - Complete MLOps system for Premier League match prediction