Last updated: 2026-04-08
## Table of Contents
- What is MLOps?
- MLOps Maturity Levels
- Phase 1: Problem Definition & Business Understanding
- Phase 2: Data Management & Preparation
- Phase 3: Model Development & Experimentation
- Phase 4: Model Training at Scale
- Phase 5: Model Evaluation & Validation
- Phase 6: Model Deployment & Serving
- Phase 7: Monitoring & Observability
- Phase 8: Retraining & Continuous Improvement
- Cross-Cutting Concerns
- Infrastructure as Code
## What is MLOps?

MLOps (Machine Learning Operations) is a set of practices that combines Machine Learning, DevOps, and Data Engineering to reliably and efficiently deploy and maintain ML models in production. It is the discipline of applying DevOps principles (automation, CI/CD, versioning, monitoring, etc.) to the full ML lifecycle.
> [!NOTE]
> Microsoft defines MLOps as the union of people, process, and technology to productionize ML models reliably and at scale. The goal is to shorten the cycle from idea to production while maintaining quality, reproducibility, and governance.
The core pillars of MLOps on Azure are:
| Pillar | Azure Service / Concept |
|---|---|
| Data management | Azure Data Lake, Azure ML Datasets, Data versioning |
| Experimentation | Azure ML Studio, MLflow tracking |
| Training at scale | Azure ML Compute Clusters, Pipelines |
| Model registry | Azure ML Model Registry |
| Deployment | Managed Online / Batch Endpoints |
| Monitoring | Azure Monitor, Application Insights, Data drift detection |
| CI/CD | GitHub Actions, Azure DevOps Pipelines |
| Governance | Azure Policy, Responsible AI dashboard, RBAC |
## MLOps Maturity Levels

Microsoft defines an MLOps Maturity Model with five levels ranging from manual, ad-hoc processes to fully automated self-healing systems.
| Level | Name | Description |
|---|---|---|
| 0 | No MLOps | Manual, notebook-driven. No reproducibility. |
| 1 | DevOps but no MLOps | Basic CI/CD for application code only. ML is still manual. |
| 2 | Automated training | Training pipelines automated. Models tracked in a registry. |
| 3 | Automated model deployment | Deployment pipelines trigger on approved model versions. |
| 4 | Full MLOps | End-to-end automation including retraining, monitoring, and drift-triggered pipelines. |
> [!TIP]
> Start by assessing your current maturity level honestly. Most organizations land at Level 0 or 1. Focus on progressing one level at a time rather than trying to implement everything at once.
## Phase 1: Problem Definition & Business Understanding

Before writing a single line of code, align on what success looks like. This phase is often underestimated but is the single biggest determinant of whether an ML project delivers value.
| Key Activity | Description |
|---|---|
| Define the business objective | What decision or process will this model support? What KPI will it improve? |
| Identify the ML task | Classification, regression, forecasting, NLP, computer vision, etc. |
| Define success metrics | Agree on both ML metrics (accuracy, AUC, RMSE) and business metrics (revenue impact, cost reduction, time saved). |
| Assess feasibility | Do you have enough data? Is the problem learnable? What is the cost of a wrong prediction? |
| Map stakeholders | Data owners, model consumers, compliance/legal, and platform team. |
> [!IMPORTANT]
> Skipping this phase leads to technically correct models that solve the wrong problem. Microsoft's Responsible AI framework requires that the intended use, limitations, and potential harms of a model be documented from the start.
| Best Practice | Consideration |
|---|---|
| Project structure | Use the Team Data Science Process (TDSP) as a lightweight template to organize work across teams. |
| Responsible AI documentation | Establish a Model Card or AI Use Case Description early; this feeds directly into Responsible AI documentation later. |
| Service Level Agreements | Define SLAs for model latency, availability, and retraining frequency before any architecture decisions are made. |
## Phase 2: Data Management & Preparation

Data is the foundation of every ML system. Azure provides a rich ecosystem for storing, versioning, and transforming data at scale.
| Key Activity | Description |
|---|---|
| Data ingestion | Pull data from source systems (databases, APIs, streaming sources) into a central store. |
| Data exploration (EDA) | Profile the data for missing values, outliers, distribution shifts, and class imbalance. |
| Feature engineering | Transform raw signals into features meaningful to the model. |
| Data versioning | Track exactly which version of the dataset was used to train each model version. |
| Data splitting | Define train / validation / test splits with care to avoid leakage. |
> [!CAUTION]
> Data leakage between train and test sets is one of the most common and damaging mistakes in ML. Ensure temporal splits for time-series data and that feature computation only uses information available at prediction time.
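For time-series data, the caution above boils down to splitting on a timestamp rather than shuffling. A minimal sketch, assuming rows are `(timestamp, features)` pairs (the function name and row shape are illustrative, not from any library):

```python
from datetime import date

def temporal_split(rows, cutoff):
    """Split time-stamped rows so every test example is strictly later than
    every training example. A temporal cutoff (instead of a random shuffle)
    prevents the model from training on information that postdates the
    examples it is later tested on."""
    train = [r for r in rows if r[0] < cutoff]
    test = [r for r in rows if r[0] >= cutoff]
    return train, test

# Twelve monthly records; hold out the last quarter as the test set.
rows = [(date(2024, m, 1), {"month": m}) for m in range(1, 13)]
train, test = temporal_split(rows, cutoff=date(2024, 10, 1))
```

The same principle applies to feature computation: any aggregate (e.g., a rolling mean) used in `train` must be computed only from data before the cutoff.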
| Need | Azure Services & Tools |
|---|---|
| Scalable storage | Azure Data Lake Storage Gen2 |
| Structured datasets | Azure ML Datasets (File & Tabular) |
| Data transformation | Azure Data Factory, Azure Databricks |
| Data versioning | Azure ML Data Assets with versioned URIs |
| Labeling | Azure ML Data Labeling (supports ML-assisted labeling) |
| Best Practice | Consideration |
|---|---|
| Dataset versioning | Register all datasets as Azure ML Data Assets so every training run references a versioned, traceable data snapshot. |
| Data validation | Apply data validation checks (schema, row counts, value ranges) as the first step of every pipeline run. Fail fast on bad data. |
| Secure storage | Store sensitive data in Azure Data Lake Storage Gen2 with hierarchical namespace enabled, access controlled via Azure RBAC and ACLs; never embed credentials in code. |
| Data protection | Enable soft delete and versioning on Azure Blob/ADLS to protect against accidental deletion. |
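The "fail fast on bad data" practice above can be sketched as a first pipeline step that raises on the first violation. This is a hand-rolled illustration (the schema format and function name are assumptions, not a specific validation library):

```python
def validate_batch(rows, schema, min_rows=1):
    """Fail fast on bad data: check row count, required columns, types, and
    value ranges before any training happens.

    schema maps column name -> (expected type, (min, max) bounds or None).
    Raises ValueError on the first violation so the pipeline run stops early.
    """
    if len(rows) < min_rows:
        raise ValueError(f"expected at least {min_rows} rows, got {len(rows)}")
    for i, row in enumerate(rows):
        for col, (typ, bounds) in schema.items():
            if col not in row:
                raise ValueError(f"row {i}: missing column {col!r}")
            val = row[col]
            if not isinstance(val, typ):
                raise ValueError(f"row {i}: {col!r} has type {type(val).__name__}")
            if bounds is not None and not (bounds[0] <= val <= bounds[1]):
                raise ValueError(f"row {i}: {col!r}={val} outside range {bounds}")
    return True

schema = {"age": (int, (0, 120)), "income": (float, None)}
validate_batch([{"age": 42, "income": 55000.0}], schema)
```

In a real pipeline the same checks would run as the first Azure ML pipeline step, with the schema itself versioned alongside the data asset.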
## Phase 3: Model Development & Experimentation

This phase is the most iterative. The goal is to rapidly explore hypotheses, track experiments, and identify the best modeling approach.
| Key Activity | Description |
|---|---|
| Baseline model | Always start with a simple baseline (e.g., majority class classifier, mean predictor) to set a performance floor. |
| Feature selection & engineering | Iterate on which features move the needle. |
| Algorithm selection | Try multiple algorithms; avoid premature commitment to a complex model. |
| Hyperparameter tuning | Systematic search over the parameter space. |
| Experiment tracking | Log every run, parameters, metrics, artifacts, and environment. |
| Need | Azure Services & Tools |
|---|---|
| Interactive development | Azure ML Studio Notebooks, VS Code with Azure ML extension |
| Experiment tracking | Azure ML Jobs + MLflow autologging |
| Hyperparameter tuning | Azure ML Sweep Jobs (supports grid, random, Bayesian) |
| AutoML | Azure AutoML (tabular, NLP, computer vision) |
| Version control | Git + Azure Repos / GitHub |
> [!TIP]
> Azure ML's sweep jobs natively support early termination policies (Bandit, Median Stopping, Truncation Selection). Always configure early termination on hyperparameter sweeps to avoid wasting compute on poor configurations.
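To build intuition for why early termination saves compute, here is a toy sketch of the idea behind median stopping: a run is cancelled when its best metric so far is worse than the median of its peers at the same checkpoint. This mirrors the concept behind Azure ML's `MedianStoppingPolicy`, not its exact implementation:

```python
import statistics

def median_stopping(history, current_run, interval):
    """Decide whether to stop a run early.

    history: {run_id: [metric at checkpoint 0, 1, ...]} for other runs.
    current_run: metric history of the run being evaluated.
    Assumes higher is better. Returns True if the current run's best metric
    so far is below the median of peers' best metrics at this checkpoint.
    """
    peers = [max(m[: interval + 1]) for m in history.values() if len(m) > interval]
    if not peers:
        return False  # not enough evidence to cancel anything yet
    current_best = max(current_run[: interval + 1])
    return current_best < statistics.median(peers)

history = {"a": [0.60, 0.70, 0.80], "b": [0.50, 0.65, 0.70], "c": [0.55, 0.60, 0.62]}
stop = median_stopping(history, current_run=[0.30, 0.35], interval=1)
```

In Azure ML you would configure the real policy on the sweep job rather than implementing it yourself; the sketch only shows why lagging configurations get cut.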
| Best Practice | Consideration |
|---|---|
| Experiment tracking | Use MLflow autologging from day one. Retrofitting experiment tracking onto an existing codebase is painful. |
| Reproducibility | Pin library versions (requirements.txt or conda.yaml), seed all random number generators, and use fixed data asset versions for every experiment. |
| Code separation | Separate research code (notebooks) from production code (Python modules). Notebooks are great for exploration but not for pipelines. |
| Environment management | Register compute environments as Azure ML Environments so the same Docker image is used in local dev, CI, and production training. |
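The reproducibility row above (seed all random number generators) can be sketched as a single helper called at the top of every training script. This is stdlib-only; a real project would also seed `numpy`, `torch`, etc.:

```python
import os
import random

def seed_everything(seed: int) -> None:
    """Seed the random sources a typical training script touches.

    Note: PYTHONHASHSEED only affects hash randomization in child processes
    or when set before the interpreter starts; it is exported here so any
    subprocesses the script spawns inherit it.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)

# Two runs with the same seed produce identical random draws.
seed_everything(42)
a = [random.random() for _ in range(3)]
seed_everything(42)
b = [random.random() for _ in range(3)]
```

Combined with pinned library versions and a fixed data asset version, this makes an experiment repeatable end to end.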
## Phase 4: Model Training at Scale

Once you have a promising approach, move from interactive notebooks to automated, parameterized training pipelines that can run reliably on cloud compute.
| Key Activity | Description |
|---|---|
| Refactor training code | Move from notebooks into reusable Python modules/scripts ready for pipeline execution. |
| Define an Azure ML Pipeline | Structure the workflow into discrete, versioned steps: data prep → feature engineering → training → evaluation. |
| Parameterize everything | Externalize data version, hyperparameters, and compute target; nothing should be hardcoded in scripts. |
| Compute selection | Choose the right cluster type for the workload: CPU for classical ML, GPU for deep learning. |
| Distributed training | For large models or datasets, configure multi-node training with frameworks like PyTorch DDP or Horovod. |
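"Parameterize everything" usually means the training entry point takes all of its knobs as command-line arguments, so the same script runs locally, in CI, and as an Azure ML pipeline step. A minimal sketch (the argument names are illustrative, not an Azure ML contract):

```python
import argparse

def parse_args(argv=None):
    """Externalize everything the pipeline varies: data version,
    hyperparameters, and output location. Nothing is hardcoded, so a pipeline
    definition (or a human) supplies the values per run."""
    p = argparse.ArgumentParser(description="training entry point")
    p.add_argument("--data-version", required=True,
                   help="versioned data asset to train on")
    p.add_argument("--learning-rate", type=float, default=0.01)
    p.add_argument("--epochs", type=int, default=10)
    p.add_argument("--output-dir", default="./outputs",
                   help="where model artifacts and metrics are written")
    return p.parse_args(argv)

# Passing argv explicitly makes the parser easy to exercise in tests.
args = parse_args(["--data-version", "3", "--learning-rate", "0.1"])
```

The pipeline YAML then maps component inputs onto these flags, and the data version used by every run is recorded automatically in the job's parameters.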
> [!NOTE]
> Azure ML Pipelines cache step outputs by default. If the input data and parameters for a step haven't changed, the cached output is reused. This dramatically speeds up iterative pipeline development.
| Need | Azure Services & Tools |
|---|---|
| Orchestration | Azure ML Pipelines (component-based) |
| Compute | Azure ML Compute Clusters (auto-scaling) |
| Container images | Azure ML Curated Environments / Custom Environments |
| Distributed training | Azure ML + PyTorch / TensorFlow distributed |
| Scheduling | Azure ML Schedules, Azure Data Factory triggers |
| Best Practice | Consideration |
|---|---|
| Reusable components | Define each pipeline step as an Azure ML Component with a typed interface (inputs, outputs, parameters) shareable via a component registry. |
| Scale-to-zero compute | Configure compute clusters with a minimum node count of 0 so they scale to zero when idle and you're not paying for unused resources. |
| Cost optimization | Use spot/low-priority VMs for non-time-sensitive runs (typically 60–80% cheaper), with checkpointing to handle pre-emption. |
| Named outputs | Store all training outputs (model artifacts, metrics, logs) as named outputs in the pipeline for automatic tracking and lineage. |
| Pinned environments | Pin the Azure ML Environment (Docker image + conda/pip dependencies) so training is fully reproducible months later. |
## Phase 5: Model Evaluation & Validation

A model that performs well on a held-out test set is not automatically ready for production. Validation must go beyond aggregate metrics.
| Key Activity | Description |
|---|---|
| Metric evaluation | Compare against baseline and previous production model version on the same test dataset. |
| Slice-based evaluation | Measure performance across important data subgroups: demographics, geographies, time periods. |
| Fairness assessment | Identify disparate impact across protected groups using statistical parity and equalized odds metrics. |
| Explainability | Understand which features drive predictions globally (SHAP summary) and locally (per-prediction explanations). |
| Stress testing | Evaluate behavior on edge cases, adversarial inputs, and out-of-distribution samples. |
| Model registration gate | Only register a model in the registry if it passes all defined quality and fairness thresholds. |
> [!IMPORTANT]
> Under Microsoft's Responsible AI framework, fairness and explainability are not optional for high-stakes decisions (e.g., credit scoring, hiring, healthcare). Document the RAI assessment as part of the model's release artifacts.
| Need | Azure Services & Tools |
|---|---|
| Responsible AI analysis | Azure ML Responsible AI Dashboard |
| Fairness assessment | Fairlearn (integrated into Azure ML RAI Dashboard) |
| Explainability | SHAP / InterpretML (integrated into Azure ML RAI Dashboard) |
| Counterfactual analysis | DiCE (integrated into Azure ML RAI Dashboard) |
| Error analysis | ErrorAnalysis (integrated into Azure ML RAI Dashboard) |
| Model registry | Azure ML Model Registry |
| Best Practice | Consideration |
|---|---|
| Responsible AI Dashboard | Use the Azure ML RAI Dashboard for a unified view of fairness, explainability, error analysis, and causal analysis before registering a model. |
| Promotion criteria as code | Define a Python evaluation script that exits non-zero if thresholds are not met, enabling automated gating in CI/CD pipelines. |
| Champion comparison | Always compare the new model against the currently deployed champion on the same test dataset, not just an absolute threshold. |
| Model tagging & lineage | Tag every registered model with training data version, pipeline run ID, Git commit SHA, and key metrics to ensure full traceability. |
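"Promotion criteria as code" and the champion comparison can be combined in one small gate script: the candidate must clear every absolute threshold and beat the current champion, and a non-zero exit blocks registration in CI/CD. A sketch with illustrative metric names:

```python
import sys

def evaluate_gate(candidate, champion, thresholds):
    """Return a list of gate failures; an empty list means the gate passes.

    candidate / champion: {metric_name: value} on the same test dataset.
    thresholds: {metric_name: minimum acceptable value}.
    The candidate must clear every threshold AND match or beat the champion.
    """
    failures = []
    for metric, minimum in thresholds.items():
        value = candidate.get(metric, float("-inf"))
        if value < minimum:
            failures.append(f"{metric}={value} below threshold {minimum}")
        if value < champion.get(metric, float("-inf")):
            failures.append(f"{metric}={value} worse than champion")
    return failures

failures = evaluate_gate(
    candidate={"auc": 0.91, "recall": 0.80},
    champion={"auc": 0.89, "recall": 0.78},
    thresholds={"auc": 0.85, "recall": 0.75},
)
if failures:
    # In a CI/CD pipeline, a non-zero exit code fails the stage and
    # prevents the model from being registered.
    print("\n".join(failures))
    sys.exit(1)
```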
## Phase 6: Model Deployment & Serving

This phase covers getting a validated model to production in a reliable, secure, and observable way.
| Deployment Pattern | Azure Offering | Use Case |
|---|---|---|
| Real-time inference | Managed Online Endpoints (MIR) | Low-latency, synchronous predictions |
| Batch inference | Batch Endpoints | Large-scale, scheduled scoring |
| Edge / IoT | Azure IoT Edge + ONNX Runtime | Offline / constrained environments |
| Embedded in app | Direct SDK / REST API | Custom integration scenarios |
| Key Activity | Description |
|---|---|
| Package the model | Create a scoring script (score.py) and register the serving environment in Azure ML. |
| Deploy to staging | Validate the endpoint behavior against integration tests before routing any production traffic. |
| Blue/green or canary deployment | Gradually shift traffic to the new model version (e.g., 10% → 50% → 100%) to minimize blast radius. |
| Rollback plan | Document and test the rollback procedure before going live; know the exact steps to revert traffic to the previous deployment. |
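The canary rollout and rollback activities above reduce to a simple state machine: at each traffic step, either advance to the next split if monitoring looks healthy or send the new deployment back to 0%. A toy sketch of that logic (the step schedule is illustrative; in Azure ML the split itself is set on the endpoint, not computed in your code):

```python
def advance_or_rollback(current_pct, healthy, steps=(10, 50, 100)):
    """Return the next traffic percentage for the new deployment.

    If monitoring signals are unhealthy, roll back to 0% immediately;
    otherwise advance to the next step in the schedule (or stay at 100%).
    """
    if not healthy:
        return 0
    for step in steps:
        if step > current_pct:
            return step
    return current_pct

# The full rollout schedule as (new_model_pct, current_model_pct) pairs.
plan = [(p, 100 - p) for p in (10, 50, 100)]
```

Between steps you would watch error rate and latency on the new deployment before calling `advance_or_rollback` again.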
> [!TIP]
> For batch workloads, Batch Endpoints are significantly more cost-efficient than keeping an online endpoint scaled up. They spin compute up on demand and scale to zero after the job completes.
| Best Practice | Consideration |
|---|---|
| Managed Online Endpoints | Use Managed Online Endpoints for real-time serving; Microsoft handles provisioning, autoscaling, certificates, and blue/green traffic splitting natively. |
| Traffic splitting | Configure canary deployments at the endpoint level (e.g., 10% new / 90% current) before committing to full promotion. |
| Autoscaling | Scale based on request queue depth and CPU/GPU utilization. Set appropriate min/max instance counts to balance cost and availability. |
| Authentication | Protect all endpoints with Azure AD authentication, never expose unauthenticated endpoints in production. |
| Smoke & integration tests | Run automated tests against the staging deployment in the CD pipeline before promoting to production. |
| Registry-based deployments | Reference model artifacts from the Azure ML Model Registry by name and version, never copy files manually. |
## Phase 7: Monitoring & Observability

A deployed model is not "done." Its performance degrades over time as the real world changes. Monitoring is what keeps production models healthy.
| What to Monitor? (Signal) | Description | Azure Service |
|---|---|---|
| Operational metrics | Latency, throughput, error rate, availability | Azure Monitor, Application Insights |
| Data drift | Input feature distributions shift from training baseline | Azure ML Data Drift Monitor |
| Prediction drift | Output score/label distributions change over time | Azure ML Model Monitor |
| Model performance | Accuracy degrades when ground truth labels are available | Azure ML Model Monitor |
| Infrastructure | CPU/GPU/memory utilization, pod health | Azure Monitor, Container Insights |
> [!NOTE]
> Data drift does not always mean model performance has degraded, but it is a leading indicator. Configure your monitoring to alert on drift and trigger a human review or automated retraining workflow accordingly.
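One common way to quantify drift of a numeric feature against its training baseline is the Population Stability Index (PSI). A stdlib-only sketch (bin count and the usual 0.1 / 0.25 rule-of-thumb thresholds are conventions, not Azure ML's exact method):

```python
import math

def psi(baseline, current, bins=10):
    """Population Stability Index between a baseline (training) sample and a
    current (production) sample of one numeric feature.

    Rule of thumb: PSI < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 drift.
    Bins are equal-width over the baseline's observed range.
    """
    lo, hi = min(baseline), max(baseline)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1
        n = len(sample)
        # Clamp to avoid log(0) for empty bins.
        return [max(c / n, 1e-6) for c in counts]

    b, c = fractions(baseline), fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))

baseline = [i / 100 for i in range(100)]          # training distribution
drifted = [0.5 + i / 200 for i in range(100)]     # production shifted upward
```

Azure ML Model Monitor computes comparable drift statistics for you; a sketch like this is mainly useful for custom features or for understanding what the alerts measure.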
| Best Practice | Consideration |
|---|---|
| Model Monitors | Create Azure ML Model Monitors for every production model, scheduled daily or weekly, with alerts when drift exceeds thresholds. |
| Application Insights instrumentation | Instrument the scoring script to log prediction inputs, outputs, and latency for every inference request (subject to data privacy requirements). |
| Operational alerts | Set up Azure Monitor Alerts for P99 latency spikes, error rate increases, and endpoint availability drops. |
| Baseline dataset | Store a baseline dataset (training data or a representative sample) at deployment time; Azure ML uses this as the reference distribution for drift calculations. |
| Ground truth collection | Collect and store ground truth labels wherever possible to compute actual model performance metrics in production. |
## Phase 8: Retraining & Continuous Improvement

Models must evolve with the data. The goal of this phase is a closed-loop system where monitoring signals feed back into the training pipeline automatically or with minimal human intervention.
| Retraining Trigger Type | Description |
|---|---|
| Scheduled | Retrain on a fixed cadence (e.g., weekly, monthly) regardless of detected drift |
| Drift-based | Monitoring detects data or prediction drift above a threshold; triggers retraining pipeline |
| Performance-based | Model accuracy drops below an acceptable threshold (requires ground truth) |
| Event-based | A significant upstream event (data schema change, product update) triggers retraining |
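The four trigger types above can be folded into one decision function that the orchestrator evaluates on each monitoring cycle. A sketch with illustrative signal names and default thresholds (none of these keys come from an Azure API):

```python
def should_retrain(signal):
    """Combine scheduled, drift-based, performance-based, and event-based
    triggers into one (decision, reason) pair.

    signal: dict of monitoring outputs; all keys and defaults here are
    illustrative. Triggers are checked in a fixed priority order.
    """
    if signal.get("days_since_training", 0) >= signal.get("max_age_days", 30):
        return True, "scheduled: model older than allowed cadence"
    if signal.get("drift_score", 0.0) > signal.get("drift_threshold", 0.25):
        return True, "drift-based: input distribution shifted"
    accuracy = signal.get("production_accuracy")  # needs ground truth labels
    if accuracy is not None and accuracy < signal.get("min_accuracy", 0.0):
        return True, "performance-based: accuracy below floor"
    if signal.get("upstream_schema_changed", False):
        return True, "event-based: upstream schema change"
    return False, "no trigger fired"

decision, reason = should_retrain({"days_since_training": 7, "drift_score": 0.31})
```

The returned reason is worth logging with the retraining run, so every model version records why it was trained.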
> [!TIP]
> Start with scheduled retraining before investing in drift-triggered automation. A reliable weekly retrain often provides 80% of the value of a fully automated system with 20% of the complexity.
| Key Activity | Description |
|---|---|
| Automated retraining pipeline | The same parameterized training pipeline from Phase 4 should be fully triggerable via an event or schedule without manual intervention. |
| Automated evaluation gate | The retrained model must pass all evaluation thresholds from Phase 5 before being registered; fail the pipeline otherwise. |
| Automated deployment | A passing model version automatically updates the production endpoint with a canary rollout. |
| Human-in-the-loop | For high-stakes models, include a mandatory human approval step in the CD pipeline before promoting to production. |
| Best Practice | Consideration |
|---|---|
| Event-driven triggers | Wire Azure ML Model Monitor alerts to Azure Event Grid, which can trigger a Logic App or GitHub Actions workflow to kick off a retraining pipeline run. |
| CI/CD orchestration | Use GitHub Actions or Azure DevOps for orchestration. Keep the ML pipeline YAML definition in source control alongside the training code. |
| Champion/challenger framework | The current production model is always the champion; every retrained candidate is a challenger evaluated head-to-head before promotion. |
| Model versioning discipline | Never overwrite a production model artifact. Always register as a new version in the Azure ML Model Registry with full lineage metadata. |
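The versioning-discipline row above amounts to attaching full lineage metadata to every registered model version. A sketch of building that tag set (key names are illustrative; Azure ML accepts arbitrary string tags on registered models):

```python
def registration_tags(data_version, run_id, git_sha, metrics):
    """Build the lineage tags attached to a newly registered model version,
    so any production model can be traced back to its exact training data,
    pipeline run, and source code commit."""
    tags = {
        "data_asset_version": data_version,
        "pipeline_run_id": run_id,
        "git_commit": git_sha,
    }
    # Store key metrics as strings so they appear alongside the version.
    tags.update({f"metric_{name}": f"{value:.4f}" for name, value in metrics.items()})
    return tags

tags = registration_tags(
    data_version="12",
    run_id="run-0042",
    git_sha="a1b2c3d",
    metrics={"auc": 0.912},
)
```

Because each retrained challenger registers as a new version with these tags, rolling back is just re-pointing the deployment at an earlier version.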
## Cross-Cutting Concerns

These concerns apply across all phases and should be addressed from the start of the project.
| Category | Practices & Considerations |
|---|---|
| Security & Access Control | - Least-privilege RBAC: Apply minimal permissions at every layer: Workspace, Storage Account, Key Vault, and compute.<br>- Secret management: Store all secrets in Azure Key Vault, never in code, baked-in environment variables, or `terraform.tfvars`.<br>- Managed Identity: Use System-Assigned or User-Assigned Managed Identity to eliminate credential management entirely.<br>- Private endpoints: Enable private endpoints for Workspace, Storage, Key Vault, and Container Registry in production. |
| Governance & Compliance | - Azure Policy: Enforce organizational standards: allowed regions, required tags, mandatory encryption settings.<br>- Resource tagging: Attach tags (`environment`, `project`, `owner`, `cost-center`) to all resources for cost allocation and reporting.<br>- Responsible AI artifacts: Generate an RAI assessment artifact for every model version promoted to production.<br>- Audit trail: Track all model registrations, deployments, and config changes via the Azure Activity Log. |
| Cost Management | - Budget alerts: Configure alerts in Azure Cost Management for the ML resource group to catch unexpected spend early.<br>- Scale-to-zero training: Use compute clusters that scale to zero nodes when idle; never leave clusters running between jobs.<br>- Dev instance shutdown: Schedule automatic shutdown for development compute instances (e.g., a nightly policy).<br>- Workspace hygiene: Regularly delete unused model versions, stale datasets, and old pipeline run logs.<br>- Reserved Instances: Use Reserved Instances for production endpoint compute to reduce costs by up to 40%. |
### CI/CD Pipeline Structure

A typical MLOps CI/CD pipeline on Azure looks like this:
```
PR / Push to main
│
├── [CI] Lint & unit tests (pytest, flake8)
├── [CI] Integration test: run pipeline on sample data
├── [CI] Evaluate model — fail if below threshold
│
└── [CD] Deploy to staging endpoint
    ├── Smoke tests against staging
    └── [Manual approval gate] ──► Deploy to production endpoint
        └── Canary rollout (10% → 100%)
```
> [!NOTE]
> Use GitHub Actions (preferred for open source / GitHub-native projects) or Azure DevOps Pipelines for CI/CD. Both have first-class Azure ML integration via the `azure/ml` GitHub Actions or Azure ML DevOps tasks.
## Infrastructure as Code

All Azure ML infrastructure in this repository is provisioned via Terraform. See `terraform-infrastructure/README.md` for full deployment instructions.
| Best Practice | Consideration |
|---|---|
| Remote state | Configure the Terraform backend to store state in Azure Blob Storage (see `optional/remote-storage.tf`); never use local state in team environments. |
| Externalized variables | All environment-specific values are in `terraform.tfvars`; do not commit files with real subscription IDs or secrets to source control. |
| Managed Identity | The ML workspace uses a system-assigned identity, removing the need to manage service principal credentials. |
| Naming & tagging | Apply a consistent naming convention and tag all resources with environment, project, and owner tags to support governance and cost management. |