☁️ Azure ML Labs


A collection of machine learning projects, experiments, and labs built on Azure Machine Learning. This repository showcases end-to-end ML workflows including data preprocessing, model training, hyperparameter tuning, batch scoring, and deployment.


📌 Overview

This repository contains my work and projects completed using Azure ML. It demonstrates practical implementations of:

  • Automated ML pipelines
  • Hyperparameter tuning (HyperDrive)
  • Batch and real-time scoring
  • MLflow tracking
  • Responsible AI dashboards
  • Model deployment to managed endpoints

📁 Project Structure

```text
Azure-ML-labs/
├── Deployments/            # Deployment configurations and scripts
├── Mlflow/                 # MLflow tracking experiments
├── Pipelines/              # Azure ML pipeline definitions
├── diabetes-data/          # Diabetes dataset and preprocessing
├── finalgolddata/          # Gold price prediction data
├── logs and AI dash/       # Logs and Responsible AI dashboards
├── Bank Customer Churn Prediction.csv
├── batch_score.py          # Batch scoring script
├── conda_env_v_1_0_0.yml   # Conda environment configuration
├── hyperdrive.txt          # HyperDrive tuning configuration
├── model.pkl               # Trained model pickle file
├── online_score.py         # Real-time scoring script
├── prep.py                 # Data preprocessing
├── preprocess.py           # Feature engineering
├── score.py                # Scoring logic
├── scoring_file_v_2_0_0.py
├── sweep_train.py          # Hyperparameter sweep training
├── train.py                # Main training script
├── train_mlflow.py         # Training with MLflow tracking
├── train_pipeline.py       # Pipeline training script
└── ... (logs and artifacts)
```


🚀 Featured Projects

1. Gold Price Predictor

  • Model: RandomForestRegressor (n_estimators=150)
  • R² Score: 0.9994
  • MAE: $10.82
  • Features: Open, High, Low, Close, Volume, Price_Range, Daily_Return, MA_5, MA_20
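
The engineered features above can be derived from daily OHLCV data roughly as follows. This is a minimal pandas sketch, not the code in `train.py`: the formulas (Price_Range as High minus Low, Daily_Return as the day-over-day change in Close, MA_5 and MA_20 as rolling means of Close) are assumptions inferred from the feature names, and the helper `add_gold_features` is hypothetical.

```python
import numpy as np
import pandas as pd


def add_gold_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add engineered features to an OHLCV frame (hypothetical helper).

    Assumed definitions, inferred from the feature names only:
      Price_Range  = High - Low
      Daily_Return = Close / previous Close - 1
      MA_5, MA_20  = rolling means of Close over 5 and 20 rows
    """
    out = df.copy()
    out["Price_Range"] = out["High"] - out["Low"]
    out["Daily_Return"] = out["Close"] / out["Close"].shift(1) - 1
    out["MA_5"] = out["Close"].rolling(window=5).mean()
    out["MA_20"] = out["Close"].rolling(window=20).mean()
    # The first 19 rows have no full 20-row window (and row 0 has no return),
    # so drop those warm-up rows rather than training on NaNs.
    return out.dropna().reset_index(drop=True)


# Synthetic prices for illustration only; these are not real gold data.
rng = np.random.default_rng(0)
close = 1900 + rng.normal(0, 5, size=30).cumsum()
prices = pd.DataFrame({
    "Open": close - 1.0,
    "High": close + 3.0,
    "Low": close - 4.0,
    "Close": close,
    "Volume": rng.integers(1_000, 5_000, size=30),
})

features = add_gold_features(prices)
print(features.shape)  # (11, 9)
```

Dropping the warm-up rows keeps NaNs out of the training matrix; the cost is that the first 19 days of history are not used as training rows.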

2. Bank Customer Churn Classifier

  • Model: RandomForestClassifier
  • Features: Customer demographics, account data, transaction history
  • Pipeline: End-to-end preprocessing + training
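
An end-to-end preprocessing + training pipeline for the churn classifier could look like the sketch below, assuming scikit-learn's `Pipeline` and `ColumnTransformer`. The column names (`credit_score`, `country`, `churn`, and so on) are hypothetical placeholders for the CSV header, and the synthetic DataFrame exists only so the example runs without the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names; replace them with the real CSV header.
NUMERIC = ["credit_score", "age", "tenure", "balance", "estimated_salary"]
CATEGORICAL = ["country", "gender"]
TARGET = "churn"


def build_churn_pipeline(n_estimators: int = 100, random_state: int = 42) -> Pipeline:
    """Scale numeric columns, one-hot encode categoricals, then fit a random forest."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), NUMERIC),
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", RandomForestClassifier(n_estimators=n_estimators,
                                         random_state=random_state)),
    ])


# Synthetic stand-in data (labels are random, so the accuracy is meaningless).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "credit_score": rng.integers(350, 850, n),
    "age": rng.integers(18, 90, n),
    "tenure": rng.integers(0, 11, n),
    "balance": rng.uniform(0, 250_000, n),
    "estimated_salary": rng.uniform(10_000, 200_000, n),
    "country": rng.choice(["France", "Germany", "Spain"], n),
    "gender": rng.choice(["Female", "Male"], n),
    "churn": rng.integers(0, 2, n),
})

X_train, X_test, y_train, y_test = train_test_split(
    df[NUMERIC + CATEGORICAL], df[TARGET],
    test_size=0.25, random_state=42, stratify=df[TARGET],
)
pipe = build_churn_pipeline()
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
```

With the real file, the synthetic frame would be replaced by `pd.read_csv("Bank Customer Churn Prediction.csv")`. Keeping the scaler and encoder inside the pipeline means a single pickled object applies the same preprocessing at scoring time.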

3. Diabetes Prediction

  • Goal: Predict diabetes progression using medical indicators
  • Approach: Linear regression with feature scaling and hyperparameter tuning
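
A common way to combine feature scaling with hyperparameter tuning for a linear model is a scaled pipeline inside a grid search. The sketch below swaps in scikit-learn's bundled `load_diabetes` dataset (which has a disease-progression target) for the repository's `diabetes-data/` files, and uses `Ridge` because plain least-squares `LinearRegression` has no regularization strength to tune; both choices are assumptions, not the repository's confirmed approach.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Bundled scikit-learn dataset used as a stand-in for diabetes-data/.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

pipe = Pipeline([
    ("scale", StandardScaler()),  # refit on each CV training split
    ("model", Ridge()),           # linear regression with an L2 penalty
])

# Tune the regularization strength with 5-fold cross-validated R^2.
search = GridSearchCV(
    pipe,
    param_grid={"model__alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    scoring="r2",
    cv=5,
)
search.fit(X_train, y_train)

best_alpha = search.best_params_["model__alpha"]
test_r2 = search.score(X_test, y_test)  # R^2 of the refit best model on held-out data
```

Placing `StandardScaler` inside the pipeline means each cross-validation fold fits the scaler only on its own training split, so no statistics leak from the validation fold into the tuning.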

🛠️ Tech Stack

| Category | Tools |
|---|---|
| Cloud Platform | Azure Machine Learning |
| Languages | Python 3.10 |
| ML Libraries | scikit-learn, pandas, numpy |
| Tracking | MLflow |
| Deployment | Azure Container Instances, Managed Endpoints |
| Environment | Conda, Docker |

📦 Setup & Installation

Prerequisites

  • Azure subscription
  • Azure ML workspace
  • Python 3.10+

Clone & Configure

```bash
# Clone repository
git clone https://github.com/Tee808-bigD/Azure-ML-labs.git
cd Azure-ML-labs

# Create conda environment
conda env create -f conda_env_v_1_0_0.yml
conda activate azure_ml_env

# Configure Azure CLI (requires the ml extension: az extension add -n ml)
az login
az account set --subscription "your-subscription-id"
az configure --defaults workspace="your-workspace" group="your-rg"
```

Run Training

```bash
# Train gold price predictor
python train.py --config configs/gold_config.yaml

# Train churn classifier with MLflow
python train_mlflow.py --data_path ./Bank\ Customer\ Churn\ Prediction.csv

# Run hyperparameter sweep
python sweep_train.py
```
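
In an Azure ML sweep (HyperDrive) job, each trial runs the training script with the sampled hyperparameters passed as command-line arguments, and the script reports the metric the sweep optimizes. A minimal sketch of such a trial script follows; the argument names and the synthetic `make_regression` data are hypothetical stand-ins for whatever `sweep_train.py` actually uses.

```python
import argparse

from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def parse_args(argv=None):
    """Parse the hyperparameters a sweep trial receives (names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Single hyperparameter-sweep trial")
    parser.add_argument("--n_estimators", type=int, default=150)
    parser.add_argument("--max_depth", type=int, default=None)
    parser.add_argument("--min_samples_split", type=int, default=2)
    return parser.parse_args(argv)


def run_trial(args) -> float:
    """Train with the sampled hyperparameters and return mean cross-validated R^2."""
    # Synthetic stand-in data; a real trial would load the project's dataset.
    X, y = make_regression(n_samples=300, n_features=9, noise=10.0, random_state=0)
    model = RandomForestRegressor(
        n_estimators=args.n_estimators,
        max_depth=args.max_depth,
        min_samples_split=args.min_samples_split,
        random_state=0,
    )
    score = cross_val_score(model, X, y, cv=3, scoring="r2").mean()
    # Inside an Azure ML sweep job, this value would be logged under the
    # sweep's primary metric name, e.g. mlflow.log_metric("r2", score).
    return float(score)


def main():
    args = parse_args()
    print(f"r2={run_trial(args):.4f}")
```

A sweep would launch a trial as, for example, `python sweep_train.py --n_estimators 150 --max_depth 10`, with the values drawn from the search space. Splitting parsing from training lets `run_trial` be called directly for a quick local check.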

Scoring & Deployment

```bash
# Batch scoring
python batch_score.py --input ./data/test_data.csv

# Real-time scoring endpoint
python online_score.py
```
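
Azure ML managed online endpoints call a scoring script through two functions: `init()`, run once when the container starts, and `run(raw_data)`, run for each request with the request body. Azure ML also sets the `AZUREML_MODEL_DIR` environment variable to the deployed model's folder. The sketch below shows that shape for a pickled model such as `model.pkl`; the `{"data": [...]}` request format, the file name inside the model folder, and the helpers `load_model` and `predict_json` are assumptions, not the contents of `online_score.py`.

```python
import json
import os
import pickle

model = None  # populated once by init()


def load_model(model_dir: str, filename: str = "model.pkl"):
    """Unpickle the model (use joblib.load instead if it was saved with joblib)."""
    with open(os.path.join(model_dir, filename), "rb") as f:
        return pickle.load(f)


def predict_json(model, raw_data: str) -> dict:
    """Score a JSON request of the assumed form {"data": [[...], [...]]}."""
    rows = json.loads(raw_data)["data"]
    # A pipeline fitted on a DataFrame would need rows converted to a
    # DataFrame with the training column names first.
    preds = model.predict(rows)
    # numpy arrays are not JSON-serializable, so convert when possible.
    preds = preds.tolist() if hasattr(preds, "tolist") else list(preds)
    return {"predictions": preds}


def init():
    global model
    # The registered model may sit in a subfolder of AZUREML_MODEL_DIR;
    # falling back to the current directory is an assumption for local runs.
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    model = load_model(model_dir)


def run(raw_data: str) -> dict:
    return predict_json(model, raw_data)
```

Batch deployments use the same `init()` hook, but their `run(mini_batch)` receives a list of input file paths instead of a request body, so batch scoring scripts typically loop over the files and return a list or DataFrame of results. That difference is one source of the latency versus throughput trade-off between the two modes.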

📊 Key Learnings & Experiments

| Experiment | What I Learned |
|---|---|
| Automated ML | How to let Azure AutoML find the best model and pipeline |
| HyperDrive | Tuning hyperparameters efficiently using Bayesian sampling |
| MLflow Tracking | Logging metrics, parameters, and models for experiment comparison |
| Responsible AI | Building fairness, explainability, and error analysis dashboards |
| Batch vs Online Scoring | Trade-offs between latency, cost, and throughput |
| Pipeline Reusability | Building ML pipelines from reusable components |

🔮 Future Work

  • Add more datasets (fraud detection, time series forecasting)
  • Implement CI/CD for model retraining and deployment
  • Create interactive dashboards with Azure Managed Grafana
  • Add LLMOps experiments with Azure AI Foundry

🤝 Contributing

Feel free to fork this repository and submit pull requests. For major changes, please open an issue first.

📄 License

This project is licensed under the MIT License. See the LICENSE file for details.

📧 Contact
Thando Mzobe

GitHub: @Tee808-bigD

LinkedIn: Thando Mzobe

Email: thandomzobe9@gmail.com

🙏 Acknowledgments

  • Microsoft Learn for Azure ML documentation and training
  • Azure ML community for best practices

Built with ☁️ on Azure Machine Learning
