Author: Iffat Nazir
Repository: AutoML by Iffat336
License: MIT License
Last Updated: November 2025
AutoML Health Prediction is a research-driven project that demonstrates how Automated Machine Learning (AutoML) can be leveraged to predict health risks efficiently and transparently.
By integrating PyCaret, an open-source low-code machine learning library, this notebook automates the machine learning pipeline from data preprocessing to model evaluation, while keeping the results interpretable through feature importance and SHAP analysis.
This project was created as part of Iffat Nazir’s data science portfolio, focusing on health analytics and intelligent automation.
It is ideal for students, data science enthusiasts, and researchers interested in applying AI to healthcare datasets.
- Build an AutoML pipeline to predict disease likelihood using health indicators.
- Automatically compare multiple ML algorithms and select the most accurate one.
- Generate interactive visualizations and explainable AI insights.
- Showcase a professional workflow for GitHub and Kaggle portfolios.
✅ Fully automated model training using PyCaret
✅ Preprocessing: handling missing values, encoding, normalization
✅ Comparative model leaderboard for accuracy, F1-score, etc.
✅ Visualization suite: correlation heatmaps, confusion matrix, ROC curve
✅ Feature importance and SHAP-based interpretability
✅ Modular notebook structure, easy to adapt to new datasets
✅ Clearly documented, following professional project conventions
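
PyCaret applies the preprocessing listed above inside its `setup()` function. The sketch below shows what mean imputation, one-hot encoding and z-score normalization do to the data, using plain Python. It is not PyCaret's implementation. The records, column names and helper functions are hypothetical and exist only for illustration.

```python
from statistics import mean, pstdev

# Hypothetical toy records (illustration only); None marks a missing value.
records = [
    {"age": 54, "sex": "M", "cholesterol": 240.0},
    {"age": 61, "sex": "F", "cholesterol": None},
    {"age": 47, "sex": "F", "cholesterol": 198.0},
    {"age": 58, "sex": "M", "cholesterol": 262.0},
]

def impute_mean(rows, column):
    """Replace missing numeric values in `column` with the mean of observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    out = []
    for r in rows:
        new = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new[f"{column}_{c}"] = 1 if r[column] == c else 0
        out.append(new)
    return out

def zscore(rows, column):
    """Rescale a numeric column to mean 0 and (population) standard deviation 1."""
    values = [r[column] for r in rows]
    mu, sigma = mean(values), pstdev(values)
    return [{**r, column: (r[column] - mu) / sigma} for r in rows]

processed = impute_mean(records, "cholesterol")
processed = one_hot(processed, "sex")
for col in ("age", "cholesterol"):
    processed = zscore(processed, col)

for row in processed:
    print(row)
```

For brevity, this toy version fits the mean and standard deviation on all rows. In practice, these statistics should be learned on the training split only and then reused on test data. PyCaret's preprocessing pipeline is designed to handle that split for you.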
| Component | Tool/Library |
|---|---|
| Language | Python 3.10+ |
| Framework | PyCaret |
| Data Manipulation | pandas, numpy |
| Visualization | seaborn, matplotlib |
| Environment | Jupyter Notebook |
| Deployment | GitHub, Kaggle |
You can use any open-source health dataset such as:
- Heart Disease Dataset (Kaggle)
- Cardiovascular Risk Dataset
- Or your own clinical data (if anonymized)
The dataset typically includes features such as `age`, `sex`, `blood_pressure`, `cholesterol`, `glucose`, `smoking`, and `exercise`, and a target variable such as `disease` or `cardio` (1 = disease present, 0 = healthy).
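
Before pointing the notebook at a new CSV, you may want to check that the expected columns and a 0/1 target are present. Below is a minimal standard-library sketch.

- `check_dataset`, the required-column set and the sample rows are hypothetical. Adapt them to your file.
- The default target name `cardio` follows the example above.
- The sketch assumes no quoted multi-line fields, so that reported row numbers match file lines.

```python
import csv
import io

# Hypothetical required columns; adapt to your dataset.
REQUIRED_FEATURES = {"age", "sex", "blood_pressure", "cholesterol"}

def check_dataset(csv_file, target="cardio", required=REQUIRED_FEATURES):
    """Return a list of problems found in a health CSV (an empty list means OK)."""
    reader = csv.DictReader(csv_file)
    columns = set(reader.fieldnames or [])
    missing = sorted((required | {target}) - columns)
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        if row[target] not in ("0", "1"):
            problems.append(
                f"line {line_no}: target {target!r} is {row[target]!r}, expected 0 or 1"
            )
    return problems

# Hypothetical sample file: the last row has an invalid target value.
sample = io.StringIO(
    "age,sex,blood_pressure,cholesterol,cardio\n"
    "54,M,140,240,1\n"
    "47,F,120,198,0\n"
    "61,F,135,,2\n"
)
print(check_dataset(sample))
```

Missing feature values (such as the empty cholesterol cell above) are deliberately not flagged here, because imputation in the preprocessing step can handle them.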
Clone this repository:

```bash
git clone https://github.com/iffat336/AutoML.git
cd AutoML
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Run the notebook:

```bash
jupyter notebook AutoML_Health_Prediction.ipynb
```
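
If the notebook fails to start, first confirm that the interpreter version and the libraries in the table above are available. Below is a minimal sketch of that check.

- The package list mirrors that table.
- This README does not show the exact contents of `requirements.txt`, so treat the list as an assumption and adjust it.
- `check_environment` is a hypothetical helper name.

```python
import importlib.util
import sys

# Import names of the libraries listed in the table above (adjust to requirements.txt).
PACKAGES = ["pycaret", "pandas", "numpy", "seaborn", "matplotlib"]

def check_environment(packages=PACKAGES, min_python=(3, 10)):
    """Return (python_ok, missing_packages) for the current interpreter."""
    python_ok = sys.version_info >= min_python
    # find_spec returns None when a top-level package cannot be imported.
    missing = [name for name in packages if importlib.util.find_spec(name) is None]
    return python_ok, missing

ok, missing = check_environment()
print(f"Python {sys.version.split()[0]} meets the minimum version: {ok}")
print("Missing packages:", ", ".join(missing) or "none")
```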
📊 Results & Visuals
The notebook generates several insightful plots automatically:
- Correlation Heatmap (feature relationships)
- Model Leaderboard (accuracy comparison)
- Confusion Matrix (prediction quality)
- ROC Curve (model discrimination power)
- Feature Importance Plot (key health predictors)

All visuals are saved in the `visuals/` folder.
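
Accuracy, F1, the confusion matrix and the ROC curve are all derived from the same model predictions. PyCaret's `compare_models()` leaderboard reports these metrics for each candidate model. To make the plots easier to read, the sketch below computes the metrics from scratch.

- The labels and predicted probabilities are hypothetical.
- The 0.5 decision threshold is an assumption.
- ROC AUC uses the rank-based (Mann-Whitney) formulation, which equals the area under the ROC curve.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for a binary task where 1 = disease present, 0 = healthy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted disease probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

print(confusion_matrix(y_true, y_pred))        # (3, 1, 1, 3)
print(classification_metrics(y_true, y_pred))  # every metric is 0.75 here
print(roc_auc(y_true, scores))                 # 0.9375
```

Accuracy depends on the chosen threshold, whereas ROC AUC looks only at how the scores rank patients. This is why a model can have a high AUC yet modest accuracy at the default threshold.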
🧬 Insights & Interpretability
AutoML ranked multiple models, and the top-performing one achieved X% accuracy (update with your result).
Feature importance analysis revealed that variables like blood pressure, cholesterol, and BMI were strong predictors of disease risk.
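SHAP values were used to explain individual predictions. They show how much each feature pushed a prediction toward or away from disease risk, which makes the model's outputs easier to audit and supports trustworthy AI for healthcare.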
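
SHAP explains a prediction with Shapley values: each feature's weighted average marginal contribution across all subsets of the other features. PyCaret's `interpret_model()` relies on the `shap` library, which uses model-specific algorithms to compute these values efficiently. For a model with only a few features, the values can also be computed exactly by brute force.

- The sketch below does this for a hypothetical linear risk score.
- The weights, patient values and baseline are made up.
- "Absent" features are set to baseline values, which is one common convention and an assumption here.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1, drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis

# Hypothetical risk score over (blood_pressure, cholesterol, bmi); weights are made up.
WEIGHTS = [0.02, 0.01, 0.05]

def risk_score(features):
    return sum(w * f for w, f in zip(WEIGHTS, features))

patient = [150.0, 260.0, 31.0]
average = [125.0, 200.0, 26.0]  # hypothetical dataset means used as the baseline
phis = shapley_values(risk_score, patient, average)
print([round(p, 3) for p in phis])
```

For a linear model, each value reduces to weight × (value − baseline). In every case, the values sum to the difference between the patient's score and the baseline score, which makes a quick sanity check. Brute force grows exponentially with the number of features, which is why libraries such as `shap` use faster algorithms.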
💡 How to Use
- Replace the dataset path in the notebook with the path to your CSV file.
- Run all cells in order.
- Review the output, which includes:
  - Best model summary
  - Evaluation metrics
  - Visuals, saved automatically
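
The sketch below shows one way to handle the dataset path and the output folder so that a wrong path fails early.

- It assumes the notebook reads a single CSV and writes plots to `visuals/`.
- `DATA_PATH`, `VISUALS_DIR` and `prepare_paths` are hypothetical names, not taken from the notebook.

```python
from pathlib import Path

# Hypothetical location of your CSV; change this to point at your own file.
DATA_PATH = Path("data") / "your_health_data.csv"
VISUALS_DIR = Path("visuals")

def prepare_paths(data_path=DATA_PATH, visuals_dir=VISUALS_DIR):
    """Fail early if the dataset is missing, and make sure the output folder exists."""
    data_path = Path(data_path)
    if not data_path.is_file():
        raise FileNotFoundError(f"Dataset not found: {data_path.resolve()}")
    if data_path.suffix.lower() != ".csv":
        raise ValueError(f"Expected a .csv file, got {data_path.name}")
    visuals_dir = Path(visuals_dir)
    visuals_dir.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return data_path, visuals_dir
```

Call `data_path, visuals_dir = prepare_paths()` in the first cell, before any training cell. A mistyped path then surfaces immediately, instead of after several cells have already run.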
📘 Folder Structure
```text
AutoML/
│
├── AutoML_Health_Prediction.ipynb   # Main Jupyter Notebook
├── README.md                        # Project Documentation
├── LICENSE                          # Open-source License (MIT)
├── requirements.txt                 # Python dependencies
├── visuals/                         # Saved plots and charts
└── data/                            # Input datasets (optional)
```
🧑‍🔬 Author’s Note
This project is part of my ongoing journey to merge Artificial Intelligence and Health Sciences.
The goal is to create intelligent, data-driven solutions that can empower preventive care, fitness tracking, and early disease detection — forming the foundation for my future app idea, Healix.
If you find this useful, ⭐️ star the repo and follow for future updates.
🧠 Future Improvements
- Integrate with Streamlit for real-time web app visualization
- Add deep learning models (TensorFlow, PyTorch)
- Expand dataset diversity (nutrition, activity tracking)
- Deploy trained models as APIs
🤝 Contributions
Contributions are welcome!
If you’d like to improve visuals, add datasets, or optimize models:

- Fork this repository
- Create a new branch
- Commit your changes
- Open a Pull Request
📜 License
Distributed under the MIT License.
See the `LICENSE` file for more details.
🌟 Acknowledgements
Special thanks to:
- Kaggle Datasets Community for providing open data
- PyCaret Developers for simplifying AutoML
- GitHub for empowering open-source research