AutoML-driven health prediction system showcasing automated model selection, visual analytics, and explainable AI.

🧠 AutoML Health Prediction

Author: Iffat Nazir
Repository: AutoML by Iffat336
License: MIT License
Last Updated: November 2025


🌿 Overview

AutoML Health Prediction is a research-driven project that demonstrates how Automated Machine Learning (AutoML) can be leveraged to predict health risks efficiently and transparently.
By integrating PyCaret, an open-source low-code machine learning library, this notebook automates the entire machine learning pipeline — from data preprocessing to model evaluation — without compromising explainability or accuracy.

This project was created as part of Iffat Nazir’s data science portfolio, focusing on health analytics and intelligent automation.
It is ideal for students, data science enthusiasts, and researchers interested in applying AI to healthcare datasets.


🎯 Objectives

  • Build an AutoML pipeline to predict disease likelihood using health indicators.
  • Automatically compare multiple ML algorithms and select the most accurate one.
  • Generate interactive visualizations and explainable AI insights.
  • Showcase professional workflow for GitHub & Kaggle portfolios.

🧩 Key Features

✅ Fully automated model training using PyCaret
✅ Preprocessing: handling missing values, encoding, normalization
✅ Comparative model leaderboard for accuracy, F1-score, etc.
✅ Visualization suite: correlation heatmaps, confusion matrix, ROC curve
✅ Feature importance and SHAP-based interpretability
✅ Modular notebook structure — easy to adapt for new datasets
✅ Human-readable notebook, documented like a professional project
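
The features above map onto a handful of PyCaret calls. The following is a minimal sketch of that flow, not the notebook's actual code: the helper names `build_setup_kwargs` and `run_automl`, the `cardio` target column, and the seed value are assumptions made for illustration. It only uses functions from `pycaret.classification` (`setup`, `compare_models`, `pull`, `plot_model`).

```python
from typing import Any, Dict


def build_setup_kwargs(target: str, seed: int = 42) -> Dict[str, Any]:
    """Arguments for pycaret.classification.setup(); the values are illustrative."""
    return {
        "target": target,     # name of the label column, e.g. "cardio"
        "normalize": True,    # scale numeric features (z-score by default)
        "session_id": seed,   # fixed seed so runs are repeatable
    }


def run_automl(df, target: str = "cardio"):
    """Train and rank classifiers on a pandas DataFrame, then save two plots."""
    # Imported here so the helper above can be used without PyCaret installed.
    from pycaret.classification import compare_models, plot_model, pull, setup

    setup(data=df, **build_setup_kwargs(target))
    best = compare_models(sort="Accuracy")  # best model by cross-validated accuracy
    leaderboard = pull()                    # the comparison table as a DataFrame

    # Not every plot exists for every estimator (an ROC curve needs probability
    # scores, for example), so a failed plot is reported instead of aborting.
    for plot in ("confusion_matrix", "auc"):
        try:
            plot_model(best, plot=plot, save=True)  # PNG goes to the working directory
        except Exception as err:
            print(f"Skipped {plot}: {err}")
    return best, leaderboard
```

For SHAP-based plots, PyCaret also provides `interpret_model`, which works only with tree-based models and needs the `shap` package installed.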


🧠 Tech Stack

| Component         | Tool/Library        |
|-------------------|---------------------|
| Language          | Python 3.10+        |
| Framework         | PyCaret             |
| Data Manipulation | pandas, numpy       |
| Visualization     | seaborn, matplotlib |
| Environment       | Jupyter Notebook    |
| Deployment        | GitHub, Kaggle      |

🩺 Data Description

You can use any open-source health dataset.

The dataset typically includes features like:

  • age, sex, blood_pressure, cholesterol, glucose, smoking, exercise, etc.

and a target variable like:

  • disease or cardio (1 = disease present, 0 = healthy)
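
Before pointing the notebook at a new CSV, it can help to confirm that the file has the expected shape. The sketch below uses only the standard library and assumes the example column names listed above; the function name `check_health_csv` is hypothetical, and real datasets will often name their columns differently.

```python
import csv
import io
from typing import Iterable, List

# Column names taken from the examples above; adjust them to your own dataset.
EXPECTED_FEATURES = [
    "age", "sex", "blood_pressure", "cholesterol", "glucose", "smoking", "exercise",
]


def check_health_csv(lines: Iterable[str], target: str = "cardio") -> List[str]:
    """Return a list of problems found in the CSV (an empty list means none)."""
    reader = csv.DictReader(lines)
    columns = reader.fieldnames or []
    problems = []

    missing = [c for c in EXPECTED_FEATURES + [target] if c not in columns]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))

    if target in columns:
        # The target must be binary: 1 = disease present, 0 = healthy.
        seen = {(row[target] or "").strip() for row in reader}
        bad = sorted(seen - {"0", "1"})
        if bad:
            problems.append(f"non-binary target values: {bad}")
    return problems


# Usage with an in-memory file; pass open("your_file.csv", newline="") for a real one.
sample = io.StringIO("age,sex,cardio\n50,1,2\n")
for problem in check_health_csv(sample):
    print(problem)
```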

⚙️ Installation & Setup

Clone this repository:

git clone https://github.com/iffat336/AutoML.git
cd AutoML

Install dependencies:

pip install -r requirements.txt


Run the notebook:

jupyter notebook AutoML_Health_Prediction.ipynb

📊 Results & Visuals

The notebook generates several insightful plots automatically:

  • Correlation Heatmap (Feature relationships)
  • Model Leaderboard (Accuracy comparison)
  • Confusion Matrix (Prediction quality)
  • ROC Curve (Model discrimination power)
  • Feature Importance Plot (Key health predictors)

All visuals are saved in the visuals/ folder.
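
To make the confusion matrix and ROC curve easier to read, here is a small standard-library sketch of the numbers behind them. It is an illustration only: the labels and scores are made-up toy values, and the notebook itself relies on PyCaret's plots rather than on these helper functions.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives for binary labels (1 = disease)."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1:
            counts["tp" if truth == 1 else "fp"] += 1
        else:
            counts["fn" if truth == 1 else "tn"] += 1
    return counts


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the chance that a random positive case
    receives a higher score than a random negative one (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, not results from this project.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.7, 0.3, 0.2, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # classify at a 0.5 threshold

print(confusion_counts(y_true, y_pred))  # {'tp': 2, 'fp': 1, 'tn': 2, 'fn': 1}
print(round(roc_auc(y_true, scores), 3))  # 0.778
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why the ROC curve is described as a measure of discrimination power.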

🧬 Insights & Interpretability

AutoML ranked multiple models, and the top-performing one achieved X% accuracy (update with your result).
Feature importance analysis revealed that variables like blood pressure, cholesterol, and BMI were strong predictors of disease risk.
SHAP values attribute each prediction to individual features, which makes the model outputs easier to inspect and to trust in a healthcare setting.
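
One way to build intuition for feature importance is permutation importance: shuffle one column and measure how much accuracy drops. The sketch below demonstrates that idea with the standard library only. It is a swapped-in technique for illustration, not SHAP and not the method PyCaret uses; the toy model, its threshold of 140, and the data are invented and carry no clinical meaning.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def accuracy(model: Callable[[Row], int], rows: Sequence[Row], labels: Sequence[int]) -> float:
    """Fraction of rows for which the model's prediction matches the label."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(labels)


def permutation_importance(model: Callable[[Row], int], rows: Sequence[Row],
                           labels: Sequence[int], n_repeats: int = 10,
                           seed: int = 0) -> List[float]:
    """Mean accuracy drop per column when that column's values are shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild every row with the shuffled value in position `col`.
            permuted = [list(row[:col]) + [value] + list(row[col + 1:])
                        for row, value in zip(rows, shuffled)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy model that looks only at column 0 (a made-up blood pressure reading).
def toy_model(row: Row) -> int:
    return 1 if row[0] >= 140 else 0


rows = [[110, 5], [120, 3], [115, 8], [125, 1], [150, 4], [160, 7], [155, 2], [165, 6]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(permutation_importance(toy_model, rows, labels))
```

Column 1 is ignored by the toy model, so shuffling it never changes a prediction and its importance is exactly zero, while shuffling column 0 lowers accuracy.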

💡 How to Use

  1. Replace the dataset path in the notebook with your CSV file.
  2. Run all cells sequentially.
  3. Review the output. You’ll get:
       • Best model summary
       • Evaluation metrics
       • Visuals saved automatically

📘 Folder Structure

AutoML/
│
├── AutoML_Health_Prediction.ipynb     # Main Jupyter Notebook
├── README.md                          # Project Documentation
├── LICENSE                            # Open-source License (MIT)
├── requirements.txt                   # Python dependencies
├── visuals/                           # Saved plots and charts
└── data/                              # Input datasets (optional)

🧑‍🔬 Author’s Note

This project is part of my ongoing journey to merge Artificial Intelligence and Health Sciences.
The goal is to create intelligent, data-driven solutions that can empower preventive care, fitness tracking, and early disease detection — forming the foundation for my future app idea, Healix.

If you find this useful, ⭐️ star the repo and follow for future updates.

🧠 Future Improvements

  • Integrate with Streamlit for real-time web app visualization
  • Add deep learning models (TensorFlow, PyTorch)
  • Expand dataset diversity (nutrition, activity tracking)
  • Deploy trained models as APIs

🤝 Contributions

Contributions are welcome!
If you’d like to improve visuals, add datasets, or optimize models:

  1. Fork this repository
  2. Create a new branch
  3. Commit your changes
  4. Open a Pull Request

📜 License

Distributed under the MIT License.
See the LICENSE file for more details.

🌟 Acknowledgements

Special thanks to:

  • Kaggle Datasets Community for providing open data
  • PyCaret Developers for simplifying AutoML
  • GitHub for empowering open-source research
