A machine learning–based classification project that predicts whether a person is diabetic based on medical attributes.
The project compares multiple classification models and selects the best-performing one using cross-validation and hyperparameter tuning.
Diabetes is a chronic disease, and early diagnosis makes effective treatment more likely.
This project applies supervised classification techniques to patient medical data to predict whether a patient has diabetes.
The dataset contains medical attributes such as:
- Glucose level
- Blood pressure
- BMI
- Insulin
- Age
The target variable is binary and indicates whether the patient is diabetic.
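Here is a minimal sketch of loading the data and separating features from the target. It assumes a CSV laid out like the Pima Indians Diabetes dataset, with a binary `Outcome` column (1 = diabetic, 0 = not diabetic). The file name `diabetes.csv`, the column name, the helper names `load_dataset` and `split_dataset`, and the 80/20 stratified split are all hypothetical choices, not settings documented by the project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def load_dataset(path="diabetes.csv"):
    """Read the dataset from CSV (the default path is a hypothetical placeholder)."""
    return pd.read_csv(path)


def split_dataset(df, target_col="Outcome", test_size=0.2, random_state=42):
    """Separate features from the target and make a stratified train/test split.

    `target_col="Outcome"` is an assumed column name; adjust it to the real data.
    """
    X = df.drop(columns=[target_col])  # medical attributes (Glucose, BMI, Age, ...)
    y = df[target_col]                 # binary label: 1 = diabetic, 0 = not diabetic
    # stratify=y keeps the diabetic / non-diabetic ratio the same in both splits
    return train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )
```

Typical usage would be `X_train, X_test, y_train, y_test = split_dataset(load_dataset())`. Stratifying matters because diabetes datasets are often imbalanced, and an unstratified split can leave the test set with a different class ratio than the training set.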
The following classification models are trained and evaluated:
- Logistic Regression
- Decision Tree Classifier
- Random Forest Classifier
- Support Vector Machine (SVM)
Model selection is performed with GridSearchCV, which tunes each model's hyperparameters using cross-validation.
The workflow consists of the following steps:
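The tuning step might look roughly like the sketch below. The parameter grids in `CANDIDATES`, the `accuracy` scoring metric, and the helper `tune_models` are illustrative assumptions; the project's actual search spaces are not documented here.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical search spaces: each entry is (estimator, parameter grid).
# make_pipeline names steps after the lowercased class, hence "svc__C".
CANDIDATES = {
    "Logistic Regression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1.0, 10.0]},
    ),
    "Decision Tree": (
        DecisionTreeClassifier(random_state=42),
        {"max_depth": [3, 5, None], "min_samples_leaf": [1, 5]},
    ),
    "Random Forest": (
        RandomForestClassifier(random_state=42),
        {"n_estimators": [100, 200], "max_depth": [None, 5]},
    ),
    "SVM": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1.0, 10.0], "svc__kernel": ["linear", "rbf"]},
    ),
}


def tune_models(X_train, y_train, cv=5, scoring="accuracy"):
    """Run GridSearchCV for every candidate and return the fitted searches by name."""
    searches = {}
    for name, (estimator, grid) in CANDIDATES.items():
        search = GridSearchCV(estimator, grid, cv=cv, scoring=scoring)
        search.fit(X_train, y_train)  # cross-validates every grid combination
        searches[name] = search
    return searches
```

The scaler sits inside a pipeline for Logistic Regression and SVM, so each cross-validation fold fits it only on that fold's training portion and no information leaks from the validation fold. Tree-based models are insensitive to feature scale, so they are used without it.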
- Data loading and preprocessing
- Feature selection and splitting into training and testing sets
- Model training using multiple classifiers
- Hyperparameter tuning using GridSearchCV
- Performance comparison based on the mean cross-validation score
- Selection of the best-performing model
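The last two workflow steps can be sketched as follows. The sketch assumes `searches` is a dictionary of already-fitted `GridSearchCV` objects keyed by model name, which is a hypothetical structure and not necessarily how the project stores them. The helper names `select_best` and `evaluate_on_test` are also illustrative.

```python
from sklearn.metrics import accuracy_score, classification_report


def select_best(searches):
    """Return (name, estimator) for the search with the highest mean CV score.

    `searches` maps a model name to an already-fitted GridSearchCV object.
    """
    # best_score_ is the mean cross-validated score of the best parameter set
    ranking = sorted(searches.items(), key=lambda item: item[1].best_score_, reverse=True)
    for name, search in ranking:
        print(f"{name}: CV score = {search.best_score_:.3f}, params = {search.best_params_}")
    best_name, best_search = ranking[0]
    # best_estimator_ has already been refit on the full training set
    return best_name, best_search.best_estimator_


def evaluate_on_test(model, X_test, y_test):
    """Score the selected model once on the held-out test set and return accuracy."""
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred))
    return accuracy_score(y_test, y_pred)
```

The test set is used only once, after selection. This keeps the reported test score from being biased by the choice among models.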
Technologies used:
- Python
- NumPy
- Pandas
- Scikit-learn
- Matplotlib
- Seaborn