SayanAndrews2002/Machine-Learning-Heart-Patient-Data-Project

Machine Learning Heart Patient Classification

Summary

In the Real-world Heart Patient Data Project, I applied data analytics and machine learning to a comprehensive dataset of heart patients. The dataset contains many predictor variables, and the goal was to predict whether an individual has heart disease.

Project Details

Data Exploration

  • Initial Exploration: Conducted an in-depth exploration of the dataset, examining variable distributions, identifying outliers, and assessing data quality.
  • Preprocessing: Loaded necessary libraries, checked dataset dimensions, and handled any missing data.
  • Exploratory Data Analysis (EDA): Focused on the predictor variables most relevant to the target variable, exploring their relationships with each other and with the target variable. This step provided initial insights and allowed for inferences, particularly regarding variable correlations.
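
As a rough illustration of the correlation step in the EDA, the sketch below computes the Pearson correlation between a continuous predictor and a binary target. The project itself was done in R; this Python version, the `pearson` helper, and the toy `age` and `has_disease` values are assumptions made for illustration, not the project's code or data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical toy values, not taken from the project's dataset.
age = [39, 45, 51, 58, 62, 67]
has_disease = [0, 0, 0, 1, 1, 1]
r = pearson(age, has_disease)
```

With a 0/1 target, this value is the point-biserial correlation, which is one simple way to rank continuous predictors before modeling.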

Data Splitting

  • Training and Testing Split: Split the data into training and testing sets to facilitate model validation.
  • Recipes: Developed and applied data preprocessing recipes.
  • Stratified Sampling: Used stratified sampling so that the classes are represented in similar proportions in the training and testing datasets.
  • Correlation Analysis: Explored correlations between continuous variables and the target variable to inform feature selection.
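
The stratified split described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the project used R, and the `stratified_split` function, the 25% test fraction, and the toy labels are hypothetical choices for this example only.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) with class proportions roughly preserved."""
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    # Sample the test share separately within each class.
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: 12 patients without and 8 with heart disease.
labels = [0] * 12 + [1] * 8
train_idx, test_idx = stratified_split(labels, test_fraction=0.25, seed=42)
```

Sampling within each class keeps the share of positive cases in the test set close to the share in the full data, which matters when one class is much rarer than the other.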

Model Building

  • Machine Learning Models: Developed multiple predictive models including:
    • Logistic Regression
    • Decision Trees
    • Random Forests
    • K-Nearest Neighbors (KNN)
    • Linear Discriminant Analysis (LDA)
    • Quadratic Discriminant Analysis (QDA)
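
To make one of the listed models concrete, here is a minimal sketch of K-Nearest Neighbors classification by majority vote. It is written in Python for illustration only; the project was done in R, and the `knn_predict` function and the two-feature toy points are assumptions, not the project's implementation.

```python
from collections import Counter
from math import dist


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to the query.
    neighbors = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    votes = Counter(label for _, label in neighbors[:k])
    # An odd k avoids tied votes when there are only two classes.
    return votes.most_common(1)[0][0]


# Hypothetical, already scaled two-feature points (not project data).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = [0, 0, 0, 1, 1, 1]
pred_low = knn_predict(train_X, train_y, (1.1, 1.0), k=3)
pred_high = knn_predict(train_X, train_y, (3.0, 3.0), k=3)
```

Because KNN relies on distances, predictors are usually put on a comparable scale before fitting.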

Model Fitting and Tuning

  • Model Tuning: Tuned hyperparameters to improve model accuracy.
  • Cross-Validation: Applied cross-validation to models built with the random forest, ranger, and XGBoost engines, fitting each model on the folds to obtain more robust performance estimates.
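
The idea of tuning a hyperparameter with cross-validation can be sketched as below. This is an illustrative Python example, not the project's R code: it swaps in a small one-dimensional KNN classifier in place of the random forest and XGBoost models, and the helper names, fold count, candidate values, and toy data are all assumptions.

```python
import random


def k_fold_indices(n, n_folds, seed=0):
    """Return a list of (train_idx, val_idx) pairs partitioning range(n)."""
    if not 2 <= n_folds <= n:
        raise ValueError("n_folds must be between 2 and n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(n_folds)
    ]


def knn_predict(train_x, train_y, x, k):
    """Binary (0/1) majority vote among the k nearest 1-D training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = sum(train_y[i] for i in nearest)
    return int(2 * votes > k)


def cv_accuracy(xs, ys, k, splits):
    """Mean validation accuracy of KNN with k neighbors across the folds."""
    fold_scores = []
    for train_idx, val_idx in splits:
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in val_idx)
        fold_scores.append(hits / len(val_idx))
    return sum(fold_scores) / len(fold_scores)


# Hypothetical toy data: one predictor and a binary label.
xs = [float(v) for v in range(20)]
ys = [0] * 10 + [1] * 10
splits = k_fold_indices(len(xs), n_folds=5, seed=1)
cv_scores = {k: cv_accuracy(xs, ys, k, splits) for k in (1, 3, 5)}
best_k = max(cv_scores, key=cv_scores.get)
```

Reusing the same `splits` for every candidate keeps the comparison fair, since each hyperparameter value is scored on identical folds.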

Model Selection and Performance Evaluation

  • Performance Metrics: Evaluated model performance based on metrics such as the area under the ROC curve (AUC) for both training and testing datasets.
  • Heatmaps: Used heatmaps to visualize model performance and understand predictor relationships.
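
For readers unfamiliar with the AUC metric mentioned above, the sketch below computes it directly from its definition: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as half. It is an illustrative Python version (the project used R), and the `roc_auc` function and the toy scores are assumptions rather than the project's results.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    # Compare every positive score with every negative score.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities, not model output from the project.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
```

Computing the same metric on the training and testing sets, as the project did, helps reveal overfitting: a large gap between the two values suggests the model does not generalize well.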

Insights and Recommendations

  • Key Predictors: Identified significant predictors of heart disease from the model outcomes.
  • Actionable Recommendations: Provided recommendations based on the insights generated from the analysis.
  • Conclusion: Summarized the findings and insights, offering potential real-world applications of the model for heart disease prediction.

Conclusion

This project showcases my proficiency in machine learning, model evaluation, and predictive analytics in the healthcare domain. It demonstrates my ability to work with complex datasets and derive actionable insights that could support the prediction of heart disease.

About

Used R to visualize and analyze a dataset of heart patients with many predictor variables. Evaluated several machine learning models, including logistic regression, linear discriminant analysis, quadratic discriminant analysis, and K-nearest neighbors, to find the best fit for the data. Used random forests for tuning and cross-validation.
