In the Real-world Heart Patient Data Project, I applied data analytics and machine learning to a comprehensive dataset of heart patients. The dataset contained many predictor variables, and the goal was to predict whether each individual had heart disease.
- Initial Exploration: Conducted an in-depth exploration of the dataset, examining variable distributions, identifying outliers, and assessing data quality.
- Preprocessing: Loaded necessary libraries, checked dataset dimensions, and handled any missing data.
- Exploratory Data Analysis (EDA): Focused on the predictor variables most relevant to the target variable, examining how they relate to one another and to the target. This step gave early insight into which variables were correlated.
- Training and Testing Split: Split the data into training and testing sets to facilitate model validation.
- Recipes: Developed and applied data preprocessing recipes.
- Stratified Sampling: Stratified the split on the target variable so that the training and testing sets keep the same proportion of patients with and without heart disease.
- Correlation Analysis: Explored correlations between continuous variables and the target variable to inform feature selection.
- Machine Learning Models: Developed multiple predictive models including:
  - Logistic Regression
  - Decision Trees
  - Random Forests
  - K-Nearest Neighbors (KNN)
  - Linear Discriminant Analysis (LDA)
  - Quadratic Discriminant Analysis (QDA)
- Model Tuning: Tuned hyperparameters to improve predictive performance.
- Cross-Validation: Used k-fold cross-validation, fitting models to each fold for more robust evaluation, including random forests with the ranger engine and boosted trees with the XGBoost engine.
- Performance Metrics: Evaluated model performance based on metrics such as the area under the ROC curve (AUC) for both training and testing datasets.
- Heatmaps: Used heatmaps to visualize model performance and understand predictor relationships.
- Key Predictors: Identified significant predictors of heart disease from the model outcomes.
- Actionable Recommendations: Provided recommendations based on the insights generated from the analysis.
- Conclusion: Summarized the findings and insights, offering potential real-world applications of the model for heart disease prediction.
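The split, cross-validation, and AUC steps above can be illustrated with a small sketch. It is written in plain Python with only the standard library and is not the project's own code; the recipes, ranger, and XGBoost tooling is not reproduced. The function names `stratified_split`, `stratified_kfold`, and `roc_auc`, and the 0/1 label encoding (1 = heart disease), are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def _group_by_class(labels):
    """Map each class label to the list of row indices carrying it."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    return by_class


def stratified_split(labels, test_frac=0.25, seed=42):
    """Hypothetical helper: split row indices so each class keeps
    (approximately) the same share in the training and testing sets."""
    rng = random.Random(seed)
    train, test = [], []
    for idx in _group_by_class(labels).values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)  # per-class test count
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def stratified_kfold(labels, k=5, seed=42):
    """Hypothetical helper: deal each class's shuffled indices round-robin
    into k folds, so every fold mirrors the overall class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for idx in _group_by_class(labels).values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return [sorted(f) for f in folds]


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1] * 40 + [0] * 60  # made-up labels, not project data
    train, test = stratified_split(labels, test_frac=0.25)
    print(len(train), len(test))  # 75 25
    print(sum(labels[i] for i in test))  # 10 positives in the test set
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise AUC loop is quadratic in the number of examples and is meant only to make the definition concrete; in practice a library metric (for example yardstick's `roc_auc()` in R) would be used instead.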
This project showcased my proficiency in machine learning, model evaluation, and predictive analytics in the healthcare domain. It also demonstrated my ability to work with complex datasets and turn them into actionable insights for predicting heart disease.