This repository contains my data science portfolio projects, implemented primarily in Python.
End-to-end ML projects in Python that follow standard ML workflows, with an emphasis on comparing multiple algorithms, evaluating tradeoffs, and interpreting models.
- Predicting Credit Card Approvals: Predicted credit card application approvals from anonymized data using logistic regression, KNN, and random forest models. Compared performance across the three models, applying feature scaling, categorical encoding, hyperparameter tuning, and threshold optimization to improve classification performance.
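Threshold optimization, mentioned above, can be illustrated with a minimal sketch. It is not the project's actual code: the function names, the F1 objective, and the candidate grid are assumptions made for illustration, and the labels and probabilities in the usage note are made-up toy values.

```python
def f1_at_threshold(y_true, y_prob, threshold):
    """Compute F1 for the positive class when predicting 1 if prob >= threshold."""
    tp = fp = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 1:
            fn += 1
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, y_prob, candidates=None):
    """Return the candidate threshold with the highest F1 (first one wins on ties)."""
    if candidates is None:
        # Hypothetical grid: 0.05, 0.10, ..., 0.95
        candidates = [i / 100 for i in range(5, 100, 5)]
    return max(candidates, key=lambda t: f1_at_threshold(y_true, y_prob, t))


# Toy data for illustration only (not from the project).
toy_labels = [0, 0, 1, 1, 1, 0]
toy_probs = [0.1, 0.4, 0.35, 0.8, 0.45, 0.2]
chosen = best_threshold(toy_labels, toy_probs)
```

On this toy data the default 0.5 cutoff misses two of the three positives, so a lower threshold scores a higher F1. In practice the threshold would be chosen on a validation set rather than on the test data, and the objective (F1 here) would depend on the relative cost of false approvals and false rejections.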
Implementations of foundational ML algorithms in Python, covering standard pre-processing, feature engineering, hyperparameter tuning, and model evaluation.
- Predicting Loan Defaults with Random Forest: Predicted loan default likelihood from financial data using random forest models. Engineered domain-specific features to better capture credit risk and addressed class imbalance (16% default rate) with hyperparameter tuning, threshold optimization, and SMOTE resampling. Results highlight that the main challenge lies not in class imbalance but in the lack of class separability, reflecting realistic difficulties in credit risk modeling.
- Classifying Anonymized Data with KNN: Classified anonymized data using a KNN model, highlighting the impact of feature scaling and k-value tuning on model performance.
- Predicting Ad Clicks with Logistic Regression: Modelled ad-click likelihood based on demographic and behaviour information using a logistic regression model. Implemented feature engineering (including cyclical temporal feature mapping), multicollinearity reduction, threshold optimization, and model performance testing.
- Predicting Synthetic Credit Scores with Linear Regression: Modelled synthetic credit scores based on financial and demographic features using a linear regression model. Implemented feature engineering, correlation analysis, multicollinearity reduction, and statistical significance testing to evaluate feature importance and improve model interpretability.
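The cyclical temporal feature mapping mentioned for the ad-click project can be sketched as a sine/cosine encoding. This is a minimal illustration, not the project's code: the function names are hypothetical, and treating the feature as hour of day with a 24-hour period is an assumption.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value (e.g. hour of day) to a point on the unit circle."""
    angle = 2 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def encoded_distance(a, b, period):
    """Euclidean distance between two values after cyclical encoding."""
    return math.dist(encode_cyclical(a, period), encode_cyclical(b, period))


# Assumed example: hour of day with a 24-hour period.
late_to_midnight = encoded_distance(23, 0, 24)   # adjacent hours across midnight
midnight_to_one = encoded_distance(0, 1, 24)     # adjacent hours after midnight
midnight_to_noon = encoded_distance(0, 12, 24)   # opposite sides of the clock
```

With a raw integer encoding, 23:00 and 00:00 sit 23 units apart even though they are adjacent. After the sine/cosine mapping they are as close as 00:00 and 01:00, which matters for linear models such as logistic regression that cannot learn the wrap-around on their own.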
Data analytics projects using SQL, Tableau, and Excel, focusing on data storytelling through dashboards and reports.
- Insurance Analysis: Developed an interactive Tableau dashboard to report and analyze 70K insurance claims to support marketing and budget decisions.
- Marketing Analysis: Analyzed 100K e-commerce sales records using SQL (Google BigQuery) and Excel to uncover trends in customer behaviour, reporting sales and marketing metrics using an interactive Tableau dashboard.
- TTC Delay Analysis: Cleaned and analyzed 40K subway delay records for 2022-2023 using SQL and Tableau, assessing YoY KPIs and delay causes and providing performance improvement recommendations.