Predicting customer attrition in the telecommunications industry using ensemble learning. This project focuses on identifying the key drivers of churn to improve business retention strategies.
🔗 Original Notebook on Kaggle: View here
- Exploratory Data Analysis (EDA): Statistical analysis using Chi-square tests and correlation matrices to select features.
- Data Cleaning: Smart removal of redundant features (like `gender` and `TotalCharges`) to simplify the model.
- Feature Engineering:
  - Manual Ordinal Encoding for contract types (`Month-to-month`, `One year`, `Two year`).
  - One-Hot Encoding for categorical services.
  - Standard Scaling for numerical variables (`tenure`, `MonthlyCharges`).
- Modeling: Implementation of a Random Forest Classifier with `class_weight='balanced'` to handle dataset imbalance without oversampling.
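
The preprocessing and modeling steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the tiny DataFrame is synthetic, the `InternetService` column stands in for the "categorical services" (an assumption), and the integer codes chosen for the contract types are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic toy data for illustration only; not the real dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month",
                 "Month-to-month", "Two year", "One year", "Month-to-month"],
    "InternetService": ["DSL", "Fiber optic", "No", "Fiber optic",
                        "DSL", "No", "DSL", "Fiber optic"],  # assumed column
    "tenure": [1, 24, 60, 3, 5, 72, 18, 2],
    "MonthlyCharges": [29.9, 80.5, 20.1, 95.0, 45.3, 19.8, 55.0, 99.9],
    "Churn": [1, 0, 0, 1, 1, 0, 0, 1],
})

# Manual ordinal encoding: contract length has a natural order.
# The specific integer codes are an assumption.
contract_order = {"Month-to-month": 0, "One year": 1, "Two year": 2}
X = df.drop(columns="Churn").copy()
X["Contract"] = X["Contract"].map(contract_order)
y = df["Churn"]

# One-hot encode the service column, scale the numeric columns,
# and pass the already-encoded Contract column through unchanged.
preprocess = ColumnTransformer(
    [
        ("services", OneHotEncoder(handle_unknown="ignore"), ["InternetService"]),
        ("numeric", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ],
    remainder="passthrough",
)

# class_weight="balanced" reweights classes inversely to their frequency,
# so no oversampling step is needed.
model = Pipeline([
    ("preprocess", preprocess),
    ("forest", RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=42)),
])

model.fit(X, y)
predictions = model.predict(X)
```

Wrapping the encoders and the classifier in a single `Pipeline` is a design choice made for this sketch: it keeps the scaling statistics tied to the training data when the model is later applied to new rows.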
- Contract Type: Month-to-month contracts are the strongest indicator of potential churn.
- Tenure: Customer loyalty (length of stay) inversely correlates with the probability of leaving.
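
One way to check relationships like these is a Chi-square test of independence for the categorical feature and a simple correlation for the numeric one. The sketch below uses a small synthetic sample, so its numbers say nothing about the real dataset, and a sample this small is too sparse for a reliable Chi-square result. It assumes `scipy` is available, which is normally the case because scikit-learn depends on it.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Synthetic toy data for illustration only; not the real dataset.
df = pd.DataFrame({
    "Contract": ["Month-to-month", "One year", "Two year", "Month-to-month",
                 "Month-to-month", "Two year", "One year", "Month-to-month"],
    "tenure": [1, 24, 60, 3, 5, 72, 18, 2],
    "Churn": [1, 0, 0, 1, 1, 0, 0, 1],
})

# Contingency table of contract type against churn outcome.
table = pd.crosstab(df["Contract"], df["Churn"])

# Chi-square test of independence between Contract and Churn.
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.3f}, p={p_value:.3f}, dof={dof}")

# Pearson correlation between tenure and the binary churn flag;
# a negative value means longer tenure goes with less churn.
tenure_corr = df["tenure"].corr(df["Churn"])
print(f"tenure vs churn correlation: {tenure_corr:.3f}")
```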
- Clone the repo:
  `git clone https://github.com/lucalullo/customer-churn-prediction.git`
- Install dependencies:
  `pip install pandas numpy seaborn matplotlib scikit-learn`
- Run the `customer-churn.ipynb` notebook.
Author: Luca Lullo, Data Scientist | Machine Learning Applied