lucalullo/Customer-churn

Customer Churn

Python Scikit-Learn Random Forest

Predicting customer attrition in the telecommunications industry using ensemble learning. This project focuses on identifying the key drivers of churn to improve business retention strategies.

🔗 Original Notebook on Kaggle: View here

🛠️ Technical Workflow

  • Exploratory Data Analysis (EDA): Statistical analysis using Chi-square tests and correlation matrices to select features.
  • Data Cleaning: Removal of redundant features (such as gender and TotalCharges) to simplify the model.
  • Feature Engineering:
    • Manual Ordinal Encoding for contract types (Month-to-month, One year, Two year).
    • One-Hot Encoding for categorical services.
    • Standard Scaling for numerical variables (tenure, MonthlyCharges).
  • Modeling: Implementation of a Random Forest Classifier with class_weight='balanced' to handle dataset imbalance without oversampling.
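The feature engineering and modeling steps above could look roughly like the sketch below. It is not the notebook's code: the data is synthetic, the column names `Contract` and `InternetService` are assumptions (only `tenure` and `MonthlyCharges` are named in this README), and `n_estimators` and `random_state` are arbitrary illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the telecom data (illustrative values only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
    "InternetService": rng.choice(["DSL", "Fiber optic", "No"], size=n),
    "tenure": rng.integers(1, 73, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n).round(2),
})
# Imbalanced synthetic target: roughly one churner in four.
df["Churn"] = (rng.random(n) < 0.25).astype(int)

X = df.drop(columns="Churn")
y = df["Churn"]

# Manual ordinal encoding: contract length has a natural order.
contract_order = {"Month-to-month": 0, "One year": 1, "Two year": 2}
X["Contract"] = X["Contract"].map(contract_order)

preprocess = ColumnTransformer(
    [
        # One-hot encode the (assumed) categorical service column.
        ("services", OneHotEncoder(handle_unknown="ignore"), ["InternetService"]),
        # Standardize the numerical variables.
        ("numeric", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ],
    remainder="passthrough",  # keeps the already-encoded Contract column
)

model = Pipeline([
    ("preprocess", preprocess),
    # class_weight="balanced" reweights classes inversely to their frequency,
    # so no oversampling step is needed.
    ("forest", RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0)),
])

model.fit(X, y)
churn_proba = model.predict_proba(X)[:, 1]  # estimated churn probability per row
```

Wrapping the encoders and scaler in a `Pipeline` is a design choice of this sketch: it lets the preprocessing be fitted on training data only when the model is later cross-validated or applied to a held-out split.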

📈 Key Insights

  • Contract Type: Month-to-month contracts are the strongest indicator of potential churn.
  • Tenure: Customer tenure (length of stay) is inversely correlated with the probability of leaving.
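Associations like these can be checked with a Chi-square test and a correlation coefficient. The sketch below is a minimal illustration on synthetic data in which the pattern is built in by construction, so its numbers are not results from this project. The column names `Contract` and `Churn` are assumptions, and `scipy` is used because it is installed as a scikit-learn dependency.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency, pointbiserialr

# Synthetic data: the churn pattern below is imposed on purpose, not observed.
rng = np.random.default_rng(42)
n = 1000
contract = rng.choice(["Month-to-month", "One year", "Two year"], size=n)
tenure = rng.integers(1, 73, size=n)

# By construction, month-to-month and short-tenure customers churn more often.
base = np.where(contract == "Month-to-month", 0.45, 0.10)
p_churn = np.clip(base + 0.10 - 0.004 * tenure, 0.01, 0.99)
churn = (rng.random(n) < p_churn).astype(int)

df = pd.DataFrame({"Contract": contract, "tenure": tenure, "Churn": churn})

# Chi-square test of independence between contract type and churn.
table = pd.crosstab(df["Contract"], df["Churn"])
chi2, p_value, dof, expected = chi2_contingency(table)

# Observed churn rate per contract type.
churn_rate = df.groupby("Contract")["Churn"].mean()

# Point-biserial correlation between the binary target and tenure.
r, r_p_value = pointbiserialr(df["Churn"], df["tenure"])

print(f"chi2={chi2:.1f}, dof={dof}, p={p_value:.3g}")
print(churn_rate)
print(f"tenure vs churn: r={r:.3f}")
```

A small p-value from the Chi-square test indicates that churn is not independent of contract type, and a negative `r` means churn becomes less likely as tenure grows.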

🚀 How to use

  1. Clone the repo: git clone https://github.com/lucalullo/Customer-churn.git
  2. Install dependencies: pip install pandas numpy seaborn matplotlib scikit-learn
  3. Run the customer-churn.ipynb notebook.

Author: Luca Lullo
Data Scientist | Applied Machine Learning
