Customer Churn Prediction Project

Overview

This project leverages machine learning to predict customer churn in the banking sector. By identifying at-risk customers, we aim to help banking professionals take proactive measures, retain customers, and improve long-term customer relationships. The model not only predicts churn but also provides insights into customer risk factors and can be integrated into personalized retention strategies, such as tailored incentives.

Key Features

Churn Prediction: Uses advanced ML models to predict the likelihood of customer churn.
Risk Factor Analysis: Highlights reasons for each customer’s risk of attrition.
Personalized Retention Campaigns: Generates custom email incentives based on customer risk factors.
Fraud Detection: Employs ML models to flag potential fraudulent transactions.

Table of Contents

Data Preprocessing
Model Training and Evaluation
Acknowledgments

Project Structure

Data Preprocessing

This project uses a dataset from Kaggle (linked in Resources) that contains 14 features. Key preprocessing steps include:

Feature Selection: Dropping irrelevant columns such as CustomerId and Surname.
Encoding Categorical Variables: Converting the Geography and Gender columns to numerical values using one-hot encoding.
Data Scaling: Standardizing features to improve model performance.
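
The three steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the column names (including the target column Exited) are assumed from the Kaggle bank churn dataset, and the function name preprocess is hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny made-up sample. Column names are assumed from the Kaggle bank churn
# dataset; nothing here is read from churn.csv.
raw = pd.DataFrame({
    "CustomerId": [1, 2, 3, 4],
    "Surname": ["Hill", "Onio", "Chu", "Boni"],
    "CreditScore": [619, 608, 502, 699],
    "Geography": ["France", "Spain", "France", "Germany"],
    "Gender": ["Female", "Female", "Male", "Male"],
    "Age": [42, 41, 42, 39],
    "Balance": [0.0, 83807.86, 159660.80, 0.0],
    "Exited": [1, 0, 1, 0],  # assumed target column: 1 = churned
})


def preprocess(df):
    """Hypothetical helper: drop IDs, one-hot encode, standardize."""
    # 1. Feature selection: drop identifier columns that carry no signal.
    df = df.drop(columns=["CustomerId", "Surname"])
    # 2. One-hot encode the categorical columns.
    df = pd.get_dummies(df, columns=["Geography", "Gender"], dtype=int)
    # 3. Separate the target, then standardize every feature column
    #    to zero mean and unit variance.
    y = df.pop("Exited")
    scaled = StandardScaler().fit_transform(df)
    X = pd.DataFrame(scaled, columns=df.columns, index=df.index)
    return X, y


X, y = preprocess(raw)
print(list(X.columns))
```

In a real pipeline the scaler would normally be fitted on the training split only and then applied to the test split, so that test-set statistics do not leak into training.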

Model Training and Evaluation

The following models were trained and evaluated on the processed dataset:

XGBoost
Gradient Boosting
Random Forest
Decision Tree
Naive Bayes
Support Vector Classifier (SVC)

Each model was evaluated using accuracy, precision, recall, and F1-score. Recall was the primary focus, because a churner the model misses (a false negative) is a customer the bank never gets a chance to retain.
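
As a rough sketch of this kind of comparison, the snippet below trains several of the listed scikit-learn models on synthetic data and ranks them by recall. The data, hyperparameters, and the evaluate helper are assumptions for illustration only and do not reproduce the project's results. XGBoost is left out to keep the example dependent on scikit-learn alone; an xgboost.XGBClassifier could be added to the dictionary in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (roughly 20% positives).
# This is NOT the churn dataset.
X, y = make_classification(
    n_samples=600, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

models = {
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "SVC": SVC(random_state=42),
}


def evaluate(model):
    """Fit on the training split and score on the held-out test split."""
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }


results = {name: evaluate(model) for name, model in models.items()}

# Rank by recall, the metric the project prioritizes.
for name, m in sorted(results.items(), key=lambda kv: kv[1]["recall"], reverse=True):
    print(
        f"{name:18s} recall={m['recall']:.2f} precision={m['precision']:.2f} "
        f"f1={m['f1']:.2f} accuracy={m['accuracy']:.2f}"
    )
```

On imbalanced data, accuracy alone can look strong even when most churners are missed, which is why the ranking here is by recall.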

Here are the performances for each of our models:

For fraud detection, here is how the best version of the model performed:

Balancing Techniques

Given the imbalance in churn vs. non-churn customers, SMOTE (Synthetic Minority Oversampling Technique) was used to balance the dataset, enhancing recall performance.
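
To illustrate the idea behind SMOTE, here is a simplified NumPy sketch: it creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. This is not the code used in the project; the function name smote_sketch and its parameters are hypothetical, and it only handles a binary target. In practice the SMOTE class in imbalanced-learn (imblearn.over_sampling.SMOTE, with its fit_resample method) is the usual choice.

```python
import numpy as np


def smote_sketch(X, y, minority_label=1, k=3, seed=0):
    """Simplified SMOTE-style oversampling for a binary target (illustrative)."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    minority = X[y == minority_label]
    # Number of synthetic samples needed to match the majority class size.
    n_new = int((y != minority_label).sum() - len(minority))
    if n_new <= 0 or len(minority) < 2:
        return X, y
    k = min(k, len(minority) - 1)
    # Pairwise distances among minority points; inf on the diagonal
    # prevents a point from being its own neighbour.
    dists = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a random minority point and one of its k nearest neighbours.
    base = rng.integers(0, len(minority), size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    # Place the synthetic sample at a random position on the segment between them.
    gap = rng.random((n_new, 1))
    synthetic = minority[base] + gap * (minority[picked] - minority[base])
    X_bal = np.vstack([X, synthetic])
    y_bal = np.concatenate([y, np.full(n_new, minority_label, dtype=y.dtype)])
    return X_bal, y_bal


# Made-up data: 90 majority points and 10 minority points.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(3, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)

X_bal, y_bal = smote_sketch(X, y)
print(np.bincount(y), "->", np.bincount(y_bal))  # [90 10] -> [90 90]
```

Oversampling is normally applied to the training split only, so that the test set keeps the real class distribution and the reported recall is not inflated by synthetic samples.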

Acknowledgments

Thanks to the Headstarter team for their guidance and feedback. Special thanks to Faizan (Co-founder) for his support and mentorship.

Dataset: Kaggle - Churn for Bank Customers
Preprocessing insights: LakeFS Blog on Data Preprocessing
