mitraboga/CustomerChurnPredictor

👨🏼‍💼 Customer Churn Predictor 📊

Machine Learning Risk Intelligence + Executive Business Dashboard

[Image: Customer Churn Predictor demo]


📖 Project Overview

Customer churn is one of the most critical problems in subscription businesses.
Losing customers directly impacts revenue, growth, and acquisition costs.

CustomerChurnPredictor is an end-to-end churn risk system that combines:

  • ✅ Machine Learning prediction (probability scoring)
  • ✅ Risk segmentation (buckets + deciles)
  • ✅ ROI-based decisioning (threshold → business value)
  • ✅ FastAPI inference service (single + batch prediction)
  • ✅ Streamlit modern SaaS dashboard (interactive demo + analytics)
  • ✅ Monitoring with Evidently (data drift report)
  • ✅ Tableau dashboards for executive stakeholders

The solution is designed to answer:

“Who is most likely to churn next, and what should we do about it?”


🎯 What Makes This Project Portfolio-Grade

Most churn projects stop at accuracy. This project goes further:

  • Probability → Decision Policy: Intervene only above a chosen threshold
  • Threshold is ROI-driven: We simulate expected value across a range of thresholds instead of defaulting to an arbitrary 0.50 cutoff
  • Explainability built-in: SHAP + permutation importance
  • Deployed system: API + UI + logs + monitoring
  • Executive dashboards: Tableau-ready exports + dashboard suite
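
The ROI-driven threshold idea can be illustrated with a small sketch. It is not the code in churn/business.py: the `RoiAssumptions` values (retained revenue, contact cost, offer success rate) and the function names are hypothetical placeholders chosen only to show how total expected value can be compared across thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoiAssumptions:
    # All three values are hypothetical placeholders, not the project's defaults.
    retained_value: float = 500.0  # revenue kept if a churner is retained
    contact_cost: float = 20.0     # cost of one retention offer
    success_rate: float = 0.30     # share of contacted churners who stay


def expected_value(prob: float, a: RoiAssumptions) -> float:
    """Expected value of contacting one customer with churn probability `prob`."""
    return prob * a.success_rate * a.retained_value - a.contact_cost


def scan_thresholds(probs, thresholds, a=RoiAssumptions()):
    """Return (threshold, total_expected_value, n_targeted) for each threshold."""
    rows = []
    for t in thresholds:
        targeted = [p for p in probs if p >= t]
        total = sum(expected_value(p, a) for p in targeted)
        rows.append((t, total, len(targeted)))
    return rows


def best_threshold(rows):
    """Pick the row with the highest total expected value."""
    return max(rows, key=lambda r: r[1])


# Toy scores standing in for model output.
probs = [0.05, 0.10, 0.20, 0.45, 0.70, 0.90]
thresholds = [i / 10 for i in range(1, 10)]
rows = scan_thresholds(probs, thresholds)
best = best_threshold(rows)
print(f"best threshold={best[0]:.1f}, total EV={best[1]:.1f}, targeted={best[2]}")
```

With these placeholder numbers, contacting a customer only pays off above a churn probability of roughly 0.13, which is why the best threshold in the toy scan sits well below 0.50.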

🧱 System Architecture

[Diagram: system architecture]


🗂️ Repository File Structure

CustomerChurnPredictor/
├─ churn/                      # Core ML + API + monitoring modules
│  ├─ data.py                  # Download + clean dataset
│  ├─ modeling.py              # Preprocess + candidate models
│  ├─ train.py                 # Train + save best model
│  ├─ evaluate.py              # Metrics + confusion matrix + threshold scan
│  ├─ explain.py               # Permutation + SHAP explainability
│  ├─ business.py              # ROI simulation + best threshold
│  ├─ tableau_export.py        # Exports final Tableau-ready CSVs
│  ├─ api.py                   # FastAPI inference service + logging
│  ├─ monitor.py               # Evidently drift report
│  └─ config.py                # Paths, columns, business defaults
│
├─ app/
│  └─ streamlit_app.py         # Modern SaaS Streamlit UI
│
├─ data/                       # Local-only data (ignored in git except placeholder)
│  ├─ raw/
│  ├─ processed/
│  ├─ tableau/                 # Exports for Tableau dashboards
│  └─ logs/                    # API prediction logs
│
├─ models/                     # Saved model artifact (model.joblib) + metadata
├─ reports/
│  ├─ figures/                 # Explainability + ROI plots (PNG/CSV)
│  ├─ metrics/                 # Model metrics, threshold scan, best threshold
│  └─ monitoring/              # Drift report HTML
│
├─ tableau/                    # Tableau workbook (.twbx) + screenshots
│  └─ screenshots/
│
├─ tests/                      # Basic CI tests
├─ requirements.txt
├─ requirements-dev.txt
├─ Makefile
└─ README.md

🧠 Churn Prediction Pipeline

Raw Telco Dataset
        ↓
Data Cleaning & Feature Engineering (Pandas)
        ↓
Preprocessing Pipeline (Impute + Scale + OneHotEncode)
        ↓
Classification Models (LogReg, RF)
        ↓
Churn Probability Scores (0–1)
        ↓
Risk Segmentation (Buckets + Deciles)
        ↓
ROI-Based Threshold Decisioning
        ↓
FastAPI + Streamlit + Monitoring + Tableau Dashboards
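
The risk segmentation step in the pipeline above can be sketched as follows. This is an illustrative version, not the project's implementation: the rank-based decile rule and the 0.30 / 0.60 bucket cutoffs are assumptions, chosen so that decile 10 holds the highest-risk tenth of customers as the Tableau dashboard describes.

```python
def assign_deciles(probs):
    """Rank-based deciles: decile 10 holds the highest-probability tenth."""
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i])  # indices, ascending by score
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


def risk_bucket(prob, low=0.30, high=0.60):
    """Map a probability to a named bucket; the cutoffs are hypothetical."""
    if prob >= high:
        return "High"
    if prob >= low:
        return "Medium"
    return "Low"


# Toy scores: 0.00, 0.05, ..., 0.95
probs = [i / 20 for i in range(20)]
deciles = assign_deciles(probs)
buckets = [risk_bucket(p) for p in probs]

for p, d, b in list(zip(probs, deciles, buckets))[-3:]:
    print(f"p={p:.2f} decile={d} bucket={b}")
```

Deciles are relative (they always split the population into ten equal groups), while buckets are absolute cutoffs on the probability, so the two views can disagree when overall risk shifts.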

📊 Tableau Dashboards (Two-Dashboard Workflow)

The Tableau workbook follows a two-dashboard workflow:

  • Dashboard 1: Churn Overview (Executive / Business Analysis)
  • Dashboard 2: ML Risk Intelligence (Predictive + Decision Layer)

Dashboard 1: Churn Overview (Business Analysis)

[Screenshot: Tableau - Churn Overview Dashboard]

Focus: historical churn patterns and segmentation insights.

KPIs

  • Total Customers
  • Churn Rate
  • Avg Monthly Charges
  • Avg Tenure

Visuals

  • Churn by Contract Type
  • Churn by Internet Service
  • Churn by Payment Method
  • Churn by Tenure Bucket
  • Interactive filters (Contract, InternetService, PaymentMethod, SeniorCitizen)

Dashboard 2: ML Risk Intelligence (Predictive)

[Screenshot: Tableau - ML Risk Intelligence Dashboard]

Focus: who will churn next and what to do.

KPIs

  • Avg Churn Probability
  • High Risk Count (≥ threshold)
  • Targeted Customers (decision policy)

Visuals

  • Churn probability distribution
  • Risk decile breakdown (Top 10% = Decile 10)
  • High-risk customers table
  • ROI threshold curve (Total Expected Value vs Threshold)

📦 Tableau Data Files (Ready to Connect)

After running:

python -m churn.tableau_export

Tableau-ready exports are generated in:

  • data/tableau/telco_cleaned.csv
  • data/tableau/telco_scored.csv
  • data/tableau/roi_thresholds.csv
  • data/tableau/feature_importance.csv
  • data/tableau/threshold_scan.csv

🖥️ Streamlit SaaS Dashboard

The Streamlit app includes:

  • KPI cards (model performance + threshold)
  • Predict tab (decision + EV per customer)
  • Analytics dashboard tab (stacked distributions + drivers + threshold tradeoffs)
  • Explainability tab (Permutation + SHAP global/local)
  • Business tab (ROI curve + threshold strategy)
  • Logs & batch scoring tab (CSV upload + /predict_batch)

▶️ Predict Page

[Screenshot: Streamlit - Predict]

▶️ Analytics Dashboard (Part 1)

[Screenshot: Streamlit - Analytics Dashboard Part 1]

▶️ Analytics Dashboard (Part 2)

[Screenshot: Streamlit - Analytics Dashboard Part 2]

▶️ Explainability Page

[Screenshot: Streamlit - Explainability]

▶️ Business Page

[Screenshot: Streamlit - Business]

▶️ Logs & Batch Page

[Screenshot: Streamlit - Logs & Batch]


🌐 FastAPI Inference Service

Endpoints:

  • GET /health: service status
  • POST /predict: score one customer
  • POST /predict_batch: score many rows (batch scoring)

All predictions are logged to:

  • data/logs/predictions_log.csv

This log is used for monitoring drift.
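
As a rough illustration of the logging step, the sketch below appends one scored request to a CSV file and writes the header only on first use. The column list, the `log_prediction` name, and the row layout are assumptions for illustration; the actual schema of data/logs/predictions_log.csv is defined in churn/api.py and may differ.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical log schema: a timestamp, a few input features, and the score.
LOG_FIELDS = ["timestamp", "tenure", "MonthlyCharges", "Contract", "churn_probability"]


def log_prediction(log_path, features, probability):
    """Append one scored request to a CSV log, writing the header on first use."""
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    row = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Copy only the logged feature columns; missing keys become empty cells.
        **{name: features.get(name) for name in LOG_FIELDS[1:-1]},
        "churn_probability": round(float(probability), 6),
    }
    with path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=LOG_FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Calling `log_prediction("data/logs/predictions_log.csv", customer, prob)` after each prediction would build up the "current" sample that a drift report can later compare against the training reference.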


⚡ FastAPI + Local Model (How It Works)

In this project, FastAPI acts as the bridge between the trained machine learning model and the user interface.

🔗 Local Development Setup (Project Mode)

During development, the system runs in two parts:

  1. FastAPI Backend (Model Server)

    • Loads the trained model (model.joblib)
    • Exposes prediction endpoints:
      • /predict → single customer
      • /predict_batch → multiple customers
    • Handles inference logic and logging
  2. Streamlit Frontend (Dashboard UI)

    • Collects user input (customer data)
    • Sends requests to FastAPI
    • Displays:
      • churn probability
      • decision (intervene or not)
      • expected business value

👉 Flow:

User Input (Streamlit)
        ↓
HTTP Request → FastAPI (/predict)
        ↓
Model (joblib) → Prediction
        ↓
Response → Streamlit UI

This setup mimics a real production ML system, where:

  • UI ≠ Model
  • Communication happens via APIs
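
To make the UI-to-API hop concrete, here is a minimal client sketch using only the standard library. The payload field names follow the Telco columns mentioned elsewhere in this README, but the exact request schema expected by /predict is an assumption, as are the `build_predict_request` and `send` helper names.

```python
import json
from urllib import request

API_URL = "http://localhost:8000"  # local address used by this project


def build_predict_request(customer, base_url=API_URL):
    """Build (but do not send) a JSON POST request for the /predict endpoint."""
    body = json.dumps(customer).encode("utf-8")
    return request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req, timeout=5.0):
    """Send the request and decode the JSON response body."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Hypothetical payload; the real API may require more or differently named fields.
customer = {
    "tenure": 3,
    "MonthlyCharges": 70.5,
    "Contract": "Month-to-month",
    "InternetService": "Fiber optic",
    "PaymentMethod": "Electronic check",
    "SeniorCitizen": 0,
}
req = build_predict_request(customer)
print(req.get_method(), req.full_url)
# With the API running in another terminal: print(send(req))
```

Keeping request construction separate from sending makes the payload easy to inspect and test without a running server.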

🚀 Why FastAPI Is Used

FastAPI is chosen because it is:

  • ⚡ Fast and lightweight (high-performance inference)
  • 📦 Production-ready (used in real ML systems)
  • 🔌 Easy to integrate with frontends (Streamlit, React, etc.)
  • 📊 Supports batch inference and scalability

🌍 Production Deployment (Real-World System)

In a real production environment, this system would be deployed as:

🏗️ Production Architecture

  • FastAPI → deployed on cloud (AWS / GCP / Azure)
  • Model → stored in object storage (S3 / GCS)
  • Load balancer → handles traffic
  • Database → stores prediction logs
  • Frontend → separate app (React / dashboard)

Example Flow:

User → Web App
        ↓
API Gateway / Load Balancer
        ↓
FastAPI Service (Docker container)
        ↓
Model Inference
        ↓
Response + Logging (Database)

🔧 Deployment Tools (Industry Level)

  • Docker (containerization)
  • Kubernetes (scaling)
  • AWS ECS / Lambda / EC2
  • CI/CD pipelines (GitHub Actions)

💡 Why This Project Uses a Simpler Approach

Since this is an academic + portfolio project, we use a simplified setup:

  • FastAPI runs locally (http://localhost:8000)
  • Streamlit connects directly to it
  • No cloud infrastructure required
  • No cost involved

This allows:

  • ✅ Fast development
  • ✅ Easy debugging
  • ✅ Zero deployment cost
  • ✅ Demonstrates full ML system design

🧠 Smart Hybrid Design (Cloud + Local Fallback)

The project also supports a fallback mode:

  • If FastAPI is offline, Streamlit:
    • loads model.joblib directly
    • performs predictions locally

This ensures:

  • 🚫 No dependency on backend uptime
  • 🌐 Works on Streamlit Cloud
  • 💼 Demonstrates resilient system design
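
The fallback behaviour can be sketched as a small wrapper that tries the API first and scores in-process if the call fails. The function and key names here are hypothetical. In the real app the local scorer would presumably wrap the model loaded from model.joblib, which is replaced by a stub below so the example runs on its own.

```python
def predict_with_fallback(customer, remote_predict, local_predict):
    """Try the API first; fall back to in-process scoring if it is unreachable."""
    try:
        return {"source": "api", **remote_predict(customer)}
    except OSError as exc:
        # urllib.error.URLError, ConnectionError and TimeoutError all subclass OSError.
        result = local_predict(customer)
        return {"source": "local", "fallback_reason": type(exc).__name__, **result}


def offline_api(customer):
    """Stand-in for an HTTP call to a backend that is not running."""
    raise ConnectionRefusedError("API is not running")


def stub_local_model(customer):
    """Stand-in for scoring with a locally loaded model artifact."""
    return {"churn_probability": 0.42}


result = predict_with_fallback({"tenure": 3}, offline_api, stub_local_model)
print(result)
```

Passing both scorers in as callables keeps the fallback rule independent of how either one is implemented, and recording `source` in the result lets the UI show which path produced the score.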

🎯 Why This Matters

This architecture shows that the project is not just:

❌ “a machine learning model”

It is:

✅ a complete ML system with deployment, APIs, UI, and monitoring


💬 Summary of API framework

“I deployed my churn model behind a FastAPI service, which the Streamlit dashboard calls in real time. I also implemented a local fallback so the system works even without a backend, making it both production-ready and deployable for free.”


📈 Monitoring (Evidently Drift Report)

Run:

python -m churn.monitor

Output:

  • reports/monitoring/data_drift_report.html

This compares:

  • reference sample (saved during training)
  • current inference logs (from API)
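
For intuition about what a reference-versus-current comparison measures, the sketch below computes a Population Stability Index (PSI) for one numeric column. This is a swapped-in, hand-rolled statistic for illustration only. It is not how Evidently computes drift and not what churn/monitor.py does, and the bin count and `eps` floor are arbitrary choices.

```python
import math


def psi(reference, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference column

    def shares(values):
        counts = [0] * bins
        for v in values:
            # Clamp so out-of-range current values land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Toy example: a tenure-like column whose values shift upward by 40.
reference = list(range(100))
shifted = [v + 40 for v in reference]
print(f"no shift: {psi(reference, reference):.3f}")
print(f"shifted:  {psi(reference, shifted):.3f}")
```

A PSI of zero means the binned distributions match. A commonly quoted rule of thumb treats values above about 0.25 as a substantial shift, though that cutoff is a convention rather than a guarantee.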

🚀 How to Run (End-to-End)

1) Setup

python -m venv .venv
# Windows (PowerShell):
.\.venv\Scripts\Activate.ps1
# macOS/Linux:
# source .venv/bin/activate
pip install -r requirements.txt
pip install -r requirements-dev.txt

2) Build the full pipeline

python -m churn.data --download
python -m churn.train
python -m churn.evaluate
python -m churn.explain
python -m churn.business
python -m churn.tableau_export

3) Run API + UI

Terminal A:

python -m churn.api

Terminal B:

python -m streamlit run app/streamlit_app.py

💡 Key Insights (Examples)

  • Month-to-month contracts are consistently the highest churn risk
  • Long-term contracts (1–2 year) strongly reduce churn likelihood
  • Churn risk is concentrated: a smaller segment can represent a large share of revenue exposure
  • Probability-based segmentation enables targeted retention strategies instead of broad campaigns

📈 Potential Business Applications

Companies can use this system to:

  • identify high-risk customers early
  • deploy targeted retention campaigns
  • improve contract conversion strategies
  • protect recurring revenue with ROI-optimized decisions

👥 Authors

Mitra Boga

Yashweer Potelu

Datla Akshith Varma

Pranav Surya

About

CustomerChurnPredictor is an end-to-end churn system using the Telco dataset: scikit-learn modeling with proper evaluation, SHAP-based explainability, ROI-driven threshold decisioning, FastAPI + Streamlit deployment with logging/monitoring, and Tableau dashboards built from exported risk-scored data.
