🛒 Customer Segmentation Using Clustering

📘 Project Overview

This project uses unsupervised machine learning to segment the customers of a supermarket chain, with the goal of improving membership card conversion rates.
By identifying distinct customer groups based on demographics and spending behavior, marketing teams can craft personalized offers and improve engagement.

🎯 Problem Statement

Supermarkets struggle to increase membership card sign-ups because their marketing targets all customers uniformly.
The goal is to segment customers into distinct groups to enable:

Personalized marketing
Better understanding of customer behavior
Improved conversion and retention rates

📂 Dataset Information

Source: Provided dataset (/data)
Records: 200 customers
Features:
- Age
- Gender (encoded)
- Annual Income
- Spending Score

Preprocessing Steps:

Handled missing values
Encoded categorical variables
Applied feature scaling (StandardScaler)
Split data into train and test sets
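A minimal sketch of these preprocessing steps with pandas and scikit-learn. The toy DataFrame, its column names, the median/mode imputation, and the `Male`/`Female` encoding are assumptions for illustration, not the notebook's actual code. It scales all four columns, including the encoded gender column. One deliberate difference from the order listed above: it splits before scaling and fits `StandardScaler` on the training rows only, so test-set statistics do not leak into the scaler.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the dataset in /data; column names are assumptions.
df = pd.DataFrame({
    "Gender": ["Male", "Female", None, "Female", "Male", "Female", "Male", "Female"],
    "Age": [19, 21, 20, np.nan, 31, 22, 35, 23],
    "Annual Income": [15, 15, 16, 16, 17, 17, 18, 18],
    "Spending Score": [39, 81, 6, 77, 40, 76, 6, 94],
})

# 1. Handle missing values: median for numeric columns, most frequent value for Gender.
numeric_cols = ["Age", "Annual Income", "Spending Score"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())
df["Gender"] = df["Gender"].fillna(df["Gender"].mode()[0])

# 2. Encode the categorical column (assumed mapping: Male -> 1, Female -> 0).
df["Gender"] = df["Gender"].map({"Male": 1, "Female": 0})

# 3. Split first so the scaler never sees the test rows.
X_train, X_test = train_test_split(df, test_size=0.25, random_state=42)

# 4. Fit the scaler on the training rows only, then apply it to both splits.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```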

⚙️ Workflow Summary

Exploratory Data Analysis (EDA)
- Visualized feature distributions and correlations
- Generated heatmaps and pairplots
Feature Engineering
- Standardized numerical features
Model Training & Evaluation
- Trained multiple clustering algorithms:
  - KMeans
  - Gaussian Mixture Model (GMM)
  - Agglomerative Clustering
  - DBSCAN
Model Evaluation Metrics
- Silhouette Score
- Davies–Bouldin Index
- Calinski–Harabasz Score
Cluster Profiling
- Analyzed customer characteristics per cluster
- Exported profiles to CSV
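The training and evaluation steps above could look roughly like this. Synthetic `make_blobs` data stands in for the scaled customer features, and the hyperparameters (`n_clusters=5`, `eps=0.5`, `min_samples=5`) are placeholders rather than the project's tuned values. All three metrics treat DBSCAN's noise label (`-1`) as just another cluster unless noise points are filtered out first.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (calinski_harabasz_score, davies_bouldin_score,
                             silhouette_score)
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled customer features (200 rows, 4 columns).
X, _ = make_blobs(n_samples=200, n_features=4, centers=5, random_state=0)
X = StandardScaler().fit_transform(X)

# Placeholder hyperparameters, not the project's tuned values.
models = {
    "KMeans": KMeans(n_clusters=5, n_init=10, random_state=0),
    "GMM": GaussianMixture(n_components=5, random_state=0),
    "Agglomerative": AgglomerativeClustering(n_clusters=5),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
}

results = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    # The metrics are only defined for 2 <= n_labels <= n_samples - 1.
    n_labels = len(set(labels))
    if not 2 <= n_labels <= len(X) - 1:
        continue
    results[name] = {
        "silhouette": silhouette_score(X, labels),                # higher is better, in [-1, 1]
        "davies_bouldin": davies_bouldin_score(X, labels),        # lower is better
        "calinski_harabasz": calinski_harabasz_score(X, labels),  # higher is better
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```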

🤖 Model Comparison

| Model | Train Silhouette | Test Silhouette | Silhouette After Tuning | Observations |
|---|---|---|---|---|
| KMeans | 0.3627 | 0.4056 | 0.4280 | ✅ Stable and generalizes well |
| GMM | 0.4224 | 0.3257 | 0.4285 | ⚠ Slight overfitting |
| Agglomerative | 0.7229 | — | 0.7355 | ⚠ Overfitted to training data |
| DBSCAN | 0.5161 | — | 0.5506 | ⚠ Sensitive to parameters |
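A train/test silhouette comparison needs a model that can label unseen rows. In scikit-learn, `KMeans` and `GaussianMixture` provide `predict()`, while `AgglomerativeClustering` and `DBSCAN` only provide `fit_predict()`, which is a likely reason the test column is empty for those two. A sketch on synthetic data; the split ratio and cluster counts are assumptions:

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the scaled customer features.
X, _ = make_blobs(n_samples=200, n_features=4, centers=4, random_state=1)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=1)

scores = {}
for name, model in {
    "KMeans": KMeans(n_clusters=4, n_init=10, random_state=1),
    "GMM": GaussianMixture(n_components=4, random_state=1),
}.items():
    model.fit(X_train)  # fit on the training rows only
    train_sil = silhouette_score(X_train, model.predict(X_train))
    test_sil = silhouette_score(X_test, model.predict(X_test))  # labels for unseen rows
    scores[name] = (train_sil, test_sil)
    print(f"{name}: train={train_sil:.4f}, test={test_sil:.4f}")

# These two estimators offer fit_predict() but no predict() for new rows.
no_predict = [cls.__name__ for cls in (AgglomerativeClustering, DBSCAN)
              if not hasattr(cls, "predict")]
print(no_predict)
```

A test score well below the train score, as with GMM in the table, suggests the learned clusters do not carry over cleanly to new customers.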

🧠 Final Model Recommendation — KMeans

Why KMeans?

Consistent performance on train and test sets
Tuned Silhouette Score: 0.4280
Simple, interpretable, and scalable
Assigns new customers to existing segments directly with `predict`
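Segmenting a new customer is cheap with KMeans because `predict()` assigns each row to the nearest learned centroid. The sketch below checks that equivalence on synthetic data; the "new customers" are hypothetical, and in the real workflow they would first be transformed with the saved scaler.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for the scaled training features.
X, _ = make_blobs(n_samples=200, n_features=4, centers=5, random_state=2)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=2).fit(X)

# Two hypothetical new customers, already scaled like the training data.
new_customers = X[:2] + 0.1
segments = kmeans.predict(new_customers)

# Equivalent manual rule: index of the closest centroid by Euclidean distance.
diffs = new_customers[:, None, :] - kmeans.cluster_centers_[None, :, :]
nearest = np.linalg.norm(diffs, axis=2).argmin(axis=1)
print(segments, nearest)
```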

Business Impact:

Enables targeted marketing for high-value clusters
Boosts membership card conversion
Supports automated segmentation for future data

📊 Key Visuals

Stored in out/images/:

| Visualization | Description |
|---|---|
| `corr_heatmap.png` | Feature correlations |
| `eda_01.png` – `eda_03.png` | Exploratory analysis |
| `optimal_k_sil.png` | Optimal number of clusters for KMeans (silhouette) |
| `model_comparision.png` | Model comparison summary |
| `cluster_before_tuning.png`, `cluster_after_tuning.png` | Clusters before and after tuning |
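The `optimal_k_sil.png` plot indicates that the number of KMeans clusters was chosen by silhouette score. A sketch of such a search, assuming a candidate range of k = 2 to 10 and synthetic data in place of the customer features:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the scaled customer features.
X, _ = make_blobs(n_samples=200, n_features=4, centers=4, cluster_std=0.8, random_state=3)

sil_by_k = {}
for k in range(2, 11):  # silhouette is undefined for a single cluster
    labels = KMeans(n_clusters=k, n_init=10, random_state=3).fit_predict(X)
    sil_by_k[k] = silhouette_score(X, labels)

# Pick the k with the highest average silhouette.
best_k = max(sil_by_k, key=sil_by_k.get)
print(f"best k = {best_k}, silhouette = {sil_by_k[best_k]:.4f}")
```

Plotting `sil_by_k` against k gives the kind of curve the image presumably shows.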

Cluster summaries available in:
out/files/cluster_profiles/
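A cluster profile is commonly the per-cluster mean of each feature in its original units, plus the cluster size. The sketch below assumes that definition, uses randomly generated customers, and writes to a hypothetical file name, `kmeans_profiles.csv`; the actual CSV layout in `out/files/cluster_profiles/` may differ.

```python
import os

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customer table in original units; column names are assumptions.
rng = np.random.default_rng(4)
customers = pd.DataFrame({
    "Age": rng.integers(18, 70, 200),
    "Gender": rng.integers(0, 2, 200),
    "Annual Income": rng.integers(15, 140, 200),
    "Spending Score": rng.integers(1, 100, 200),
})

# Cluster on scaled features, but profile in the original units.
X_scaled = StandardScaler().fit_transform(customers)
customers["Cluster"] = KMeans(n_clusters=5, n_init=10, random_state=4).fit_predict(X_scaled)

profile = customers.groupby("Cluster").mean().round(2)
profile["Count"] = customers.groupby("Cluster").size()

out_dir = os.path.join("out", "files", "cluster_profiles")
os.makedirs(out_dir, exist_ok=True)
csv_path = os.path.join(out_dir, "kmeans_profiles.csv")  # hypothetical file name
profile.to_csv(csv_path)
print(profile)
```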

🧩 Saved Models & Artifacts

Located in out/models/:

| File | Description |
|---|---|
| `KMeans_model.pkl` | Final selected model |
| `GMM_model.pkl` | Gaussian Mixture Model |
| `Agglomerative_model.pkl` | Hierarchical model |
| `DBSCAN_model.pkl` | Density-based model |
| `scaler.pkl` | Feature scaler |
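Assuming the `.pkl` files are standard Python pickles (they could also have been written with `joblib`), a scoring script would reload the scaler and the model and apply them in the same order as during training. To avoid overwriting the real artifacts, this sketch saves to a temporary directory. Only unpickle files you trust, and load them with the same scikit-learn version that saved them.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
X_raw = rng.normal(size=(200, 4))  # stand-in for unscaled customer features

scaler = StandardScaler().fit(X_raw)
kmeans = KMeans(n_clusters=5, n_init=10, random_state=5).fit(scaler.transform(X_raw))

with tempfile.TemporaryDirectory() as model_dir:  # stands in for out/models/
    # Save both artifacts under the file names used in out/models/.
    for obj, fname in [(scaler, "scaler.pkl"), (kmeans, "KMeans_model.pkl")]:
        with open(os.path.join(model_dir, fname), "wb") as f:
            pickle.dump(obj, f)

    # Reload them, as a separate scoring script would.
    with open(os.path.join(model_dir, "scaler.pkl"), "rb") as f:
        loaded_scaler = pickle.load(f)
    with open(os.path.join(model_dir, "KMeans_model.pkl"), "rb") as f:
        loaded_kmeans = pickle.load(f)

# Scale first, then predict: the model was fitted on scaled features.
new_raw = rng.normal(size=(3, 4))
labels = loaded_kmeans.predict(loaded_scaler.transform(new_raw))
print(labels)
```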

🧪 How to Run

Open the notebook:
```
retrain.ipynb
```
Execute all cells to:
- Load and preprocess data
- Train and evaluate clustering models
- Save outputs (images, CSVs, and models)

Outputs are saved automatically to /out/files/, /out/images/, and /out/models/.

🗂️ Project Structure

```
project/
│
├── data/                          # Raw dataset
├── out/
│   ├── files/
│   │   └── cluster_profiles/      # Cluster CSV reports
│   ├── images/                    # Visualizations
│   └── models/                    # Trained models and scaler
│
├── retrain.ipynb                  # Main notebook
├── README.md                      # Project documentation
└── .gitattributes                 # Git configuration
```

🚀 Future Improvements

Apply PCA for dimensionality reduction before clustering
Add behavioral metrics (purchase frequency, product categories)
Deploy as an interactive dashboard or API for real-time segmentation
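One possible shape for the PCA idea is a scikit-learn `Pipeline` that scales, projects onto principal components, and clusters in a single object, so new customers always pass through identical steps. `n_components=2` and `n_clusters=5` are arbitrary choices here, not recommendations, and the data is synthetic.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the raw customer features.
X, _ = make_blobs(n_samples=200, n_features=4, centers=5, random_state=6)

# Scale -> reduce to 2 principal components -> cluster.
pipeline = make_pipeline(
    StandardScaler(),
    PCA(n_components=2, random_state=6),
    KMeans(n_clusters=5, n_init=10, random_state=6),
)
labels = pipeline.fit_predict(X)

# Share of the scaled data's variance kept by the 2 components.
explained = pipeline.named_steps["pca"].explained_variance_ratio_.sum()
print(f"variance kept: {explained:.2%}")
```

Because the final step is KMeans, `pipeline.predict(new_rows)` would scale, project, and assign new customers in one call.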

👤 Author

Name: Isula Dissanayake
Date: November 2025
Project: Customer Segmentation using Machine Learning

⭐ Final Takeaway:
KMeans provides the most interpretable, stable, and business-ready segmentation model for optimizing membership marketing strategies.
