Skip to content

techmum21p/Brewlytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

46 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Brewlytics: Customer Analytics & Offer Completion Prediction Leveraging GenAI for Personalized Marketing Strategy Recommendations

πŸ“Š Project Overview

Brewlytics is a comprehensive data science project that analyzes a cafe's customer behavior and builds machine learning models to predict offer completion rates. The project demonstrates the complete data science pipeline from exploratory data analysis to model deployment and interpretation. An unique feature of this project is the customer segmentation done using K-Means clustering and leveraging LLM to provide personalized recommendations.

🎯 Business Objective

Predict which customers are likely to complete promotional offers to optimize marketing strategies, improve customer engagement, and maximize ROI on marketing strategies generated by Brewlytics.

πŸ“ Project Structure

Brewlytics_Chat/
β”œβ”€β”€ Notebooks/                    # Jupyter Notebooks (01-06)
β”‚   β”œβ”€β”€ 01_EDA.ipynb            # Exploratory Data Analysis
β”‚   β”œβ”€β”€ 02_Feature_Engg.ipynb    # Feature Engineering
β”‚   β”œβ”€β”€ 03_Modeling.ipynb       # Machine Learning Models
β”‚   β”œβ”€β”€ 04_PCA.ipynb            # Dimensionality Reduction
β”‚   β”œβ”€β”€ 05_SHAP.ipynb           # Model Explainability
β”‚   β”œβ”€β”€ 06_Customer_Segmentation.ipynb # Customer Segmentation
β”‚   └── 07_Bias_Fairness_Analysis.ipynb # Bias & Fairness Analysis
β”‚   
β”œβ”€β”€ brewlytics_app/             # Streamlit Web App
β”‚   β”œβ”€β”€ app.py                 # Streamlit application
β”‚   β”œβ”€β”€ Dockerfile              # Docker configuration
β”‚   β”œβ”€β”€ docker-compose.yml      # Docker Compose setup
β”‚   β”œβ”€β”€ README.md               # Web app documentation
β”‚   β”œβ”€β”€ Cafe_Rewards_Offers/    # Models, Dataset & Processed Data (to be downloaded from GDrive: https://drive.google.com/drive/folders/1-yqeoiDgAwDu6nJAkmwkiU45giE1YTzV?usp=sharing)
β”‚      β”œβ”€β”€ customers.csv        # Customer demographics
β”‚      β”œβ”€β”€ offers.csv           # Promotional offer details
β”‚      β”œβ”€β”€ events.csv           # Transaction events
β”‚      β”œβ”€β”€ data_dictionary.csv  # Data dictionary
β”‚      β”œβ”€β”€ customer_behavior_analysis.csv # Customer behavior analysis results
β”‚      β”œβ”€β”€ processed_data_for_classification.csv # Processed data for classification
β”‚      β”œβ”€β”€ processed/           # Processed datasets for ML
β”‚      β”œβ”€β”€ models/              # Trained machine learning models
β”‚      β”œβ”€β”€ pca_models/          # Trained PCA models
β”‚      β”œβ”€β”€ processed/           # Processed datasets for prediction models
β”‚      β”œβ”€β”€ segmentation/        # Trained segmentation models
β”‚   
β”œβ”€β”€ Data_Viz/                   # Data Visualizations
β”‚   
β”œβ”€β”€ Prediction_and_GenAI_Results/   # Prediction and GenAI Results in csv format
β”‚   
β”œβ”€β”€ Presentation_Slides/         # Presentation slides for both technical and business audiences
β”‚   
β”œβ”€β”€ requirements.txt             # Python dependencies
β”œβ”€β”€ .gitignore                   # Git ignore file
└── README.md                    # This file

πŸ“ˆ Dataset Overview

The project uses three main datasets:

  • Customers: 17,000 records with demographic information (age, gender, income, membership details)
  • Offers: 10 types of promotional offers (bogo, discount, informational) with varying difficulty and duration
  • Events: 306,534 transaction events tracking offer reception, viewing, and completion

πŸ“₯ Downloading Required Files

Due to their size, the datasets and trained models are not included in this repository. Download them from:

[Google Drive Link - Insert Here]

After downloading, extract and place the files in:

Cafe_Rewards_Offers/
β”œβ”€β”€ customers.csv
β”œβ”€β”€ offers.csv
β”œβ”€β”€ events.csv
β”œβ”€β”€ models/
β”‚   └── random_forest.pkl
β”œβ”€β”€ processed/
β”‚   β”œβ”€β”€ scaler.pkl
β”‚   └── feature_names.pkl
└── segmentation/
    └── kmeans_model.pkl

πŸ” Key Findings from Analysis

1. Customer Demographics (Notebook 01)

  • Gender Distribution: 57.2% male, 41.3% female, 1.4% other
  • Age Profile: Middle-aged customers (40-65 years) form the core demographic
  • Income Distribution: Middle-class customers ($50K-$75K) with multimodal patterns
  • Membership Growth: Explosive growth in 2017 (6,500+ new members), indicating successful market penetration
  • Data Quality: 12.8% of customers missing demographics (MNAR - Missing Not At Random)

2. Feature Engineering (Notebook 02)

  • Data Leakage Removal: Critical step removing offer_completed and offer_viewed features
  • Final Feature Set: 24 clean features available at prediction time
  • Encoding Strategy:
    • One-hot encoding for nominal variables (offer_type, gender)
    • Ordinal encoding for ordered variables (age_group, income_bracket, tenure_group)
  • Missing Value Handling: Median imputation for 0.13% missing values
  • Scaling: StandardScaler applied to 11 numerical features

3. Machine Learning Models (Notebook 03)

  • Best Model: Random Forest (F1 = 0.8601, AUC-ROC = 0.9277)
  • Model Comparison:
    • Logistic Regression: F1 = 0.8240 (baseline)
    • Decision Tree: F1 = 0.8263
    • Random Forest: F1 = 0.8601 (winner)
    • XGBoost: F1 = 0.8515
  • Key Insight: Ensemble methods significantly outperform simple baselines

4. Dimensionality Reduction (Notebook 04)

  • PCA Analysis: 8 components capture 90% variance with 67% feature reduction
  • Performance Trade-off: Only 1.30% F1 drop for 67% fewer features
  • Recommendation: Use 8-component PCA model for production (efficiency gains)
  • Feature Contributions:
    • PC1: Customer tenure (membership_year, duration_days)
    • PC2: Demographics (age, income)
    • PC3: Offer characteristics (duration, difficulty)

5. Model Explainability (Notebook 05)

  • SHAP Analysis: Revealed actual feature importance and directional impacts
  • Top Features:
    1. offer_type_discount (21.41% importance) - Strongest completion driver
    2. duration (14.16% importance) - Shorter duration = higher completion
    3. difficulty (9.27% importance) - Easier offers = higher completion
  • Key Insight: Offer design matters more than customer demographics (52% vs 34% importance)

6. Customer Segmentation (Notebook 06)

  • Optimal Segments: 5 distinct customer clusters identified
  • Segment Profiles:
    1. New Male Members (30.8%): Age 52, $60K income, 0.7 years tenure, 44.6% completion
    2. Affluent Female Members (36.2%): Age 58, $72K income, 1.3 years tenure, 65.9% completion
    3. Missing Demographics (11.5%): Data quality issue, 15.7% completion
    4. Small Engaged Segment (1.2%): High engagement, 63.2% completion
    5. Long-Tenure Male Members (20.2%): Age 53, $63K income, 2.9 years tenure, 65.3% completion

πŸš€ Technical Implementation

Data Pipeline

  1. Raw Data β†’ Quality Checks β†’ Feature Engineering β†’ Model Training β†’ Evaluation β†’ Interpretation

Key Technologies

  • Data Processing: Pandas, NumPy, Scikit-learn
  • Machine Learning: Random Forest, XGBoost, Logistic Regression
  • Visualization: Matplotlib, Seaborn, Plotly
  • Interpretability: SHAP, PCA
  • Clustering: K-means

Model Performance

  • Accuracy: 84.54%
  • Precision: 83.18%
  • Recall: 89.05%
  • F1-Score: 86.01%
  • AUC-ROC: 92.77%

πŸ’‘ Business Recommendations

1. Offer Design Optimization (Highest Priority)

  • Increase discount offers - 21.41% importance, strongest completion driver
  • Reduce offer duration - Negative impact when too long
  • Lower difficulty thresholds - Easier offers perform better

2. Customer Targeting Strategy

  • Primary Focus: Affluent female members (36.2% of base, 65.9% completion)
  • Secondary Focus: Long-tenure male members (20.2% of base, 65.3% completion)
  • Growth Opportunity: New male members (30.8% of base, need engagement boost)

3. Channel Optimization

  • Prioritize channels with highest engagement rates
  • Focus on improving offer visibility (view rate strongly correlates with completion)

4. Data Quality Improvement

  • Critical: Fix missing demographics collection (11.5% of customers)
  • Implement better onboarding data capture
  • Regular data quality audits

πŸ“Š Model Insights

Feature Importance (Top 10)

  1. membership_duration_days (17.35%)
  2. income (13.08%)
  3. age (11.34%)
  4. offer_type_informational (8.77%)
  5. difficulty (8.37%)
  6. duration (8.04%)
  7. received_time (5.52%)
  8. membership_month (5.48%)
  9. income_bracket_encoded (3.27%)
  10. age_group_encoded (3.08%)

SHAP Directional Impacts

  • Positive for Completion: Discount offers, BOGO offers, higher income
  • Negative for Completion: Longer duration, higher difficulty, informational offers

πŸš€ Web Application & Docker Deployment

A production-ready Streamlit web application has been deployed to predict offer completion rates in real-time. The app is fully containerized with Docker for easy deployment.

Web Application Features

  • File Upload Interface: Upload CSV files with customer and offer data
  • Real-time Predictions: Generate predictions using the trained Random Forest model
  • Interactive Dashboard: View prediction statistics, probability distributions, and top customers
  • Download Results: Export predictions, with completion probabilities and tailored marketing strategies to CSV
  • Dynamic Marketing Strategy Generator: Generates personalized marketing strategies based on customer profile and % prediction
  • Containerized: Fully Dockerized for consistent deployment

Quick Start with Docker

Using Docker Compose (Recommended)

cd brewlytics_app
docker-compose up

The app will be available at http://localhost:8501

Note: The app requires the trained model and data files. Download them from the Google Drive link above before running.

Using Docker Build & Run

cd brewlytics_app

# Build the image
docker build -t brewlytics-app .

# Run the container
docker run -p 8501:8501 -v $(pwd)/../Cafe_Rewards_Offers:/app/Cafe_Rewards_Offers brewlytics-app

Running Locally (without Docker)

cd brewlytics_app
pip install -r requirements.txt
streamlit run app.py

Access the app at http://localhost:8501

Required CSV Format

Your uploaded CSV should contain these columns:

Offer Details: received_time, difficulty, duration, offer_type

Marketing Channels: in_email, in_mobile, in_social, in_web, offer_received

Customer Demographics: age, income, gender, age_group, income_bracket

Membership Information: membership_year, membership_duration_days, membership_month, tenure_group

Flags: is_demographics_missing

Output

The app generates predictions with:

  • prediction - Binary prediction (0: Will Not Complete, 1: Will Complete)
  • prediction_label - Human-readable label
  • completion_probability - Probability of completing (0-1)
  • non_completion_probability - Probability of not completing (0-1)

For detailed documentation, see brewlytics_app/README.md

🎯 Use Cases

1. Real-time Offer Recommendation

  • Deploy Random Forest model with 24 features
  • Predict completion probability at offer delivery
  • Route high-probability offers to appropriate customers

2. Customer Segmentation Marketing

  • Use 5-cluster segmentation for targeted campaigns
  • Tailor offer types and messaging to each segment
  • Monitor segment performance over time

3. Offer Design Optimization

  • Use SHAP insights to design better offers
  • A/B test new offer configurations
  • Optimize difficulty and duration parameters

4. Customer Lifetime Value Prediction

  • Leverage membership duration and engagement patterns
  • Identify high-value customers for retention programs
  • Predict churn risk based on behavior changes

πŸ”§ Model Deployment

Production Options

  1. Full Feature Model: Maximum accuracy (F1 = 0.8601)
  2. PCA Reduced Model: 67% faster with minimal accuracy loss (F1 = 0.8472)

Monitoring Requirements

  • Feature drift detection (monthly)
  • Performance degradation alerts
  • Segment size changes
  • Data quality metrics

πŸ“š Learning Outcomes

This project demonstrates:

  • Complete ML Pipeline: From raw data to production-ready models
  • Advanced Techniques: PCA, SHAP, clustering, ensemble methods
  • Business Acumen: Translating technical insights into actionable recommendations
  • Data Quality Management: Handling missing data and preventing leakage
  • Model Interpretability: Explaining black-box models for stakeholder trust

πŸ› οΈ How to Run

Option 1: Run the Web Application (Fastest)

  1. Download required files from Google Drive (see Dataset Overview section)
  2. Place Cafe_Rewards_Offers/ folder in the project root
  3. Run the app:
cd brewlytics_app
docker-compose up

Then navigate to http://localhost:8501 and upload your CSV file to get predictions.

Option 2: Run Jupyter Notebooks (Full Analysis)

  1. Download required files from Google Drive (see Dataset Overview section)
  2. Place Cafe_Rewards_Offers/ folder in the project root

Prerequisites

pip install pandas numpy scikit-learn matplotlib seaborn shap xgboost jupyter

Execution Order

  1. Run 01_EDA.ipynb - Understand the data
  2. Run 02_Feature_Engg.ipynb - Prepare features for modeling
  3. Run 03_Modeling.ipynb - Train and evaluate models
  4. Run 04_PCA.ipynb - Optimize feature space
  5. Run 05_SHAP.ipynb - Understand model decisions
  6. Run 06_Customer_Segmentation.ipynb - Discover customer segments
  7. Run 07_Bias_Fairness_Analysis.ipynb - Analyze model fairness

Data Files

Ensure the following files are in Cafe_Rewards_Offers/:

  • customers.csv
  • offers.csv
  • events.csv

πŸ“„ License

This project is for educational purposes. Please ensure compliance with data usage policies and regulations.


Project Status: βœ… Complete & Deployed (All notebooks functional, web app deployed via Docker) Last Updated: January 25 2026 Total Analysis: 7 comprehensive notebooks covering end-to-end data science pipeline Deployment: Containerized Streamlit web application with Docker

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages