Brewlytics: Customer Analytics & Offer Completion Prediction Leveraging GenAI for Personalized Marketing Strategy Recommendations
Brewlytics is a comprehensive data science project that analyzes a cafe's customer behavior and builds machine learning models to predict offer completion rates. The project demonstrates the complete data science pipeline from exploratory data analysis to model deployment and interpretation. A unique feature of this project is customer segmentation with K-Means clustering, combined with an LLM that provides personalized recommendations for each segment.
Predict which customers are likely to complete promotional offers in order to optimize marketing strategies, improve customer engagement, and maximize the ROI of the strategies Brewlytics generates.
Brewlytics_Chat/
├── Notebooks/ # Jupyter Notebooks (01-07)
│   ├── 01_EDA.ipynb # Exploratory Data Analysis
│   ├── 02_Feature_Engg.ipynb # Feature Engineering
│   ├── 03_Modeling.ipynb # Machine Learning Models
│   ├── 04_PCA.ipynb # Dimensionality Reduction
│   ├── 05_SHAP.ipynb # Model Explainability
│   ├── 06_Customer_Segmentation.ipynb # Customer Segmentation
│   └── 07_Bias_Fairness_Analysis.ipynb # Bias & Fairness Analysis
│
├── brewlytics_app/ # Streamlit Web App
│   ├── app.py # Streamlit application
│   ├── Dockerfile # Docker configuration
│   ├── docker-compose.yml # Docker Compose setup
│   ├── README.md # Web app documentation
│   └── Cafe_Rewards_Offers/ # Models, Dataset & Processed Data (to be downloaded from GDrive: https://drive.google.com/drive/folders/1-yqeoiDgAwDu6nJAkmwkiU45giE1YTzV?usp=sharing)
│       ├── customers.csv # Customer demographics
│       ├── offers.csv # Promotional offer details
│       ├── events.csv # Transaction events
│       ├── data_dictionary.csv # Data dictionary
│       ├── customer_behavior_analysis.csv # Customer behavior analysis results
│       ├── processed_data_for_classification.csv # Processed data for classification
│       ├── processed/ # Processed datasets for ML
│       ├── models/ # Trained machine learning models
│       ├── pca_models/ # Trained PCA models
│       ├── processed/ # Processed datasets for prediction models
│       └── segmentation/ # Trained segmentation models
│
├── Data_Viz/ # Data Visualizations
│
├── Prediction_and_GenAI_Results/ # Prediction and GenAI Results in csv format
│
├── Presentation_Slides/ # Presentation slides for both technical and business audiences
│
├── requirements.txt # Python dependencies
├── .gitignore # Git ignore file
└── README.md # This file
The project uses three main datasets:
- Customers: 17,000 records with demographic information (age, gender, income, membership details)
- Offers: 10 types of promotional offers (bogo, discount, informational) with varying difficulty and duration
- Events: 306,534 transaction events tracking offer reception, viewing, and completion
Due to their size, the datasets and trained models are not included in this repository. Download them from:
https://drive.google.com/drive/folders/1-yqeoiDgAwDu6nJAkmwkiU45giE1YTzV?usp=sharing
After downloading, extract and place the files in:
Cafe_Rewards_Offers/
├── customers.csv
├── offers.csv
├── events.csv
├── models/
│   └── random_forest.pkl
├── processed/
│   ├── scaler.pkl
│   └── feature_names.pkl
└── segmentation/
    └── kmeans_model.pkl
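Before running the notebooks or the app, it can help to confirm that the download was extracted into the expected layout. The following is a minimal sketch using only the Python standard library; the helper name `find_missing_files` is hypothetical and not part of the project, and the file list simply mirrors the layout shown above.

```python
from pathlib import Path

# Relative paths copied from the layout listed above.
REQUIRED_FILES = [
    "customers.csv",
    "offers.csv",
    "events.csv",
    "models/random_forest.pkl",
    "processed/scaler.pkl",
    "processed/feature_names.pkl",
    "segmentation/kmeans_model.pkl",
]


def find_missing_files(data_dir):
    """Return the required relative paths that do not exist under data_dir."""
    root = Path(data_dir)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # "Cafe_Rewards_Offers" is assumed to be relative to the current directory.
    missing = find_missing_files("Cafe_Rewards_Offers")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All required files are in place.")
```

An empty list means every file named in the layout was found; anything else lists what still needs to be downloaded or moved.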
- Gender Distribution: 57.2% male, 41.3% female, 1.4% other
- Age Profile: Middle-aged customers (40-65 years) form the core demographic
- Income Distribution: Middle-class customers ($50K-$75K) with multimodal patterns
- Membership Growth: Explosive growth in 2017 (6,500+ new members), indicating successful market penetration
- Data Quality: 12.8% of customers missing demographics (MNAR - Missing Not At Random)
- Data Leakage Removal: Critical step removing the offer_completed and offer_viewed features
- Final Feature Set: 24 clean features available at prediction time
- Encoding Strategy:
- One-hot encoding for nominal variables (offer_type, gender)
- Ordinal encoding for ordered variables (age_group, income_bracket, tenure_group)
- Missing Value Handling: Median imputation for 0.13% missing values
- Scaling: StandardScaler applied to 11 numerical features
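The leakage removal, median imputation, and scaling steps above can be illustrated with a small standard-library sketch. This is not the notebook code (which presumably relies on pandas and scikit-learn); the function names and the sample rows are hypothetical and exist only to show what each step does to the data.

```python
from statistics import mean, median, pstdev

# Columns named above as leaking the outcome into the features.
LEAKAGE_COLUMNS = {"offer_completed", "offer_viewed"}


def drop_leakage(row):
    """Remove columns that are only known after the offer outcome."""
    return {k: v for k, v in row.items() if k not in LEAKAGE_COLUMNS}


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Scale to zero mean and unit variance using the population standard
    deviation, which is also what scikit-learn's StandardScaler uses."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


# Illustrative rows only; these are not taken from the dataset.
rows = [
    {"income": 60000, "offer_viewed": 1},
    {"income": None, "offer_viewed": 0},
    {"income": 72000, "offer_viewed": 1},
]
clean = [drop_leakage(r) for r in rows]
incomes = impute_median([r["income"] for r in clean])
scaled = standardize(incomes)
```

In a real pipeline the median and the scaling statistics should be fitted on the training split only and then reused at prediction time, which is presumably why the project ships a saved scaler.pkl.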
- Best Model: Random Forest (F1 = 0.8601, AUC-ROC = 0.9277)
- Model Comparison:
- Logistic Regression: F1 = 0.8240 (baseline)
- Decision Tree: F1 = 0.8263
- Random Forest: F1 = 0.8601 (winner)
- XGBoost: F1 = 0.8515
- Key Insight: Ensemble methods significantly outperform simple baselines
- PCA Analysis: 8 components capture 90% variance with 67% feature reduction
- Performance Trade-off: Only 1.30% F1 drop for 67% fewer features
- Recommendation: Use 8-component PCA model for production (efficiency gains)
- Feature Contributions:
- PC1: Customer tenure (membership_year, duration_days)
- PC2: Demographics (age, income)
- PC3: Offer characteristics (duration, difficulty)
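The trade-off quoted above can be checked with a few lines of arithmetic. This sketch uses the 24-feature count and the F1 scores stated in this README (0.8601 for the full model, 0.8472 for the PCA model); the variable names are illustrative.

```python
n_features_full = 24   # final feature set size stated in this README
n_components = 8       # PCA components retained
f1_full = 0.8601       # Random Forest on all features
f1_pca = 0.8472        # Random Forest on the 8 PCA components

# Share of input dimensions removed by PCA.
feature_reduction = 1 - n_components / n_features_full

# Absolute drop in F1, expressed in percentage points.
f1_drop_points = (f1_full - f1_pca) * 100

# Drop relative to the full model's F1.
f1_drop_relative = (f1_full - f1_pca) / f1_full

print(f"Feature reduction: {feature_reduction:.1%}")
print(f"F1 drop: {f1_drop_points:.2f} points ({f1_drop_relative:.2%} relative)")
```

Going from 24 features to 8 components removes two thirds of the input dimensions, and the F1 difference works out to roughly 1.3 percentage points, consistent with the figures quoted above up to rounding.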
- SHAP Analysis: Revealed actual feature importance and directional impacts
- Top Features:
- offer_type_discount (21.41% importance) - Strongest completion driver
- duration (14.16% importance) - Shorter duration = higher completion
- difficulty (9.27% importance) - Easier offers = higher completion
- Key Insight: Offer design matters more than customer demographics (52% vs 34% importance)
- Optimal Segments: 5 distinct customer clusters identified
- Segment Profiles:
- New Male Members (30.8%): Age 52, $60K income, 0.7 years tenure, 44.6% completion
- Affluent Female Members (36.2%): Age 58, $72K income, 1.3 years tenure, 65.9% completion
- Missing Demographics (11.5%): Data quality issue, 15.7% completion
- Small Engaged Segment (1.2%): High engagement, 63.2% completion
- Long-Tenure Male Members (20.2%): Age 53, $63K income, 2.9 years tenure, 65.3% completion
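As a sanity check on the segment profiles, the per-segment shares and completion rates can be combined into a single share-weighted completion rate. The sketch below only reuses the numbers listed above; the resulting overall rate is implied by those numbers and is not a figure reported by the project.

```python
# (share of customer base in %, completion rate in %) per segment, from the list above.
SEGMENTS = {
    "New Male Members": (30.8, 44.6),
    "Affluent Female Members": (36.2, 65.9),
    "Missing Demographics": (11.5, 15.7),
    "Small Engaged Segment": (1.2, 63.2),
    "Long-Tenure Male Members": (20.2, 65.3),
}


def weighted_completion_rate(segments):
    """Share-weighted average completion rate across segments, in percent."""
    total_share = sum(share for share, _ in segments.values())
    weighted = sum(share * rate for share, rate in segments.values())
    return weighted / total_share


total_share = sum(share for share, _ in SEGMENTS.values())
overall_rate = weighted_completion_rate(SEGMENTS)
print(f"Shares sum to {total_share:.1f}%")
print(f"Implied overall completion rate: {overall_rate:.1f}%")
```

The shares sum to slightly under 100% because of rounding, so the function normalizes by the actual total rather than assuming 100.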
- Raw Data → Quality Checks → Feature Engineering → Model Training → Evaluation → Interpretation
- Data Processing: Pandas, NumPy, Scikit-learn
- Machine Learning: Random Forest, XGBoost, Logistic Regression
- Visualization: Matplotlib, Seaborn, Plotly
- Interpretability: SHAP, PCA
- Clustering: K-Means
- Accuracy: 84.54%
- Precision: 83.18%
- Recall: 89.05%
- F1-Score: 86.01%
- AUC-ROC: 92.77%
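The F1-score above is the harmonic mean of precision and recall, so it can be recomputed from the other two metrics as a consistency check. A minimal sketch (the function name is illustrative):

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above.
f1 = f1_from_precision_recall(0.8318, 0.8905)
print(f"F1 = {f1:.4f}")
```

With the reported precision of 83.18% and recall of 89.05%, this reproduces the reported F1 of 86.01% to within rounding.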
- Increase discount offers - 21.41% importance, strongest completion driver
- Reduce offer duration - Negative impact when too long
- Lower difficulty thresholds - Easier offers perform better
- Primary Focus: Affluent female members (36.2% of base, 65.9% completion)
- Secondary Focus: Long-tenure male members (20.2% of base, 65.3% completion)
- Growth Opportunity: New male members (30.8% of base, need engagement boost)
- Prioritize channels with highest engagement rates
- Focus on improving offer visibility (view rate strongly correlates with completion)
- Critical: Fix missing demographics collection (11.5% of customers)
- Implement better onboarding data capture
- Regular data quality audits
- membership_duration_days (17.35%)
- income (13.08%)
- age (11.34%)
- offer_type_informational (8.77%)
- difficulty (8.37%)
- duration (8.04%)
- received_time (5.52%)
- membership_month (5.48%)
- income_bracket_encoded (3.27%)
- age_group_encoded (3.08%)
- Positive for Completion: Discount offers, BOGO offers, higher income
- Negative for Completion: Longer duration, higher difficulty, informational offers
A production-ready Streamlit web application has been deployed to predict offer completion in real time. The app is fully containerized with Docker for easy deployment.
- File Upload Interface: Upload CSV files with customer and offer data
- Real-time Predictions: Generate predictions using the trained Random Forest model
- Interactive Dashboard: View prediction statistics, probability distributions, and top customers
- Download Results: Export predictions, completion probabilities, and tailored marketing strategies to CSV
- Dynamic Marketing Strategy Generator: Generates personalized marketing strategies based on the customer profile and the predicted completion probability
- Containerized: Fully Dockerized for consistent deployment
cd brewlytics_app
docker-compose up
The app will be available at http://localhost:8501
Note: The app requires the trained model and data files. Download them from the Google Drive link above before running.
cd brewlytics_app
# Build the image
docker build -t brewlytics-app .
# Run the container
docker run -p 8501:8501 -v $(pwd)/../Cafe_Rewards_Offers:/app/Cafe_Rewards_Offers brewlytics-app
cd brewlytics_app
pip install -r requirements.txt
streamlit run app.py
Access the app at http://localhost:8501
Your uploaded CSV should contain these columns:
Offer Details: received_time, difficulty, duration, offer_type
Marketing Channels: in_email, in_mobile, in_social, in_web, offer_received
Customer Demographics: age, income, gender, age_group, income_bracket
Membership Information: membership_year, membership_duration_days, membership_month, tenure_group
Flags: is_demographics_missing
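A quick header check before uploading can catch a missing column early. The sketch below uses the standard-library csv module and the column names listed above; `missing_columns` is a hypothetical helper, not a function exposed by the app.

```python
import csv
import io

# Column names copied from the groups listed above.
REQUIRED_COLUMNS = [
    "received_time", "difficulty", "duration", "offer_type",
    "in_email", "in_mobile", "in_social", "in_web", "offer_received",
    "age", "income", "gender", "age_group", "income_bracket",
    "membership_year", "membership_duration_days", "membership_month",
    "tenure_group", "is_demographics_missing",
]


def missing_columns(csv_text):
    """Return the required columns that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {name.strip() for name in header}
    return [col for col in REQUIRED_COLUMNS if col not in present]


# Example: a header that only carries two of the required columns.
example_missing = missing_columns("age,income\n58,72000\n")
print(f"{len(example_missing)} required columns are missing")
```

To check a file on disk, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text to `missing_columns`.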
The app generates predictions with:
- prediction - Binary prediction (0: Will Not Complete, 1: Will Complete)
- prediction_label - Human-readable label
- completion_probability - Probability of completing (0-1)
- non_completion_probability - Probability of not completing (0-1)
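The relationship between these output fields can be sketched as follows. The field names and labels come from the list above; the 0.5 decision threshold and the function name `format_prediction` are assumptions made for illustration, and the app may derive the binary prediction differently.

```python
def format_prediction(completion_probability, threshold=0.5):
    """Build one output record from a completion probability.

    The 0.5 default threshold is an assumption, not a documented app setting.
    """
    if not 0.0 <= completion_probability <= 1.0:
        raise ValueError("completion_probability must be between 0 and 1")
    prediction = int(completion_probability >= threshold)
    return {
        "prediction": prediction,
        "prediction_label": "Will Complete" if prediction else "Will Not Complete",
        "completion_probability": completion_probability,
        "non_completion_probability": 1.0 - completion_probability,
    }


likely = format_prediction(0.8)
unlikely = format_prediction(0.3)
print(likely["prediction_label"], unlikely["prediction_label"])
```

The two probabilities always sum to 1, so either one is enough to rank customers by likelihood of completing an offer.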
For detailed documentation, see brewlytics_app/README.md
- Deploy Random Forest model with 24 features
- Predict completion probability at offer delivery
- Route high-probability offers to appropriate customers
- Use 5-cluster segmentation for targeted campaigns
- Tailor offer types and messaging to each segment
- Monitor segment performance over time
- Use SHAP insights to design better offers
- A/B test new offer configurations
- Optimize difficulty and duration parameters
- Leverage membership duration and engagement patterns
- Identify high-value customers for retention programs
- Predict churn risk based on behavior changes
- Full Feature Model: Maximum accuracy (F1 = 0.8601)
- PCA Reduced Model: 67% fewer features with minimal accuracy loss (F1 = 0.8472)
- Feature drift detection (monthly)
- Performance degradation alerts
- Segment size changes
- Data quality metrics
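One common way to implement a monthly feature drift check is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in recent data. The README does not say which drift metric the project uses, so the sketch below is an assumption, and the bin proportions are illustrative values rather than project data.

```python
import math


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two binned distributions given as lists of proportions.

    Each list should sum to about 1. Proportions are floored at eps so that
    empty bins do not cause a division by zero or log(0).
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of bins")
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)
        psi += (a - e) * math.log(a / e)
    return psi


# Illustrative proportions for a feature split into three bins.
baseline = [0.40, 0.40, 0.20]
current = [0.25, 0.35, 0.40]
psi = population_stability_index(baseline, current)
print(f"PSI = {psi:.3f}")
```

A PSI of 0 means the two distributions are identical. Values above roughly 0.2 are often treated as a signal to investigate, but that cutoff is a rule of thumb and should be tuned to the project's own tolerance.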
This project demonstrates:
- Complete ML Pipeline: From raw data to production-ready models
- Advanced Techniques: PCA, SHAP, clustering, ensemble methods
- Business Acumen: Translating technical insights into actionable recommendations
- Data Quality Management: Handling missing data and preventing leakage
- Model Interpretability: Explaining black-box models for stakeholder trust
- Download required files from Google Drive (see Dataset Overview section)
- Place the Cafe_Rewards_Offers/ folder in the project root
- Run the app:
cd brewlytics_app
docker-compose up
Then navigate to http://localhost:8501 and upload your CSV file to get predictions.
- Download required files from Google Drive (see Dataset Overview section)
- Place the Cafe_Rewards_Offers/ folder in the project root
pip install pandas numpy scikit-learn matplotlib seaborn shap xgboost jupyter
- Run 01_EDA.ipynb - Understand the data
- Run 02_Feature_Engg.ipynb - Prepare features for modeling
- Run 03_Modeling.ipynb - Train and evaluate models
- Run 04_PCA.ipynb - Optimize feature space
- Run 05_SHAP.ipynb - Understand model decisions
- Run 06_Customer_Segmentation.ipynb - Discover customer segments
- Run 07_Bias_Fairness_Analysis.ipynb - Analyze model fairness
Ensure the following files are in Cafe_Rewards_Offers/:
- customers.csv
- offers.csv
- events.csv
This project is for educational purposes. Please ensure compliance with data usage policies and regulations.
Project Status: Complete & Deployed (All notebooks functional, web app deployed via Docker)
Last Updated: January 25, 2026
Total Analysis: 7 comprehensive notebooks covering the end-to-end data science pipeline
Deployment: Containerized Streamlit web application with Docker