Professional-grade data science analysis demonstrating advanced analytics, business intelligence, and strategic insights from real-world retail transaction data.
This repository contains a comprehensive end-to-end data analysis of supermarket sales transactions across multiple stores and product categories. The project demonstrates enterprise-level analytics capabilities including customer segmentation, time-series forecasting, profitability analysis, and actionable business recommendations.
Perfect for: Portfolio projects, data science interviews, business analytics roles, or as a template for retail analytics.
- RFM Segmentation - Identify high-value, at-risk, and dormant customers
- Time Series Analysis - Detect seasonality, trends, and patterns
- Pareto Analysis - Discover the vital 20% driving 80% of revenue
- Anomaly Detection - Find unusual transactions and operational issues
- Cross-sell Mapping - Identify product pairs frequently purchased together
- Profitability Analysis - Calculate margins and profit per transaction
- Executive-level KPI metrics (revenue, transactions, satisfaction)
- Store performance benchmarking and comparison
- Customer behavior and demographic analysis
- Product category ranking and growth analysis
- Seasonal patterns and forecasting
- Strategic recommendations with estimated ROI
- 6 dashboard-style PNG visualizations (300 DPI)
- Color-coded for easy interpretation
- Ready for executive presentations
- Automatically generated from data
- Clean, commented, well-organized Python
- Modular sections for easy modification
- Follows industry best practices
- Documented assumptions and methodologies
supermarket-analytics/
├── README.md # This file
├── ANALYSIS_GUIDE.md # Detailed analysis documentation
├── LICENSE # MIT License
├── requirements.txt # Python dependencies
│
├── data/
│ └── SuperMarket_Analysis.csv # Sample dataset (1,000 transactions)
│
├── notebooks/
│ ├── analysis.ipynb # Main analysis notebook
│ └── implementation_guide.ipynb # Step-by-step tutorial
│
├── scripts/
│ ├── supermarket_advanced_analysis.py # Core analysis engine (13 sections)
│ ├── supermarket_visualizations.py # Visualization generation
│ └── supermarket_implementation_guide.py # Step-by-step code
│
├── outputs/
│ ├── 01_executive_dashboard.png # KPIs & key trends
│ ├── 02_temporal_patterns.png # Seasonality & trends
│ ├── 03_customer_segmentation.png # Customer behavior
│ ├── 04_store_benchmarking.png # Store comparisons
│ ├── 05_category_analysis.png # Product performance
│ ├── 06_summary_report_card.png # Executive summary
│ └── supermarket_analysis_summary.csv # Export metrics
│
└── docs/
    ├── INSTALLATION.md # Setup instructions
    ├── USAGE.md # How to run the analysis
    └── INTERPRETATION.md # Understanding the results
- Python 3.8+
- Jupyter Lab or Jupyter Notebook
- pip package manager
- Clone the repository:
git clone https://github.com/yourusername/supermarket-analytics.git
cd supermarket-analytics
- Create a virtual environment (optional but recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
To run the full analysis from a notebook, start Jupyter:
jupyter lab
# Then in a cell:
exec(open('scripts/supermarket_advanced_analysis.py').read())
exec(open('scripts/supermarket_visualizations.py').read())
To follow the step-by-step tutorial instead:
jupyter lab
# Open: notebooks/implementation_guide.ipynb
# Run cells one by one to understand each step
To run the scripts from the command line:
python scripts/supermarket_advanced_analysis.py
python scripts/supermarket_visualizations.py
High-level KPIs including total revenue, transactions, customer satisfaction, and store breakdown.
Key Metrics:
- Total Revenue: $X.XXM
- Transaction Count: X,XXX
- Average Transaction Value: $XXX.XX
- Customer Satisfaction: X.XX/10.0
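The four metrics above reduce to simple aggregations. The following is a minimal sketch, not the repository's own code: it assumes the transactions sit in a pandas DataFrame with a `Revenue` column and a `Rating` column (both names are assumptions), and the tiny frame and the `executive_kpis` helper are purely illustrative.

```python
import pandas as pd

# Illustrative data only; 'Revenue' and 'Rating' are assumed column names.
sales = pd.DataFrame({
    "Store": ["A", "A", "B", "B"],
    "Revenue": [100.0, 300.0, 200.0, 400.0],
    "Rating": [6.0, 8.0, 7.0, 9.0],
})


def executive_kpis(df):
    """Return the headline KPIs as a plain dict (hypothetical helper)."""
    return {
        "total_revenue": df["Revenue"].sum(),
        "transaction_count": len(df),
        "avg_transaction_value": df["Revenue"].mean(),
        "avg_rating": df["Rating"].mean(),
    }


kpis = executive_kpis(sales)
print(kpis)
```

Returning a dict keeps the metrics easy to export or to pass to a plotting function later.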
Discovers patterns in sales over time: daily, weekly, monthly, and quarterly trends.
Insights:
- Peak days of week (e.g., Saturday +15% above average)
- Seasonal patterns (e.g., Q4 +35% above baseline)
- Monthly growth trends
- Best performing time periods
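A figure such as "Saturday +15% above average" can be derived by comparing each weekday's mean daily revenue with the mean over all days. The sketch below shows one way to do that, assuming `Date` and `Revenue` columns; the four sample rows are made up and the repository's scripts may compute this differently.

```python
import pandas as pd

# Illustrative transactions; 2019-01-05 and 2019-01-12 are Saturdays.
sales = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2019-01-05", "2019-01-06", "2019-01-07", "2019-01-12"]
    ),
    "Revenue": [300.0, 100.0, 100.0, 300.0],
})

# Total revenue per calendar day, then the mean of those totals per weekday.
daily = sales.groupby("Date")["Revenue"].sum()
by_weekday = daily.groupby(daily.index.day_name()).mean()

# Percentage lift of each weekday relative to the average day.
lift_pct = (by_weekday / daily.mean() - 1) * 100
print(lift_pct.round(1))
```

Aggregating to daily totals first avoids overweighting days that simply have more transactions.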
Deep dive into product performance with Pareto analysis.
Discoveries:
- Top 10 revenue-generating categories
- Category growth rates
- Pricing and transaction patterns
- Revenue concentration (80/20 rule)
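Revenue concentration is usually checked by ranking groups by revenue and accumulating their shares. Here is a minimal sketch under stated assumptions: the `Product line` column name, the sample values, and the `pareto_table` helper are illustrative and not taken from the repository.

```python
import pandas as pd

# Illustrative data; 'Product line' is an assumed column name.
sales = pd.DataFrame({
    "Product line": ["A", "A", "B", "C", "D"],
    "Revenue": [300.0, 200.0, 250.0, 150.0, 100.0],
})


def pareto_table(df, key="Product line", value="Revenue"):
    """Rank groups by revenue and add share and cumulative-share columns."""
    totals = df.groupby(key)[value].sum().sort_values(ascending=False)
    table = totals.to_frame("revenue")
    table["share"] = table["revenue"] / table["revenue"].sum()
    table["cum_share"] = table["share"].cumsum()
    return table


table = pareto_table(sales)
# Smallest leading set of groups whose cumulative share reaches 80%.
vital = table[table["cum_share"].shift(fill_value=0.0) < 0.80]
print(table)
print("Vital groups:", list(vital.index))
```

With only a handful of categories the split will rarely be a clean 80/20; the cumulative-share column is the part worth reporting.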
RFM (Recency, Frequency, Monetary) analysis for customer behavior.
Segments:
- High-value loyal customers
- At-risk customers
- New customers
- Dormant customers
- Member vs. Non-member analysis
- Gender and demographic breakdown
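RFM needs a stable customer identifier, which this sketch assumes exists as a hypothetical `CustomerID` column. The thresholds in `label_segment` (30 days, 2 purchases) are arbitrary illustrations, and the rule-based labels stand in for whatever scoring the scripts actually use.

```python
import pandas as pd

# Illustrative transactions; 'CustomerID' is a hypothetical column.
tx = pd.DataFrame({
    "CustomerID": ["c1", "c2", "c3", "c3", "c3", "c4", "c4"],
    "Date": pd.to_datetime([
        "2019-03-25", "2019-01-10", "2019-03-20", "2019-03-28",
        "2019-03-30", "2019-01-05", "2019-01-20",
    ]),
    "Revenue": [50.0, 400.0, 20.0, 30.0, 25.0, 90.0, 60.0],
})

# Measure recency against the day after the last recorded transaction.
snapshot = tx["Date"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("CustomerID").agg(
    last_purchase=("Date", "max"),
    frequency=("Date", "count"),
    monetary=("Revenue", "sum"),
)
rfm["recency_days"] = (snapshot - rfm["last_purchase"]).dt.days


def label_segment(row, recent_days=30, min_repeat=2):
    """Assign a coarse segment from recency and frequency (assumed cut-offs)."""
    is_recent = row["recency_days"] <= recent_days
    is_repeat = row["frequency"] >= min_repeat
    if is_recent and is_repeat:
        return "loyal"
    if is_recent:
        return "new"
    if is_repeat:
        return "at-risk"
    return "dormant"


rfm["segment"] = rfm.apply(label_segment, axis=1)
print(rfm[["recency_days", "frequency", "monetary", "segment"]])
```

The `monetary` column is computed but not used in the labels here; a fuller version could split "loyal" into high-value and regular tiers with it.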
Comparative analysis across stores.
Metrics:
- Revenue ranking
- Transaction volume
- Average transaction value
- Customer satisfaction by store
- Consistency analysis
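The store metrics above can be produced in a single grouped aggregation. In this sketch, "consistency" is interpreted as the coefficient of variation of transaction value, which is an assumption about what the scripts measure; the `Store`, `Revenue`, and `Rating` column names and the sample rows are illustrative.

```python
import pandas as pd

# Illustrative data; store names and values are made up.
sales = pd.DataFrame({
    "Store": ["North", "North", "South", "South", "South"],
    "Revenue": [100.0, 300.0, 200.0, 200.0, 200.0],
    "Rating": [6.0, 8.0, 7.0, 9.0, 8.0],
})

bench = sales.groupby("Store").agg(
    total_revenue=("Revenue", "sum"),
    transactions=("Revenue", "count"),
    avg_transaction=("Revenue", "mean"),
    revenue_std=("Revenue", "std"),
    avg_rating=("Rating", "mean"),
)

# Coefficient of variation: lower means more consistent transaction values.
bench["cv"] = bench["revenue_std"] / bench["avg_transaction"]
# Rank 1 is the store with the highest total revenue.
bench["revenue_rank"] = bench["total_revenue"].rank(ascending=False).astype(int)
print(bench.sort_values("revenue_rank"))
```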
Customer satisfaction insights and quality control.
Findings:
- Average rating by category
- Categories below satisfaction threshold
- Rating distribution
- Quality improvement opportunities
Identifies unusual transactions and operational issues.
Detections:
- High-value transaction analysis
- Low-revenue days
- Outlier transactions
- Potential data quality issues
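One common way to flag outlier transactions is Tukey's interquartile-range rule. The sketch below uses that rule as an example technique; the README does not state which method the scripts apply, and `iqr_outlier_mask` is a hypothetical helper.

```python
import pandas as pd


def iqr_outlier_mask(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)


# Illustrative transaction values with one obviously unusual entry.
revenue = pd.Series([100.0, 110.0, 120.0, 130.0, 140.0, 1000.0], name="Revenue")
mask = iqr_outlier_mask(revenue)
print(revenue[mask])
```

Flagged rows are candidates for review, not confirmed errors; a large basket can be perfectly legitimate.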
Identifies product pairs frequently purchased together.
Uses:
- Bundle promotion recommendations
- Product placement optimization
- Inventory correlation planning
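Pair counting requires data where one invoice can contain several products. The sketch below assumes a line-item table with hypothetical `InvoiceID` and `Product` columns, counts co-occurring pairs, and reports the support of the most frequent pair; it is an illustration of the idea rather than the repository's implementation.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Illustrative line items; 'InvoiceID' and 'Product' are hypothetical columns.
lines = pd.DataFrame({
    "InvoiceID": [1, 1, 1, 2, 2, 3, 3, 4, 4],
    "Product": ["bread", "butter", "jam", "bread", "butter",
                "bread", "butter", "jam", "tea"],
})

pair_counts = Counter()
for _, items in lines.groupby("InvoiceID")["Product"]:
    # Sort so that (bread, butter) and (butter, bread) count as the same pair.
    for pair in combinations(sorted(set(items)), 2):
        pair_counts[pair] += 1

n_baskets = lines["InvoiceID"].nunique()
top_pair, top_count = pair_counts.most_common(1)[0]
# Support: the fraction of baskets that contain both products.
top_support = top_count / n_baskets
print(top_pair, top_count, top_support)
```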
Margin and profit metrics by category.
Metrics:
- Gross profit by category
- Profit margins
- Profit per transaction
- Most profitable products
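Gross profit is revenue minus cost of goods sold, and margin is gross profit divided by revenue. The sketch below applies those definitions per category, assuming a cost column is available under the hypothetical name `COGS`; the sample values are made up.

```python
import pandas as pd

# Illustrative data; 'COGS' (cost of goods sold) is a hypothetical column.
sales = pd.DataFrame({
    "Product line": ["Food", "Food", "Fashion"],
    "Revenue": [100.0, 200.0, 300.0],
    "COGS": [80.0, 150.0, 180.0],
})
sales["GrossProfit"] = sales["Revenue"] - sales["COGS"]

profit = sales.groupby("Product line").agg(
    revenue=("Revenue", "sum"),
    gross_profit=("GrossProfit", "sum"),
    transactions=("Revenue", "count"),
)
profit["margin"] = profit["gross_profit"] / profit["revenue"]
profit["profit_per_transaction"] = profit["gross_profit"] / profit["transactions"]

# Most profitable categories first.
profit = profit.sort_values("gross_profit", ascending=False)
print(profit)
```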
Trend projection and seasonality strength.
Outputs:
- Growth trajectory
- Seasonality factor
- Trend direction
- Next period projections
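A simple way to obtain a trend direction and a next-period projection is a straight-line fit over the monthly totals. The sketch below uses `numpy.polyfit` for that; it is a naive linear extrapolation chosen for illustration, not necessarily the method in the scripts, and the four monthly values are made up.

```python
import numpy as np

# Illustrative monthly revenue totals, oldest first.
monthly_revenue = np.array([100.0, 110.0, 120.0, 130.0])
periods = np.arange(len(monthly_revenue))

# Least-squares line: revenue is approximated by slope * period + intercept.
slope, intercept = np.polyfit(periods, monthly_revenue, deg=1)

# Extrapolate one step past the last observed period.
next_period = slope * len(monthly_revenue) + intercept
direction = "up" if slope > 0 else "down" if slope < 0 else "flat"
print(f"trend: {direction}, slope: {slope:.2f}, next period: {next_period:.2f}")
```

With only a few months of data, a projection like this is best treated as a rough indication of direction rather than a forecast.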
Actionable insights ranked by impact.
Recommendations Include:
- Inventory optimization
- Promotion strategy
- Store improvement initiatives
- Customer loyalty programs
- Payment system optimization
EXECUTIVE SUMMARY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total Revenue: $322,965.25
Total Transactions: 1,000
Avg Transaction Value: $322.97
Customer Satisfaction: 6.97/10.0
REVENUE BY STORE:
           Total Revenue  Avg Transaction  Avg Rating
Yangon       $111,923.78          $316.54        6.75
Naypyitaw    $106,482.24          $329.18        7.02
Giza         $104,559.23          $321.54        7.22
TOP 3 CATEGORIES:
1. Food and beverages: $56,471 (17.5% of revenue)
2. Fashion accessories: $54,601 (16.9% of revenue)
3. Sports and travel: $53,861 (16.7% of revenue)
SEASONALITY ANALYSIS:
Strong Q4 peak (+35% above average)
Recommend 2-month advance inventory buildup
GROWTH ANALYSIS:
Member penetration: 31.5%
Member spend premium: 1.8x non-members
Loyalty opportunity: +$X revenue if 50% penetration
This analysis can be adapted for:
- Retail Analytics - Optimize store operations and inventory
- E-commerce - Improve product recommendations and bundling
- Marketing - Target customer segments effectively
- Finance - Profitability analysis and margin optimization
- Operations - Identify underperforming locations
- Portfolio Projects - Demonstrate data science capabilities
- Job Interviews - Show real-world analytics experience
- Academic Projects - Apply statistics and data science concepts
KPIs, monthly trends, quarterly breakdown, top categories, store performance.
Day-of-week patterns, monthly trends, category trends over time, quarterly performance.
Customer type breakdown, gender analysis, payment method distribution, satisfaction ratings.
Revenue comparison, transaction volume, average transaction value, customer ratings.
Store-category heatmap, performance matrix, top categories, satisfaction ranking.
Text-based executive summary with key metrics and actionable recommendations.
- Replace the dataset:
# In your notebook:
import pandas as pd
sales = pd.read_csv('your_data.csv')
- Update column names:
sales.rename(columns={
    'your_date_column': 'Date',
    'your_revenue_column': 'Revenue',
    'your_store_column': 'Store'
}, inplace=True)
- Run the analysis:
exec(open('scripts/supermarket_advanced_analysis.py').read())
The modular structure makes it easy to add sections:
# Add a new section
print("\n" + "="*80)
print("MY CUSTOM ANALYSIS")
print("="*80)
# Replace {...} with your own column-to-aggregation mapping
my_metric = sales.groupby('your_column').agg({...})
print(my_metric)
- INSTALLATION.md - Detailed setup guide
- USAGE.md - How to run each script
- INTERPRETATION.md - Understanding the results
- ANALYSIS_GUIDE.md - Deep dive into each analysis section
| Tool | Purpose |
|---|---|
| Python 3.8+ | Core programming language |
| Pandas | Data manipulation and analysis |
| NumPy | Numerical computations |
| Matplotlib | Visualization creation |
| Seaborn | Statistical data visualization |
| Jupyter Lab | Interactive notebooks |
All dependencies are listed in requirements.txt:
pandas>=1.3.0
numpy>=1.21.0
matplotlib>=3.4.0
seaborn>=0.11.0
jupyter>=1.0.0
jupyterlab>=3.0.0
Install with:
pip install -r requirements.txt
The included SuperMarket_Analysis.csv contains:
- 1,000 transactions across 3 stores
- 17 features including date, product line, revenue, customer type, rating
- 6 product categories (Food, Electronics, Fashion, etc.)
- Real-world data complexity (missing values, outliers, seasonality)
Perfect for learning and demonstration purposes.
After running the analysis, you'll get:
- Console Output - 100+ detailed metrics and insights
- 6 PNG Visualizations - Publication-ready (300 DPI)
- CSV Export - Summary metrics table
- Actionable Recommendations - Ranked by business impact
After working through this project, you'll understand:
- Data loading and cleaning
- Multi-level aggregations and grouping
- Time series analysis
- Customer segmentation techniques
- Anomaly detection methods
- Professional visualization creation
- Statistical analysis and metrics
- Identifying revenue drivers
- Analyzing customer behavior
- Spotting operational issues
- Calculating business impact
- Creating executive-ready presentations
- Strategic thinking and planning
- Data storytelling
- Communicating with stakeholders
- Problem decomposition
- Documentation best practices
"FileNotFoundError: CSV not found"
# Make sure CSV is in the same directory
import os
print(os.listdir()) # Should show your CSV
"ModuleNotFoundError: pandas"
pip install pandas matplotlib seaborn
Visualizations don't display in Jupyter
%matplotlib inline
See docs/INSTALLATION.md for more troubleshooting.
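For the missing-CSV case above, a slightly more defensive loader can report where it looked and which CSV files are present. This is a sketch only: `load_sales` is a hypothetical helper, and the path `data/SuperMarket_Analysis.csv` follows the project layout shown earlier in this README.

```python
from pathlib import Path

import pandas as pd


def load_sales(csv_path):
    """Read the CSV, or fail with a message listing CSV files found nearby."""
    path = Path(csv_path)
    if not path.is_file():
        folder = path.parent
        nearby = sorted(p.name for p in folder.glob("*.csv")) if folder.is_dir() else []
        raise FileNotFoundError(
            f"{path} not found (working directory: {Path.cwd()}); "
            f"CSV files in {folder}: {nearby}"
        )
    return pd.read_csv(path)


try:
    sales = load_sales("data/SuperMarket_Analysis.csv")
    print(sales.shape)
except FileNotFoundError as err:
    # Printed instead of raised so the example runs from any directory.
    print(err)
```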
This project is licensed under the MIT License - see LICENSE file for details.
Free for personal and commercial use with attribution.
Contributions are welcome! Here's how you can help:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-analysis)
- Commit your changes (git commit -m 'Add amazing analysis')
- Push to the branch (git push origin feature/amazing-analysis)
- Open a Pull Request
- Additional analysis sections
- Different datasets
- Interactive dashboards
- Machine learning models
- Real-time analytics
- Documentation improvements
- Clone this repository
git clone https://github.com/yourusername/supermarket-analytics.git
- Install dependencies
pip install -r requirements.txt
- Run the analysis
jupyter lab
- Explore the visualizations
  - Check the outputs/ folder for generated charts
  - Review the console output for detailed metrics
- Customize for your data
  - Replace the CSV with your dataset
  - Modify column names and parameters
  - Add your own analysis sections
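When adapting the analysis to your own data, it can help to validate the renamed columns before running anything else. The sketch below assumes the scripts need the `Date`, `Revenue`, and `Store` columns used in the rename example in this README; `prepare_sales`, `REQUIRED_COLUMNS`, and the raw column names are hypothetical.

```python
import pandas as pd

# Columns assumed to be required, based on the rename example in this README.
REQUIRED_COLUMNS = ["Date", "Revenue", "Store"]


def prepare_sales(df, column_map):
    """Rename columns, check the required ones exist, and coerce their types."""
    out = df.rename(columns=column_map)
    missing = [c for c in REQUIRED_COLUMNS if c not in out.columns]
    if missing:
        raise KeyError(f"Missing required columns after rename: {missing}")
    out["Date"] = pd.to_datetime(out["Date"])
    out["Revenue"] = pd.to_numeric(out["Revenue"])
    return out


# Illustrative raw data with different column names.
raw = pd.DataFrame({
    "order_date": ["2024-01-02"],
    "amount": ["19.99"],
    "branch": ["North"],
})
sales = prepare_sales(
    raw, {"order_date": "Date", "amount": "Revenue", "branch": "Store"}
)
print(sales.dtypes)
```

Failing early with a clear message is easier to debug than a `KeyError` raised deep inside the analysis script.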
- Core analysis implementation
- Visualization generation
- Documentation
- Interactive Jupyter widgets
- Streamlit dashboard
- REST API for real-time analysis
- Machine learning models
- Time series forecasting
- Docker containerization
Before presenting this analysis:
- Downloaded and extracted the repository
- Installed all dependencies
- Successfully ran the analysis scripts
- Generated all visualizations
- Reviewed console output metrics
- Understood each analysis section
- Created presentation slides
- Practiced explanation of insights
- Prepared answers to likely questions
- Lines of Code: 1,500+
- Analysis Sections: 11
- Visualizations: 6
- Metrics Calculated: 100+
- Execution Time: 5-10 minutes
- Documentation: Comprehensive
This project demonstrates:
- Real-world complexity - Working with actual retail data
- End-to-end workflow - From raw data to actionable insights
- Business acumen - Understanding ROI and strategic impact
- Technical excellence - Clean, professional code
- Communication skills - Executive-ready visualizations
- Problem-solving - Answering complex business questions
Made by Eva Safi