| Document | Description |
|---|---|
| README.md | Project overview and setup instructions |
| NOTES.md | Data analytics concepts and methodology used in this project |
| POWERBI_DASHBOARD.md | Power BI dashboard overview, KPIs, and business insights |
This project presents a complete end-to-end analysis of the Superstore retail dataset using Python. The objective is to transform raw sales data into actionable business insights through data cleaning, exploratory data analysis (EDA), customer analytics, profitability analysis, forecasting, and strategic recommendations.
The project follows a real-world analytics workflow used by data analysts to identify growth opportunities, optimize profitability, and support data-driven decision-making.
- Analyze overall sales and profit performance
- Identify high-performing products and categories
- Discover loss-making products and regions
- Understand customer purchasing behavior
- Measure the impact of discounts on profitability
- Perform customer segmentation using RFM analysis
- Forecast future sales trends
- Generate business recommendations based on findings
├── main.ipynb
├── superstore analysis.csv
├── cleaned_superstore.csv
├── screenshots/
├── README.md
| File | Description |
|---|---|
| main.ipynb | Complete analysis notebook |
| superstore analysis.csv | Final processed dataset |
| cleaned_superstore.csv | Cleaned dataset exported after preprocessing |
| screenshots/ | Visualizations and dashboard screenshots |
| README.md | Project documentation |
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Scikit-Learn
- Statsmodels
- Jupyter Notebook
`pip install pandas numpy matplotlib seaborn scikit-learn statsmodels openpyxl`
- Clone the repository: `git clone <repository-url>`
- Navigate to the project folder: `cd superstore-sales-analytics`
- Launch Jupyter Notebook: `jupyter notebook`
- Open: `main.ipynb`
- Run all cells sequentially.
- Missing value detection
- Duplicate record detection
- Data type verification
- Postal code correction
- Dataset validation
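The cleaning checks listed above could be sketched roughly as follows. This is only an illustration, not the notebook's actual code: the column names (`Order ID`, `Postal Code`, `Sales`) are assumed from the usual Superstore layout, the helper names are hypothetical, and the README does not say what the postal code correction involved, so the sketch assumes it means restoring leading zeros lost when codes are read as numbers.

```python
import pandas as pd


def validate_superstore(df: pd.DataFrame) -> dict:
    """Return simple data-quality counts for a Superstore-style frame."""
    return {
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


def fix_postal_code(df: pd.DataFrame, column: str = "Postal Code") -> pd.DataFrame:
    """Store postal codes as zero-padded 5-character strings (assumed US format)."""
    out = df.copy()
    codes = pd.to_numeric(out[column], errors="coerce").astype("Int64")
    out[column] = codes.astype("string").str.zfill(5)
    return out


# Made-up rows for illustration only; not taken from the real dataset.
toy = pd.DataFrame({
    "Order ID": ["A-1", "A-1", "A-2"],
    "Postal Code": [5401.0, 5401.0, None],
    "Sales": [10.0, 10.0, 25.5],
})

report = validate_superstore(toy)
cleaned = fix_postal_code(toy.drop_duplicates())
print(report["duplicate_rows"])
print(cleaned["Postal Code"].tolist())
```

Missing postal codes stay missing rather than being filled in, since choosing a replacement value is a project decision.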
- Total Sales
- Total Orders
- Average Order Value
- Monthly Sales Trends
- Category Performance
- Sub-Category Performance
- Total Revenue
- Total Orders
- Total Quantity Sold
- Average Order Value
- Sales Growth
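As a rough sketch of how these KPIs might be computed with pandas: the column names (`Order ID`, `Order Date`, `Sales`, `Quantity`) are assumptions, the function names are hypothetical, and "Sales Growth" is interpreted here as year-over-year change in total sales, which the README does not spell out.

```python
import pandas as pd


def sales_kpis(df: pd.DataFrame) -> dict:
    """Headline KPIs; an order may span several rows, so orders are counted by unique ID."""
    total_revenue = df["Sales"].sum()
    total_orders = df["Order ID"].nunique()
    return {
        "total_revenue": float(total_revenue),
        "total_orders": int(total_orders),
        "total_quantity": int(df["Quantity"].sum()),
        "average_order_value": float(total_revenue / total_orders),
    }


def yearly_sales_growth(df: pd.DataFrame) -> pd.Series:
    """Percentage change in total sales from one calendar year to the next."""
    yearly = df.groupby(df["Order Date"].dt.year)["Sales"].sum()
    return yearly.pct_change() * 100


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Order ID": ["A-1", "A-1", "A-2", "A-3"],
    "Order Date": pd.to_datetime(["2016-03-01", "2016-03-01", "2016-11-15", "2017-02-10"]),
    "Sales": [100.0, 50.0, 50.0, 300.0],
    "Quantity": [2, 1, 1, 3],
})

kpis = sales_kpis(toy)
growth = yearly_sales_growth(toy)
print(kpis)
print(growth)
```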
- Profit by Category
- Profit by Sub-Category
- Profit by Product
- Regional Profitability
- Loss-Making Product Identification
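One possible way to flag loss-making products is to sum profit per product and keep the negative totals. The sketch below assumes `Product Name`, `Sales`, and `Profit` columns; the function name and the toy rows are hypothetical.

```python
import pandas as pd


def loss_making_products(df: pd.DataFrame) -> pd.DataFrame:
    """Products whose summed profit is negative, worst first."""
    by_product = df.groupby("Product Name", as_index=False)[["Sales", "Profit"]].sum()
    by_product["Profit Margin %"] = by_product["Profit"] / by_product["Sales"] * 100
    losses = by_product[by_product["Profit"] < 0]
    return losses.sort_values("Profit").reset_index(drop=True)


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Product Name": ["Table", "Table", "Phone", "Binder"],
    "Sales": [200.0, 100.0, 300.0, 20.0],
    "Profit": [-40.0, -10.0, 60.0, -5.0],
})

losses = loss_making_products(toy)
print(losses)
```

Aggregating before filtering matters: a product with one unprofitable order is not necessarily loss-making overall.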
- Consumer
- Corporate
- Home Office
- Top Customers
- Average Revenue per Customer
- Customer Contribution Analysis
Identifies the percentage of customers responsible for approximately 80% of total sales.
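A minimal sketch of this contribution (Pareto) calculation, assuming `Customer ID` and `Sales` columns. The function name and the 80% default threshold parameter are illustrative choices, not the notebook's code.

```python
import pandas as pd


def pareto_customer_share(df: pd.DataFrame, threshold: float = 0.80) -> float:
    """Percentage of customers needed to reach `threshold` of total sales."""
    revenue = df.groupby("Customer ID")["Sales"].sum().sort_values(ascending=False)
    cumulative_share = revenue.cumsum() / revenue.sum()
    # Customers strictly below the threshold, plus the one who crosses it.
    n_needed = min(int((cumulative_share < threshold).sum()) + 1, len(revenue))
    return n_needed * 100 / len(revenue)


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Customer ID": ["C1", "C2", "C3", "C4", "C5"],
    "Sales": [600.0, 250.0, 100.0, 30.0, 20.0],
})

share = pareto_customer_share(toy)
print(share)
```

In the toy data the top two of five customers account for 85% of sales, so 40% of customers are enough to pass the 80% mark.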
Customers are segmented based on:
- Recency
- Frequency
- Monetary Value
RFM helps identify:
- High-value customers
- Loyal customers
- At-risk customers
- Low-value customers
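The RFM segmentation could look something like the sketch below. Several things here are assumptions rather than the project's documented method: the column names (`Customer ID`, `Order ID`, `Order Date`, `Sales`), quartile scoring from 1 to 4, the snapshot date, and especially the rules that map scores to the four segment names.

```python
import pandas as pd


def build_rfm(df: pd.DataFrame, snapshot_date=None) -> pd.DataFrame:
    """Recency (days), Frequency (unique orders), Monetary (sales) per customer."""
    if snapshot_date is None:
        # Assumption: measure recency from the day after the last order in the data.
        snapshot_date = df["Order Date"].max() + pd.Timedelta(days=1)
    grouped = df.groupby("Customer ID")
    return pd.DataFrame({
        "Recency": (snapshot_date - grouped["Order Date"].max()).dt.days,
        "Frequency": grouped["Order ID"].nunique(),
        "Monetary": grouped["Sales"].sum(),
    })


def quartile_score(values: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Score 1-4 by quartile of rank; 4 is best. Needs at least four customers."""
    ranks = values.rank(method="first", ascending=higher_is_better)
    return pd.qcut(ranks, q=4, labels=[1, 2, 3, 4]).astype(int)


def label_segment(row) -> str:
    """Hypothetical score-to-segment rules; tune these to the business."""
    if row["R"] >= 3 and row["F"] >= 3 and row["M"] >= 3:
        return "High-value"
    if row["R"] <= 2 and row["M"] >= 3:
        return "At-risk"
    if row["F"] >= 3:
        return "Loyal"
    return "Low-value"


# Made-up orders for illustration only.
toy = pd.DataFrame({
    "Customer ID": ["C1", "C1", "C1", "C2", "C3", "C3", "C4"],
    "Order ID": ["O1", "O2", "O3", "O4", "O5", "O6", "O7"],
    "Order Date": pd.to_datetime([
        "2017-11-01", "2017-12-01", "2017-12-20", "2017-12-30",
        "2017-01-10", "2017-02-10", "2017-06-01",
    ]),
    "Sales": [300.0, 500.0, 400.0, 50.0, 800.0, 100.0, 20.0],
})

rfm = build_rfm(toy)
rfm["R"] = quartile_score(rfm["Recency"], higher_is_better=False)  # recent is better
rfm["F"] = quartile_score(rfm["Frequency"])
rfm["M"] = quartile_score(rfm["Monetary"])
rfm["Segment"] = rfm.apply(label_segment, axis=1)
print(rfm)
```

Ranking before binning (`method="first"`) avoids `qcut` failing on tied values, at the cost of splitting ties arbitrarily by row order.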
- Best-Selling Products
- Most Profitable Products
- Least Profitable Products
- Quantity Sold by Category
Analysis performed at:
- Region Level
- State Level
Metrics:
- Sales
- Profit
- Profit Margin
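For the region and state breakdown, one grouping helper can serve both levels. This is a sketch assuming `Region`, `State`, `Sales`, and `Profit` columns; `profitability_by` is a hypothetical name, and profit margin is taken as profit divided by sales.

```python
import pandas as pd


def profitability_by(df: pd.DataFrame, level: str = "Region") -> pd.DataFrame:
    """Sales, profit, and profit margin (%) summed at the given geographic level."""
    summary = df.groupby(level)[["Sales", "Profit"]].sum()
    summary["Profit Margin %"] = (summary["Profit"] / summary["Sales"] * 100).round(2)
    return summary.sort_values("Profit Margin %", ascending=False)


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Region": ["West", "West", "East"],
    "State": ["California", "Washington", "New York"],
    "Sales": [200.0, 100.0, 200.0],
    "Profit": [50.0, 10.0, -10.0],
})

by_region = profitability_by(toy, level="Region")
by_state = profitability_by(toy, level="State")
print(by_region)
print(by_state)
```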
Relationships between:
- Sales
- Profit
- Quantity
- Discount
Visualizations:
- Correlation Heatmap
- Scatter Plots
Key Finding:
A strong negative correlation exists between discounts and profitability.
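The correlation matrix behind a heatmap like this can be computed directly with pandas. The sketch below assumes columns named `Sales`, `Profit`, `Quantity`, and `Discount`; the toy numbers are invented to show the mechanics and do not reproduce the project's result.

```python
import pandas as pd

# Made-up rows for illustration only; real values come from the cleaned dataset.
toy = pd.DataFrame({
    "Sales": [200.0, 180.0, 210.0, 190.0, 205.0],
    "Profit": [50.0, 40.0, 20.0, -10.0, -30.0],
    "Quantity": [2, 3, 2, 4, 5],
    "Discount": [0.0, 0.1, 0.2, 0.4, 0.5],
})

# Pearson correlation between every pair of numeric columns.
corr = toy[["Sales", "Profit", "Quantity", "Discount"]].corr()
discount_vs_profit = corr.loc["Discount", "Profit"]
print(corr.round(2))
```

Passing `corr` to `seaborn.heatmap(corr, annot=True)` would draw the heatmap. Note that correlation alone does not show that discounts cause lower profit.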
Monthly and yearly trends for:
- Sales
- Profit
- Orders
- Quantity
Includes:
- Monthly Trend Analysis
- Year-over-Year Growth Analysis
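Monthly trends and year-over-year growth might be derived along these lines. The column names are assumed, the function names are hypothetical, and year-over-year growth is computed here by comparing each month with the same month one year earlier, which is one common reading of the term.

```python
import pandas as pd


def monthly_trends(df: pd.DataFrame) -> pd.DataFrame:
    """Monthly totals for sales, profit, and quantity, plus unique order counts."""
    indexed = df.set_index("Order Date").sort_index()
    monthly = indexed.resample("MS").agg({
        "Sales": "sum",
        "Profit": "sum",
        "Quantity": "sum",
        "Order ID": "nunique",
    })
    return monthly.rename(columns={"Order ID": "Orders"})


def year_over_year(monthly: pd.DataFrame, column: str = "Sales") -> pd.Series:
    """Percentage change versus the same month one year earlier."""
    return monthly[column].pct_change(periods=12, fill_method=None) * 100


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Order ID": ["A-1", "A-2", "A-3", "A-4"],
    "Order Date": pd.to_datetime(["2016-01-05", "2016-01-20", "2016-02-10", "2017-01-15"]),
    "Sales": [100.0, 100.0, 50.0, 300.0],
    "Profit": [10.0, 20.0, -5.0, 60.0],
    "Quantity": [1, 2, 1, 3],
})

monthly = monthly_trends(toy)
yoy = year_over_year(monthly)
print(monthly.head(3))
print(yoy.dropna())
```

Months with no orders appear as zero rows after resampling, so growth figures for those months are undefined rather than misleading.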
Implemented forecasting using:
- Exponential Smoothing (Holt-Winters Method)
Forecast Horizon:
- Next 6 Months Sales Prediction
Library Used:
`from statsmodels.tsa.holtwinters import ExponentialSmoothing`
- Technology generated the highest revenue.
- Consumer segment contributed the largest share of sales.
- West region achieved the highest sales performance.
- Technology was the most profitable category.
- Several products generated negative profit.
- Higher discounts significantly reduced profitability.
- A small percentage of customers generated the majority of revenue.
- High-value customers contributed disproportionately to sales.
- Sales exhibited clear seasonal patterns.
- Certain states consistently outperformed others.
- Reduce excessive discounting.
- Monitor promotion profitability.
- Use bundled offers instead of direct discounts.
- Implement loyalty programs.
- Offer personalized promotions.
- Focus on high-value customers.
- Increase investment in profitable markets.
- Replicate successful regional strategies.
- Promote profitable products.
- Reevaluate loss-making products.
- Improve inventory allocation.
- Use forecasting for inventory planning.
- Prepare for seasonal demand fluctuations.
- Sales by Category
- Sales by Region
- Monthly Sales Trends
- Profit by Category
- Customer Segment Analysis
- Correlation Heatmap
- Top Customers Analysis
- Product Profitability Analysis
- Sales Forecasting Charts
The project exports `cleaned_superstore.csv` and `superstore analysis.csv`.
- Data Cleaning
- Exploratory Data Analysis (EDA)
- Business Analytics
- Data Visualization
- Customer Segmentation
- Statistical Analysis
- Time Series Forecasting
- Business Intelligence
- Python Programming
- Data Storytelling
Jaiv Patel
Aspiring Data Analyst | Python | SQL | Power BI | Machine Learning