🍔 FoodExpress · Data Analytics

Give your stakeholders the food delivery insights they need.
Clean, analyze, test, and visualize one month of FoodExpress platform data — end to end.

Overview • Dataset • Project Structure • Setup • Key Analyses • Findings • Author • Contributing

📌 Overview

This project performs a comprehensive exploratory data analysis (EDA) and statistical inference study on one month of delivery operations for the FoodExpress platform — a rapidly growing food delivery service connecting customers with local restaurants across four city zones.

The analysis pipeline covers:

✅ Data cleaning — null imputation, sentinel correction, datatype standardization, outlier removal, and data reshaping
✅ Feature engineering — net revenue calculation, discount aggregation, and derived temporal features (day, week, segment)
✅ Exploratory analysis (EDA) — order distribution, customer segmentation, cuisine trends, and delivery performance patterns
✅ Statistical testing — hypothesis testing including t-tests and chi-square tests to validate operational and customer insights
✅ Business insights — identification of key drivers of customer satisfaction, revenue performance, and operational efficiency
✅ Interactive dashboard (Streamlit) — real-time filtering, KPI monitoring, and visualization of operational and revenue metrics

📊 Dashboard Preview

🖥️ Main Dashboard

📈 KPI & Metrics View

📉 Analytical Charts

Metric	Value
Period	September 2024
Raw orders	1,000
Clean records	990
Total revenue	$35,575.93
Net platform revenue	$25,013.29
City zones	North · South · East · West

📂 Dataset

The dataset foodexpress_data_uncleaned.csv contains 24 columns across six categories:

Category	Columns
Order details	`OrderID`, `OrderDate`, `OrderTime`, `DayOfWeek`, `TimePeriod`
Customer info	`CustomerSegment` (New / Regular / VIP), `City`
Restaurant	`RestaurantID`, `CuisineType`, `RestaurantRating`
Delivery metrics	`DeliveryTime`, `DeliveryDistance`, `DeliveryFee`, `WeatherCondition`
Financials	`OrderValue`, `DiscountAmount`, `PromoDiscount`, `TipAmount`, `CashPayment`, `CardPayment`, `WalletPayment`, `UPIPayment`
Performance	`CustomerRating`, `DeliveryPartnerRating`

⚠️ Raw data quality issues: 294 missing values across 4 columns, sentinel error 99999 in DeliveryTime, and "INVALID" / "N/A" strings in CustomerRating. All resolved in the cleaning pipeline.

🗂 Project Structure

FoodExpress-Data-Analytics/
│
├── app.py # Streamlit dashboard
├── FoodExpress Data Analytics Project.ipynb # Analysis notebook
├── foodexpress_data_uncleaned.csv # Raw dataset
├── requirements.txt # Dependencies
├── assets/ # Images & visuals
├── README.md # Documentation

---

## ⚙️ Setup

**Requirements:** Python 3.10+, Jupyter Notebook

```bash
# 1. Clone the repository
git clone https://github.com/doyancha/FoodExpress-Data-Analytics.git
cd FoodExpress-Data-Analytics

# 2. Install dependencies
pip install pandas numpy scipy matplotlib seaborn statsmodels jupyterlab

# 3. Launch the notebook
jupyter notebook "FoodExpress Data Analytics Project.ipynb"

The dataset foodexpress_data_uncleaned.csv must sit in the same directory as the notebook. All cleaning steps are self-contained within the notebook cells.

🔍 Key Analyses

1 · Data Cleaning — Six-Step Pipeline

The raw CSV required six discrete cleaning operations before analysis:

Step	Action	Outcome
1	Impute `CustomerRating` (79), `RestaurantRating` (75), `DeliveryTime` (70), `TipAmount` (70) with column median	294 nulls resolved
2	Replace sentinel value `99999` in `DeliveryTime` with median of valid values	Outlier corrected
3	Replace `"INVALID"` / `"N/A"` strings in `CustomerRating` → NaN → median fill	`dtype` fixed to `float64`
4	Convert `OrderDate` → `datetime64`; financial & rating columns → `float64`	Types corrected
5	Drop rows where `DeliveryTime > P99` or `OrderValue < $5`	10 rows removed → 990 rows
6	Reshape 4 wide payment columns → `PaymentMethod` + `PaymentAmount` using `melt()`	1 payment per row

# Step 6 — Reshape payments from wide → long format
payment_cols = ['CashPayment', 'CardPayment', 'WalletPayment', 'UPIPayment']

df_payment = df_clean.melt(
    id_vars=[c for c in df_clean.columns if c not in payment_cols],
    value_vars=payment_cols,
    var_name='PaymentType',
    value_name='PaymentAmount'
)
df_payment = df_payment[df_payment['PaymentAmount'] > 0].copy()
# Result → 990 orders : 990 payments (1:1 mapping)

2 · Python Fundamentals

Revenue Calculation — For Loop

total_revenue = 0.0
for value in df['OrderValue']:
    total_revenue += value

# → Total Revenue: $35,575.93

Premium Delivery Classifier

A delivery is "Premium" when DeliveryFee > $5.00 and DeliveryTime < 25 minutes.

def is_premium_delivery(order_id):
    row = df[df['OrderID'] == order_id]
    if row.empty:
        return False
    return (row['DeliveryFee'].iloc[0] > 5.00) and (row['DeliveryTime'].iloc[0] < 25)

# Vectorized scan → 49 premium deliveries found

Average Order Value by Cuisine

def avg_order_value_by_cuisine(cuisine_type):
    subset = df[df['CuisineType'] == cuisine_type]['OrderValue']
    if subset.empty:
        return None
    return {'cuisine': cuisine_type, 'avg': subset.mean(), 'count': len(subset)}

3 · Feature Engineering — Net Revenue

NetRevenue = OrderValue − DiscountAmount − PromoDiscount − (OrderValue × 0.20)

A 20% platform commission is applied to every order. The NetRevenue column reflects true platform earnings after all deductions.

COMMISSION_RATE = 0.20
df_payment['RestaurantCommission'] = df_payment['OrderValue'] * COMMISSION_RATE
df_payment['NetRevenue'] = (
    df_payment['OrderValue']
    - df_payment['DiscountAmount'].fillna(0)
    - df_payment['PromoDiscount'].fillna(0)
    - df_payment['RestaurantCommission']
)

Net revenue by city zone:

Rank	Zone	Net Revenue
🥇	North Zone	$6,631.72
🥈	West Zone	$6,374.00
🥉	South Zone	$6,060.36
4th	East Zone	$5,947.21

4 · Visualizations

Six charts were produced using matplotlib and seaborn:

Chart	Type	Key Insight
Orders by cuisine	Horizontal bar	Indian leads with 152 orders
Revenue by cuisine × city	Heatmap	Surface top cuisine-city revenue combinations
Payment method by segment	100% stacked bar	Uniform spread — no segment preference
Daily order trend	Line plot	Temporal patterns across September
ry time by city	Box plot	North Zone has heavy right-skew
Campaign AOV impact	Paired scatter	Before vs After AOV per restaurant

📊 Dashboard Visualizations

# Heatmap — cuisine × city net revenue
pivot = df_payment.pivot_table(
    values='NetRevenue', index='CuisineType', columns='City', aggfunc='sum'
)
sns.heatmap(pivot, annot=True, fmt='.0f', cmap='YlOrRd', linewidths=0.5)

# 100% Stacked bar — payment method by customer segment
pct = pd.crosstab(
    df_clean['CustomerSegment'], df_clean['PaymentMethod'], normalize='index'
) * 100
pct.plot(kind='bar', stacked=True,
         color=['#e74c3c', '#3498db', '#2ecc71', '#f1c40f'], edgecolor='black')

5 · Statistical Tests — α = 0.05

Test 1 — Independent Samples t-test: VIP vs Regular Ratings

H₀ : μ(VIP ratings) = μ(Regular ratings)
H₁ : The means differ

from scipy import stats

vip_r     = df[df['CustomerSegment'] == 'VIP Customer']['CustomerRating']
regular_r = df[df['CustomerSegment'] == 'Regular Customer']['CustomerRating']
t_stat, p_value = stats.ttest_ind(vip_r, regular_r)

customers_ratings_vip_vs_regular_customers

Result: p ≥ 0.05 → Fail to reject H₀

No significant difference in satisfaction between VIP and Regular customers. The platform delivers a consistent experience across tiers.

Test 2 — One-Sample t-test: North Zone Delivery Target

H₀ : μ(North Zone delivery time) = 30 minutes
H₁ : μ ≠ 30 minutes

north = df[df['City'] == 'North Zone']['DeliveryTime'].dropna()
t_stat, p_value = stats.ttest_1samp(north, popmean=30.0)
# Observed mean: 125 min  |  Only 36.7% of orders delivered within 30 min

Result: p < 0.05 → Reject H₀

North Zone significantly misses its 30-minute target. Operational intervention is required.

Test 3 — Chi-Square Test of Association: Payment Method vs Segment

H₀ : Payment method choice is independent of customer segment
H₁ : There is a significant association

Result: p ≥ 0.05 → Fail to reject H₀

Payment habits are uniform across all customer segments. A single payment promotion strategy is sufficient.

Test 4 — Chi-Square Goodness of Fit: Cuisine Distribution

H₀ : Each of 5 cuisines accounts for exactly 20% of orders
H₁ : Distribution is not uniform

from scipy.stats import chisquare

observed = df_clean[df_clean['CuisineType'].isin(
    ['Italian', 'Chinese', 'Indian', 'Fast Food', 'Mexican']
)]['CuisineType'].value_counts()

chi2, p_value = chisquare(observed)
# χ² = 9.11  |  p = 0.0585

Result: p = 0.0585 → Fail to reject H₀ (marginal)

Indian cuisine leads at 152 orders, but no statistically significant imbalance detected at α = 0.05.

6 · Business Intelligence

Correlation Analysis — Drivers of Customer Rating

factors = ['DeliveryDistance', 'OrderValue', 'RestaurantRating', 'DeliveryPartnerRating']
correlations = {f: df['CustomerRating'].corr(df[f]) for f in factors}

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
for idx, factor in enumerate(factors):
    sns.regplot(data=df, x=factor, y='CustomerRating',
                ax=axes.ravel()[idx], line_kws={'color': 'red'})

Factor	Direction	Strength
`DeliveryPartnerRating`	Positive	Strong — biggest satisfaction driver
`RestaurantRating`	Positive	Weak — food quality is secondary
`DeliveryDistance`	Negative	Weak — longer distance → lower satisfaction
`OrderValue`	~Zero	None — spend level doesn't predict satisfaction

Promotional Campaign — Paired t-test

Average order value (AOV) per restaurant was compared before (Week 1) and after (Week 3) a promotional campaign.

before = df_campaign[df_campaign['Period'] == 'Before (Week 1)'] \
             .groupby('RestaurantID')['OrderValue'].mean()
after  = df_campaign[df_campaign['Period'] == 'After (Week 3)'] \
             .groupby('RestaurantID')['OrderValue'].mean()

paired  = pd.concat([before, after], axis=1, join='inner')
t_stat, p_value = stats.ttest_rel(paired.iloc[:, 0], paired.iloc[:, 1])
# AOV change: −13.3%  |  Revenue lift: −$7,228.79  |  p ≥ 0.05

Result: p ≥ 0.05 → No statistically significant change in AOV from the campaign.

💡 Key Findings

🏙️ North Zone generates the highest net revenue ($6,631.72) but misses the 30-minute delivery target — only 36.7% of orders arrive on time.
🚴 Delivery partner rating is the single strongest predictor of customer satisfaction, outweighing food quality and spend level.
⭐ VIP and Regular customers give statistically equivalent ratings — operations are fair and consistent across segments.
💳 Payment preferences are uniform across all customer segments — no targeted payment strategy is needed.
🍛 Indian cuisine leads in volume (152 orders), though no cuisine is statistically over- or under-represented.
📉 The promotional campaign had no statistically significant effect on average order value.

📣 Recommendations

Fix North Zone delivery — invest in route optimization or zone-specific partner allocation to push ≤30-min delivery rate above 70%.
Prioritize partner quality — since partner rating drives satisfaction above all else, implement tiered incentives tied to delivery rating performance.
Redesign the promo mechanics — flat discounts did not move AOV; test free delivery thresholds, bundling, or cuisine-specific limited-time offers.
Shift to volume campaigns — since AOV is unresponsive to promos, redirect marketing toward order frequency and new customer acquisition.
Formalize A/B measurement — every future campaign should have a pre/post paired t-test built in before budget is committed.

🛠 Tech Stack

Library	Version	Purpose
`pandas`	2.x	Data loading, cleaning, groupby, melt, feature engineering
`numpy`	1.x	Numerical operations, median imputation
`scipy`	1.x	t-tests, chi-square tests, winsorization
`matplotlib`	3.x	Base plotting engine
`seaborn`	0.13+	Heatmaps, box plots, regression plots, facet grids
`statsmodels`	0.14+	Tukey HSD post-hoc tests

👤 Author

Doyancha

Data analyst focused on operations analytics, statistical inference, and Python-based data pipelines.

🤝 Contributing

Contributions, issues, and feature requests are welcome!

Fork the repository
Create a feature branch — git checkout -b feature/your-idea
Commit your changes — git commit -m "Add: your idea"
Push to the branch — git push origin feature/your-idea
Open a Pull Request

Please open an issue first to discuss any significant change.

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.

Made with 🍔 and data

Name		Name	Last commit message	Last commit date
Latest commit History 38 Commits
assets		assets
FoodExpress Data Analytics Project EDA.pdf		FoodExpress Data Analytics Project EDA.pdf
FoodExpress Data Analytics Project.docx		FoodExpress Data Analytics Project.docx
FoodExpress Data Analytics Project.ipynb		FoodExpress Data Analytics Project.ipynb
README.md		README.md
app.py		app.py
foodexpress_data_uncleaned.csv		foodexpress_data_uncleaned.csv
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

🍔 FoodExpress · Data Analytics

📌 Overview

📊 Dashboard Preview

🖥️ Main Dashboard

📈 KPI & Metrics View

📉 Analytical Charts

📂 Dataset

🗂 Project Structure

🔍 Key Analyses

1 · Data Cleaning — Six-Step Pipeline

2 · Python Fundamentals

Revenue Calculation — For Loop

Premium Delivery Classifier

Average Order Value by Cuisine

3 · Feature Engineering — Net Revenue

4 · Visualizations

📊 Dashboard Visualizations

5 · Statistical Tests — α = 0.05

Test 1 — Independent Samples t-test: VIP vs Regular Ratings

Test 2 — One-Sample t-test: North Zone Delivery Target

Test 3 — Chi-Square Test of Association: Payment Method vs Segment

Test 4 — Chi-Square Goodness of Fit: Cuisine Distribution

6 · Business Intelligence

Correlation Analysis — Drivers of Customer Rating

Promotional Campaign — Paired t-test

💡 Key Findings

📣 Recommendations

🛠 Tech Stack

👤 Author

🤝 Contributing

📄 License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages