Econometrics · Gradient Boosting · Tabular Deep Learning · External-Data Augmentation · SHAP · Streamlit · Docker · CI/CD
Live tour: executive overview → data integrity → econometrics → ML benchmark → scenario simulator.
Screenshots: Executive Overview · Data Integrity · SQL Insights (DuckDB) · Econometrics · ML Benchmark · Explainability (SHAP) · Scenario Simulator · Decision under uncertainty
DuckDB SQL · interactive Plotly · SHAP explainability · Bayesian-flavoured what-if simulator with credible intervals
An end-to-end decision-support platform built on 15 years (2010–2024) of BMW sales records (50,000 transactions, 11 features). It pairs rigorous econometrics with modern machine learning, enriches the data with external APIs (live macro-economics, FX and CO₂ data; mock-first fuel prices), and ships a premium Streamlit dashboard behind a fully containerised, CI/CD-tested codebase.
Data source: the base dataset is the public BMW Sales Dataset on Kaggle by eshummalik. All external macro/fuel/CO₂/FX context is added by this project (see ADR-0003).
1. I can build a model that works. The pipeline reaches a cross-validated R² ≈ 0.85 on signal-bearing data, with SHAP recovering the true drivers: a validated model, not a lucky split. (predictive capability)
2. I won't fake it when the data is empty. This particular dataset is structurally pristine but signal-free (every feature is statistically independent of the targets), and `Sales_Classification` is a leaked threshold on `Sales_Volume`. On it, the same pipeline honestly scores R² ≈ 0 and AUC ≈ 0.5, and this is proven with a permutation test and a positive control rather than hidden. Business value is then delivered through a clearly labelled Scenario Simulator.

Predictive competence plus intellectual honesty: that is the senior deliverable. Evidence: Predictive Capability · Data Integrity · Signal Audit · ADR-0002.
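Why a threshold label leaks: if `Sales_Classification` is produced by cutting `Sales_Volume` at a fixed value, any model that can see `Sales_Volume` recovers the label exactly, which is why the leaky AUC in the table below is 1.00. A minimal standard-library sketch of that mechanism, computing ROC-AUC in its Mann-Whitney form; the volumes are synthetic and `THRESHOLD` is a hypothetical cut-off, not the dataset's actual rule:

```python
import random


def roc_auc(scores, labels):
    """ROC-AUC as the Mann-Whitney probability that a positive outscores a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0  # ties count as half a win
    return wins / (len(pos) * len(neg))


rng = random.Random(42)
n_rows = 500
THRESHOLD = 7_000  # hypothetical cut-off, for illustration only
sales_volume = [rng.randint(100, 10_000) for _ in range(n_rows)]
sales_classification = [1 if v >= THRESHOLD else 0 for v in sales_volume]

# Leak left in: scoring rows by Sales_Volume itself reproduces the label perfectly.
auc_leaky = roc_auc(sales_volume, sales_classification)
# Leakage removed: only an unrelated (independent) feature is left to score with.
unrelated = [rng.random() for _ in range(n_rows)]
auc_clean = roc_auc(unrelated, sales_classification)
print(f"leaky AUC = {auc_leaky:.2f}, leakage-free AUC = {auc_clean:.2f}")
```

A perfect AUC on a "hard" business label is therefore a red flag to investigate, not a result to report.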
| Analysis | Result | What it means |
|---|---|---|
| Max absolute pairwise correlation among numeric features | 0.009 | Features are mutually independent noise |
| Price elasticity of demand (log-log, HC3) | −0.001 (p = 0.92) | No measurable price sensitivity in-sample |
| Hedonic price model R² | 0.0004 | Price is unexplained by attributes here |
| Regression R² (best of XGB/LGBM/CatBoost) | ≈ 0.00 | Boosting cannot beat the mean — no signal |
| Classification ROC-AUC (leakage-free) | ≈ 0.51 | No discriminative signal once leakage removed |
| Classification ROC-AUC (leak left in) | 1.00 | The signature of target leakage |
| Permutation test (label-shuffle) | p ≈ 0.90 | Real score indistinguishable from chance |
| Predictive capability (same pipeline, signal-bearing target) | CV R² ≈ 0.85 ± 0.003 | The pipeline does predict — when there is signal |
| Tabular MLP vs gradient boosting | both no-skill | Deep learning not justified (ADR-0004) |
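The label-shuffle test and the positive control can be sketched in a few lines of standard-library Python. To keep it short, the absolute Pearson correlation between one feature and the target stands in for the pipeline's cross-validated model score (an assumption: the project scores real models). The logic is the same: shuffle the target to destroy any feature-target link, count how often the shuffled score matches the real one, and confirm that an injected signal is detected.

```python
import random


def abs_corr(x, y):
    """Absolute Pearson correlation, a stand-in here for a model's CV score."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Label-shuffle test: how often does a shuffled target score as well as the real one?"""
    rng = random.Random(seed)
    observed = abs_corr(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # breaks any x-y link but keeps the target's distribution
        if abs_corr(x, y_perm) >= observed:
            hits += 1
    # +1 on both sides counts the observed statistic as one of the permutations
    return (hits + 1) / (n_perm + 1)


rng = random.Random(123)
x = [rng.gauss(0, 1) for _ in range(300)]
# "Real" case: target independent of the feature, as in this dataset.
y_noise = [rng.gauss(0, 1) for _ in range(300)]
# Positive control: same feature, synthetic target with a known injected signal.
y_signal = [2.0 * a + rng.gauss(0, 1) for a in x]

p_noise = permutation_p_value(x, y_noise)
p_signal = permutation_p_value(x, y_signal)
print(f"noise target p = {p_noise:.3f}, positive control p = {p_signal:.3f}")
```

The positive control is what makes a null result credible: the same test that finds nothing on the real target must find the planted signal.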
Reports: econometrics · model benchmark · DL vs ML.
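The price-elasticity row rests on a log-log regression, where the slope of log(volume) on log(price) is the constant elasticity, with HC3 heteroskedasticity-robust standard errors. A standard-library sketch for the one-regressor case, checked on synthetic data with a known elasticity of −0.6; the helper name and data are illustrative, not the project's code (in practice a library such as statsmodels provides this via `OLS(...).fit(cov_type="HC3")`):

```python
import math
import random


def loglog_elasticity_hc3(price, quantity):
    """OLS slope of log(quantity) on log(price) plus its HC3 robust standard error."""
    x = [math.log(p) for p in price]
    y = [math.log(q) for q in quantity]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx  # the elasticity
    alpha = my - beta * mx
    # HC3: inflate each squared residual by 1 / (1 - h_ii)^2, h_ii being the leverage
    meat = 0.0
    for a, b in zip(x, y):
        resid = b - (alpha + beta * a)
        leverage = 1.0 / n + (a - mx) ** 2 / sxx
        meat += (a - mx) ** 2 * resid**2 / (1.0 - leverage) ** 2
    se = math.sqrt(meat) / sxx
    return beta, se


rng = random.Random(7)
prices = [rng.uniform(30_000, 120_000) for _ in range(2_000)]
# Synthetic demand with a known elasticity of -0.6 and multiplicative noise.
volumes = [5e6 * p**-0.6 * math.exp(rng.gauss(0, 0.1)) for p in prices]
beta, se = loglog_elasticity_hc3(prices, volumes)
print(f"elasticity = {beta:.3f} (HC3 s.e. {se:.3f})")
```

The synthetic check shows the estimator recovers a real elasticity when one exists; a slope near zero with a large p-value, as in the table, is therefore evidence of no price sensitivity rather than a broken estimator.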
bmw-sales/
├── src/bmw_sales/
│ ├── config.py # typed pydantic-settings + canonical DatasetSchema
│ ├── data/ # loader (schema validation) · validation (integrity report)
│ ├── audit/ # No-Signal Auditor: permutation · positive control · KS · χ²
│ ├── apis/ # hybrid real+mock clients · enrichment join
│ │ ├── base.py # cache + retry + circuit breaker + provenance
│ │ ├── worldbank.py · fx_rates.py · fuel_prices.py · co2_regulations.py
│ ├── features/ # domain feature engineering
│ ├── econometrics/ # OLS hedonic · demand · elasticity · VIF · leakage proof
│ ├── models/ # preprocessing · XGB/LGBM/CatBoost · tabular MLP · MLflow
│ ├── simulation/ # Scenario Simulator + Monte-Carlo uncertainty
│ ├── explainability/ # SHAP attributions
│ └── sql/ # DuckDB analytics over sql/queries/*.sql
├── app/ # Streamlit premium UI (theme · data_access · 7 tabs)
├── sql/queries/ # versioned analytical SQL
├── tests/ # pytest suite (unit + integration)
├── docs/ # MkDocs Material site + 9 ADRs
├── reports/ # generated analyses (committed)
├── Dockerfile · docker-compose.yml · .github/workflows/{main,docs}.yml
└── Makefile · mkdocs.yml · pyproject.toml · requirements*.txt
Design rationale: ADR-0001.
The full flow from raw data to a deployed decision-support app. The honest-analytics spine (gold) is what makes this a senior deliverable: the data is audited and proven signal-free before any model is trusted.
flowchart TB
RAW["Raw dataset<br/>BMW_sales_data 2010–2024<br/>50,000 rows × 11 cols"]
subgraph L1["① Data foundation · bmw_sales.data"]
LOAD["loader.py<br/>schema validation · dtype coercion"]
VAL["validation.py<br/>correlation · ANOVA · mutual-info · leakage"]
end
subgraph L2["② Signal audit · bmw_sales.audit"]
PERM["permutation / label-shuffle test<br/>p ≈ 0.90 → no signal"]
CTRL["positive control<br/>synthetic R² ≈ 0.86 vs real ≈ 0"]
KS["KS-uniformity · χ² independence"]
end
subgraph L3["③ External augmentation · bmw_sales.apis"]
WB["WorldBank · FX<br/>real endpoints"]
FC["Fuel · CO₂<br/>mock-first"]
BASE["base.py<br/>cache → retry → circuit-breaker → mock"]
ENR["enrichment.py<br/>region × year × fuel panel join"]
end
subgraph L4["④ Features · bmw_sales.features"]
FE["engineering.py<br/>age · usage · electrified · log transforms"]
end
subgraph L5["⑤ Modelling"]
ECON["econometrics<br/>hedonic OLS · elasticity (HC3) · leakage proof"]
ML["ml_models<br/>XGBoost · LightGBM · CatBoost + RandomizedSearchCV"]
DL["dl_models<br/>PyTorch tabular MLP (early stopping)"]
end
subgraph L6["⑥ Decision intelligence"]
SIM["simulation<br/>elasticity scenario + Monte-Carlo CIs"]
SHAP["explainability<br/>SHAP attributions"]
SQL["sql · DuckDB<br/>region · price · YoY · electrification"]
end
REPORTS[("reports/<br/>integrity · signal_audit · econometric<br/>model_benchmark · dl_vs_ml · sql_insights")]
MLF[("MLflow<br/>./mlruns")]
ART[("models/*.joblib")]
APP["Streamlit app · 7 tabs<br/>Overview · Integrity · SQL · Econometrics<br/>ML · SHAP · Scenario Simulator"]
RAW --> LOAD --> VAL
RAW --> L2
RAW --> SQL
LOAD --> ENR
WB & FC --> BASE --> ENR
VAL --> FE
ENR --> FE
FE --> ECON & ML & DL
ML --> ART
ML --> SHAP
ENR -. macro baselines .-> SIM
L2 --> REPORTS
ECON & ML & DL & SQL --> REPORTS
ML --> MLF
REPORTS --> APP
SIM & SHAP & SQL --> APP
classDef honest fill:#241f08,stroke:#D4AF37,stroke-width:2px,color:#fff;
classDef store fill:#15151a,stroke:#8FA9C7,color:#cfe;
class RAW,L1,L2,FE honest;
class REPORTS,MLF,ART store;
Every external client degrades gracefully, so CI and Docker run without network access or API keys, while the real path is still proven live.
flowchart LR
REQ["client.fetch(region, years)"] --> C{"disk cache hit?"}
C -- yes --> HIT["return cached<br/>provenance = cache"]
C -- no --> OFF{"offline mode<br/>or breaker open?"}
OFF -- yes --> MOCK["deterministic mock<br/>provenance = mock"]
OFF -- no --> LIVE["HTTP GET + retry/backoff"]
LIVE -- success --> SAVE["cache + return<br/>provenance = live"]
LIVE -- failure --> TRIP["trip circuit-breaker"] --> MOCK
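The fallback chain in the flowchart above can be sketched as a small class. Everything here is hypothetical (class and function names, the in-memory dict standing in for the disk cache, the single-failure breaker threshold and 60 s cooldown); the live fetcher is injected so the sketch runs without a network, mirroring how CI runs offline:

```python
import time
import zlib
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CircuitBreaker:
    """Opens after `max_failures` failed fetches and stays open for `cooldown_s` seconds."""

    max_failures: int = 1
    cooldown_s: float = 60.0
    failures: int = 0
    opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.cooldown_s:
            self.opened_at, self.failures = None, 0  # cooldown over: allow live calls again
            return False
        return True

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()


@dataclass
class HybridClient:
    """Hypothetical client: cache, then live with retries, else a deterministic mock."""

    fetch_live: Callable[[str, int], float]  # real HTTP call, injected
    mock: Callable[[str, int], float]  # deterministic offline fallback
    offline: bool = False
    retries: int = 2
    backoff_s: float = 0.0  # base delay, doubled after every failed attempt
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)
    cache: Dict[Tuple[str, int], float] = field(default_factory=dict)  # disk-cache stand-in

    def fetch(self, region: str, year: int) -> Tuple[float, str]:
        """Return (value, provenance) where provenance is 'cache', 'live' or 'mock'."""
        key = (region, year)
        if key in self.cache:
            return self.cache[key], "cache"
        if self.offline or self.breaker.is_open():
            return self.mock(region, year), "mock"
        for attempt in range(self.retries + 1):
            try:
                value = self.fetch_live(region, year)
            except Exception:
                if attempt < self.retries:
                    time.sleep(self.backoff_s * 2**attempt)  # exponential backoff
                continue
            self.breaker.failures = 0
            self.cache[key] = value
            return value, "live"
        self.breaker.record_failure()  # every attempt failed: trip the breaker
        return self.mock(region, year), "mock"


def deterministic_mock(region: str, year: int) -> float:
    # zlib.crc32 is stable across processes, unlike the salted built-in hash() for str
    return 1.0 + zlib.crc32(f"{region}-{year}".encode()) % 500 / 100


def offline_network(region: str, year: int) -> float:
    raise ConnectionError("no network in CI")


client = HybridClient(fetch_live=offline_network, mock=deterministic_mock)
print(client.fetch("EAS", 2020))  # retries fail, breaker trips, provenance is "mock"
```

Returning the provenance alongside the value is what lets reports state honestly whether a number came from a live API, the cache, or the mock.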
flowchart LR
DEV["commit on feature/* branch"] --> PC["pre-commit<br/>black · isort · flake8 · mypy"]
PC --> PUSH["push → main"]
PUSH --> CI{"GitHub Actions"}
CI --> Q["quality (3.11 / 3.12)<br/>black · isort · flake8 · mypy<br/>pytest + 62% coverage gate"]
CI --> SEC["pip-audit"]
Q --> DK["Docker build + Trivy scan"]
PUSH --> DOCS["MkDocs build"]
DK -. image .-> HF["HF Spaces (Docker)<br/>live app"]
DOCS --> GP["GitHub Pages<br/>docs site"]
Four sources are mapped to the six regions via official World Bank aggregate codes (EAS, NAC, MEA, LCN, EMU, SSF) and representative currencies/countries. Every client caches responses, retries with backoff, and trips a circuit breaker to a deterministic mock on failure, so the project runs fully offline while three of the four sources are validated live against real APIs.
| Source | Status | Real endpoint | Signal it adds |
|---|---|---|---|
| World Bank macro | 🟢 real | inflation `FP.CPI.TOTL.ZG`, GDP per capita `NY.GDP.PCAP.CD` | regional purchasing power |
| FX rates | 🟢 real | exchangerate.host | local-currency price normalisation |
| CO₂ emissions | 🟢 real | World Bank CO₂ per capita `EN.GHG.CO2.PC.CE.AR5` | the electrification transition |
| Fuel prices | 🟡 mock-first | World Bank pump price `EP.PMP.SGAS.CD` (archived by the World Bank in 2024) | petrol/diesel vs electrified economics |
Honesty applies to the data layer too: fuel stays mock-first because the World Bank archived its pump-price series. The real-API hook is kept, and the provenance is reported as `mock` rather than faked.
Details: ADR-0003.
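The region-to-aggregate mapping can be made concrete with a request builder for the World Bank API v2 (`https://api.worldbank.org/v2/country/{code}/indicator/{indicator}`). A minimal sketch: the region labels on the left of `REGION_TO_WB` are assumptions about the dataset's spelling, and `worldbank_url` is a hypothetical helper, not the project's `worldbank.py`:

```python
from urllib.parse import urlencode

# Hypothetical mapping from the dataset's region labels to World Bank aggregate codes.
REGION_TO_WB = {
    "Asia": "EAS",  # East Asia & Pacific
    "North America": "NAC",  # North America
    "Middle East": "MEA",  # Middle East & North Africa
    "South America": "LCN",  # Latin America & Caribbean
    "Europe": "EMU",  # Euro area
    "Africa": "SSF",  # Sub-Saharan Africa
}

# Indicator codes as listed in the sources table.
INDICATORS = {
    "inflation": "FP.CPI.TOTL.ZG",
    "gdp_per_capita": "NY.GDP.PCAP.CD",
    "co2_per_capita": "EN.GHG.CO2.PC.CE.AR5",
}


def worldbank_url(region: str, indicator: str, start: int = 2010, end: int = 2024) -> str:
    """Build a World Bank API v2 request URL for one region, indicator and year range."""
    code = REGION_TO_WB[region]
    # safe=":" keeps the year range readable as 2010:2024 instead of 2010%3A2024
    query = urlencode({"format": "json", "date": f"{start}:{end}", "per_page": 100}, safe=":")
    return f"https://api.worldbank.org/v2/country/{code}/indicator/{INDICATORS[indicator]}?{query}"


print(worldbank_url("Asia", "gdp_per_capita"))
```

Querying the aggregate codes directly keeps the join to the dataset's six regions one-to-one, with no country-level weighting to justify.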
Because the data cannot forecast, decision value comes from an explicit what-if simulation — a constant-elasticity demand model with literature-grounded priors (own-price ε ≈ −0.6, income ε ≈ +1.5, fuel cross-elasticity, CO₂-regulation shift) and baselines seeded from the real macro APIs. Every driver's contribution is decomposed in a waterfall chart, and all assumptions are adjustable in the UI. It is never presented as a fit to the historical data.
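The constant-elasticity model multiplies the baseline volume by each driver's ratio raised to its elasticity, so in logs the driver effects add up, which is what lets a waterfall chart decompose the change exactly. A Monte-Carlo sketch under stated assumptions: elasticities are drawn from normal priors centred on the values above, while the standard deviations, the fuel cross-elasticity value and the treatment of the CO₂ shift as an additive log-demand term are all hypothetical:

```python
import math
import random
import statistics

# Price and income prior means come from the text; all standard deviations,
# and the fuel cross-elasticity itself, are hypothetical placeholders.
PRIORS = {
    "price": (-0.6, 0.15),  # own-price elasticity
    "income": (1.5, 0.3),  # income elasticity
    "fuel": (-0.2, 0.1),  # fuel-price cross-elasticity for a combustion model (hypothetical)
}


def log_contributions(eps, price_ratio, income_ratio, fuel_ratio, co2_shift):
    """Per-driver change in log demand; the terms sum exactly, which suits a waterfall."""
    return {
        "price": eps["price"] * math.log(price_ratio),
        "income": eps["income"] * math.log(income_ratio),
        "fuel": eps["fuel"] * math.log(fuel_ratio),
        "co2": co2_shift,  # regulation effect entered directly on the log scale (assumption)
    }


def simulate(baseline, price_ratio, income_ratio, fuel_ratio, co2_shift, n_draws=5_000, seed=0):
    """Monte-Carlo over the elasticity priors: median volume and a central 95% interval."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        eps = {name: rng.gauss(mu, sd) for name, (mu, sd) in PRIORS.items()}
        terms = log_contributions(eps, price_ratio, income_ratio, fuel_ratio, co2_shift)
        # Constant elasticity: Q1 = Q0 * (P1/P0)^e_p * (Y1/Y0)^e_y * (F1/F0)^e_f * exp(shift)
        draws.append(baseline * math.exp(sum(terms.values())))
    cuts = statistics.quantiles(draws, n=40)  # cut points every 2.5%
    return statistics.median(draws), (cuts[0], cuts[-1])


median, (low, high) = simulate(10_000, price_ratio=1.05, income_ratio=1.02, fuel_ratio=1.10, co2_shift=0.0)
print(f"median {median:,.0f} units, 95% interval {low:,.0f} to {high:,.0f}")
```

Feeding `log_contributions` with the prior means gives the per-driver bars of a waterfall, while the Monte-Carlo draws provide the interval around the total.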
# Install (dev includes linting, tests, torch for the DL benchmark)
make install-dev # or: pip install -r requirements-dev.txt
make eda # regenerate the Data Integrity Report
make pipeline # train & benchmark all models (writes reports/)
make test # full suite, offline & deterministic
make app # launch the dashboard → http://localhost:8501

docker compose up --build # → http://localhost:8501

Or pull the published image from the GitHub Container Registry (built, scanned and pushed by CI on every update to main):

docker run -p 8501:8501 ghcr.io/maxime2476/bmw-sales-analytics:latest

Managed deployment (Streamlit Community Cloud or Hugging Face Spaces): see DEPLOYMENT.md.
On Windows with Anaconda, `KMP_DUPLICATE_LIB_OK=TRUE` is set in code to avoid the known OpenMP (`libiomp5md.dll`) clash when importing PyTorch.
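Order matters here: the variable only helps if it is already in the environment when PyTorch first loads its OpenMP runtime. A minimal sketch of that pattern, assuming it sits at the top of whichever module imports `torch` (where the project actually sets it is not stated), and restricting it to Windows as a design choice:

```python
import os
import sys

if sys.platform == "win32":
    # Must run before torch loads its bundled OpenMP runtime (libiomp5md.dll);
    # setdefault leaves any value the user already exported untouched.
    os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")

try:
    import torch  # noqa: E402  (deliberately imported after the environment tweak)
except ImportError:  # the DL benchmark is optional in this sketch
    torch = None
```

The OpenMP runtime's own error message calls this variable an unsafe, unsupported workaround, which is why the sketch scopes it to Windows instead of setting it on every platform.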
- Typed (PEP 484) and `mypy`-clean across the `src/` package.
- Formatted & linted: `black`, `isort`, `flake8`, all clean; `pre-commit` hooks run the same gates locally.
- Tested: a `pytest` suite behind a coverage gate of ≥ 62% (live status in the CI and Codecov badges above) covering schema, leakage, mock determinism & circuit-breaker fallback, leakage-aware splits, signal audit, predictive capability, the Monte-Carlo simulator, the SQL layer and report builders; real-data checks are marked `integration`. A guard test keeps this gate in sync between the README and CI.
- Security: `pip-audit` dependency scan, Trivy image scan, and Dependabot updates (pip · actions · docker).
- SQL analytics: decision queries in `sql/queries/` executed by DuckDB directly over the CSV (window functions, quantiles, YoY); run them with `make sql`.
- Advanced uncertainty & causality: conformal prediction (calibrated, distribution-free intervals that honestly widen to ~95% of the target range on the signal-free data; `make conformal`), a causal price→demand analysis via backdoor adjustment under an explicit DAG (`make causal`), and an optional Claude-powered scenario narrator with a deterministic offline fallback.
- Experiment tracking: every benchmarked model is logged to MLflow (`mlflow ui --backend-store-uri ./mlruns`).
- Docs site: MkDocs Material (ADRs + auto API reference), auto-deployed to GitHub Pages.
- CI/CD: GitHub Actions — lint + type + test matrix (3.11/3.12) with a coverage gate → cached Docker build + Trivy scan. See ADR-0005, ADR-0007.
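Why the conformal intervals widen so far on this data: split conformal prediction uses a quantile of the calibration residuals as the interval half-width, and a no-skill model's residuals span almost the whole target. A standard-library sketch of the standard split-conformal recipe (not necessarily what `make conformal` runs), on a synthetic uniform target with the model reduced to predicting the training mean; coverage holds under the usual assumption that calibration and test rows are exchangeable:

```python
import math
import random


def split_conformal_interval(calib_true, calib_pred, new_pred, alpha=0.05):
    """Split conformal prediction: an interval with >= 1 - alpha marginal coverage."""
    scores = sorted(abs(t - p) for t, p in zip(calib_true, calib_pred))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # finite-sample-corrected rank of the quantile
    q = scores[k - 1] if k <= n else math.inf  # too few calibration points: unbounded
    return new_pred - q, new_pred + q


rng = random.Random(0)
# Signal-free target, uniform on [0, 100]: the best a model can do is predict the mean.
train = [rng.uniform(0, 100) for _ in range(1_000)]
calib = [rng.uniform(0, 100) for _ in range(1_000)]
test = [rng.uniform(0, 100) for _ in range(1_000)]
mean_pred = sum(train) / len(train)

lo, hi = split_conformal_interval(calib, [mean_pred] * len(calib), mean_pred)
coverage = sum(lo <= y <= hi for y in test) / len(test)
print(f"interval width = {hi - lo:.1f} (target range 100), test coverage = {coverage:.3f}")
```

The interval keeps its promised coverage but must stretch across roughly 95% of the target range to do so; that width is the honest price of having no signal.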
- This dataset cannot price or forecast. Any model claiming high accuracy on it is either leaking the target or overfitting noise — a useful red-flag heuristic for reviewing vendor models.
- Pricing & go-to-market must lean on external signals (regional income, fuel economics, CO₂ regulation) — exactly what the Simulator operationalises.
- The electrification transition is the real story: regulation stringency, not historical volume, should drive the Petrol→Electric portfolio mix.
| ADR | Decision |
|---|---|
| 0001 | Architecture & stack |
| 0002 | Data-integrity finding & honest-modelling strategy |
| 0003 | Hybrid external-data augmentation |
| 0004 | DL tested, not assumed |
| 0005 | Containerisation & CI/CD |
| 0006 | Statistical signal audit & positive control |
| 0007 | SQL analytics & hardened quality gates |
| 0008 | Decision-making under uncertainty (Monte-Carlo) |
| 0009 | Experiment tracking & published docs site |
Maxime GOURGUECHON — maxime.gourguechon76@gmail.com








