Interactive Dash app for comparing causal measurement approaches on a real randomised marketing experiment (Hillstrom, 2008).
Live demo: [Hugging Face Space](https://huggingface.co/spaces/jordancheney89/causality)
This project provides a dashboard to:
- estimate average treatment effects with uncertainty
- inspect heterogeneity and targeting value
- show where each method agrees and disagrees, and what assumptions drive the result
Teams might ask two different questions:
- "Did the campaign work on average?" (causal effect / ATE)
- "Who should we target next?" (HTE / uplift policy)
This dashboard puts both views side by side so that methodological choices and their business implications are easy to compare.
| Tab | Method | Role in this project |
|---|---|---|
| 2 | PSM sensitivity (propensity matching + caliper) | Observational-style diagnostic; matched ATT on pruned cohort vs control |
| 3 | Bayesian A/B (PyMC hurdle model) | Probabilistic effect estimation with posterior uncertainty |
| 4 | Uplift / HTE (T-Learner, S-Learner) | Ranking customers by estimated incremental value |
| 5 | Multi-Arm OLS with interactions | Precision-adjusted average effects and subgroup patterns |
| 6 | Method Comparison | Side-by-side estimate reconciliation and takeaway |
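For a flavour of what Tab 3 estimates, here is a minimal PyMC hurdle-model sketch: a Bernoulli conversion part plus a log-normal spend-given-purchase part, per arm. Priors, variable names, and structure are illustrative assumptions, not the app's exact model (that lives in `causal_utils.py`).

```python
# Hedged sketch of a two-part ("hurdle") A/B model in the spirit of Tab 3.
import numpy as np
import pymc as pm

def hurdle_ab(spend_ctrl: np.ndarray, spend_treat: np.ndarray):
    with pm.Model():
        # Conversion probability per arm (index 0 = control, 1 = treated)
        p = pm.Beta("p", alpha=1.0, beta=1.0, shape=2)
        pm.Bernoulli("conv_c", p=p[0], observed=(spend_ctrl > 0).astype(int))
        pm.Bernoulli("conv_t", p=p[1], observed=(spend_treat > 0).astype(int))
        # Spend given purchase, modelled log-normally per arm
        mu = pm.Normal("mu", 0.0, 2.0, shape=2)
        sigma = pm.HalfNormal("sigma", 2.0, shape=2)
        pm.LogNormal("pos_c", mu=mu[0], sigma=sigma[0], observed=spend_ctrl[spend_ctrl > 0])
        pm.LogNormal("pos_t", mu=mu[1], sigma=sigma[1], observed=spend_treat[spend_treat > 0])
        # Expected revenue per customer = P(convert) * E[spend | purchase]
        rev = pm.Deterministic("rev", p * pm.math.exp(mu + 0.5 * sigma**2))
        pm.Deterministic("lift", rev[1] - rev[0])
        return pm.sample(1000, tune=1000, chains=2, random_seed=0)
```

The posterior of `lift` then yields the posterior probability of a positive effect and an HDI for its magnitude, which is the kind of summary the Bayesian tab reports.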
Source: MineThatData Email Analytics (Hillstrom)
Randomised experiment across ~64,000 customers:
- Three arms: Men's email, Women's email, and Control (split into roughly equal thirds)
- Primary outcome: 2-week post-campaign spend (USD)
- Key covariates: recency, history, mens/womens indicators, zip code, newbie, channel
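For intuition on the Tab 2 diagnostic, here is a minimal propensity-matching sketch with a caliper. It assumes a DataFrame holding a numeric subset of the covariates above plus binary `treated` and numeric `spend` columns; the column subset, model, and caliper value are illustrative, not the app's exact implementation.

```python
# Hedged sketch: 1-nearest-neighbour propensity matching with a caliper,
# returning a matched ATT on the pruned cohort. Illustrative only.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

COVS = ["recency", "history", "mens", "womens", "newbie"]  # assumed numeric columns

def matched_att(df: pd.DataFrame, caliper: float = 0.05) -> float:
    # Estimate propensity scores, then match each treated unit to its
    # nearest control on the score, dropping pairs outside the caliper.
    ps = LogisticRegression(max_iter=1000).fit(df[COVS], df["treated"])
    df = df.assign(pscore=ps.predict_proba(df[COVS])[:, 1])
    t, c = df[df["treated"] == 1], df[df["treated"] == 0]
    dist, idx = NearestNeighbors(n_neighbors=1).fit(c[["pscore"]]).kneighbors(t[["pscore"]])
    keep = dist.ravel() <= caliper  # prune poor matches; this is the sensitivity knob
    return float((t["spend"].to_numpy()[keep]
                  - c["spend"].to_numpy()[idx.ravel()[keep]]).mean())
```

Varying `caliper` and re-reading the matched ATT is the kind of sensitivity check this tab is named for.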
Dependencies are managed with uv (pyproject.toml + uv.lock).
- uv installed
- Python 3.13+
From the project root:
```bash
uv sync
```

This creates `.venv` (if needed) and installs the locked dependency set.
```bash
uv run python app.py
```

Then open http://localhost:8050.
- First run precomputes models and caches results in `.cache/results.pkl`.
- Subsequent runs load from cache and start quickly.
- Depending on machine speed, the initial build can take several minutes.
To force a rebuild:
- Delete `.cache/results.pkl`, or set `USE_CACHE = False` in `causal_utils.py`.
- Restart the app once to rebuild the cache.
- Set `USE_CACHE` back to `True` after a deliberate rebuild (optional; deleting the pickle has the same effect if `USE_CACHE` stays `True`).
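This behaviour amounts to a load-or-compute pattern, roughly as follows. It is a hedged sketch: `compute_all_results` is a stand-in for the real estimation pipeline in `causal_utils.py`.

```python
# Hedged sketch of the load-or-compute caching described above.
import pickle
from pathlib import Path

CACHE_PATH = Path(".cache/results.pkl")
USE_CACHE = True  # set False to force a rebuild on the next run

def compute_all_results() -> dict:
    # Stand-in for the expensive PSM / Bayesian / uplift pipeline.
    ...

def load_results() -> dict:
    if USE_CACHE and CACHE_PATH.exists():
        return pickle.loads(CACHE_PATH.read_bytes())  # fast start from cache
    results = compute_all_results()                   # slow path: several minutes
    CACHE_PATH.parent.mkdir(exist_ok=True)
    CACHE_PATH.write_bytes(pickle.dumps(results))     # persist for the next start
    return results
```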
Live Space: huggingface.co/spaces/jordancheney89/causality
This repo includes a Dockerfile configured for the Docker Spaces SDK.
- If you change estimation logic in `causal_utils.py`, delete `.cache/results.pkl` or set `USE_CACHE = False`, then rerun the app once.
- `uv.lock` pins transitive versions; run `uv lock` after changing dependencies in `pyproject.toml`.
The live app is deployed on Hugging Face Spaces using Docker.
Precomputed model outputs are stored in .cache/results.pkl so the app can start quickly without rerunning the Bayesian, PSM and uplift models on every container start.
Hugging Face Spaces reads YAML front matter at the very top of README.md. This repository keeps that metadata only on the `hf-space` branch, while `main` (GitHub) uses a normal README without front matter.
Push targets
- GitHub: `git push origin main`
- Space (updates the Space repo's `main`): `git push space hf-space:main`
After you change code on `main`, refresh the Space branch:

```bash
git checkout hf-space
git merge main
```

If the merge removes or conflicts on the README header, put the Space YAML back as the first lines of README.md, then save and commit on `hf-space`. Use this block:
```yaml
---
title: Causal Inference Dashboard
emoji: π
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
license: mit
---
```

Then push: `git push space hf-space:main` (and optionally `git push origin hf-space` to back the branch up on GitHub).
- The underlying dataset is randomised, so causal identification of average effects comes from random assignment.
- Covariate-adjusted and matched analyses are included as precision, sensitivity, and interpretability tools.
- Uplift metrics are useful for ranking policy decisions but should ideally be reported with uncertainty intervals when used for high-stakes targeting.
- Subgroup interaction findings are exploratory unless multiplicity is explicitly controlled (see the sketch after this list).
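On that last point, here is a hedged statsmodels sketch of a Tab-5-style multi-arm OLS with an interaction term; the exact formula, covariates, and robust-error choice in the app may differ, and the column names are assumed from the dataset description above.

```python
# Hedged sketch: multi-arm OLS with an interaction term and HC3 robust errors.
import statsmodels.formula.api as smf

def fit_multiarm_ols(df):
    # C(segment) encodes the three arms; the segment-by-newbie interaction
    # probes one subgroup pattern. Treat interaction p-values as exploratory
    # unless adjusted for multiplicity (e.g. Holm or FDR).
    return smf.ols("spend ~ C(segment) * newbie + recency + history",
                   data=df).fit(cov_type="HC3")
```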
- Agreement and disagreement across methods for Mens vs Control and Womens vs Control.
- Posterior probability and HDI width in Bayesian A/B (effect magnitude + uncertainty).
- Whether uplift curves and decile lift indicate actionable ranking value beyond random targeting (see the sketch after this list).
- Consistency between OLS interaction patterns and uplift heterogeneity signals.
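To make the uplift items concrete, here is a minimal sketch of T-Learner scoring and a decile-lift profile, assuming a binary `treated` flag (one arm vs control) and illustrative model and column choices; the app's Tab 4 implementation lives in `causal_utils.py`.

```python
# Hedged sketch: T-Learner uplift scores and an observed decile-lift profile.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

COVS = ["recency", "history", "mens", "womens", "newbie"]  # assumed columns

def t_learner_scores(df: pd.DataFrame) -> np.ndarray:
    # Fit a separate outcome model per arm; the score difference is the
    # estimated per-customer incremental spend.
    m1 = GradientBoostingRegressor().fit(df.loc[df["treated"] == 1, COVS],
                                         df.loc[df["treated"] == 1, "spend"])
    m0 = GradientBoostingRegressor().fit(df.loc[df["treated"] == 0, COVS],
                                         df.loc[df["treated"] == 0, "spend"])
    return m1.predict(df[COVS]) - m0.predict(df[COVS])

def decile_lift(df: pd.DataFrame, scores: np.ndarray) -> pd.Series:
    # Observed treated-minus-control spend within each predicted-uplift decile;
    # lift concentrated in the top deciles suggests ranking value beyond
    # random targeting.
    deciles = pd.qcut(pd.Series(scores, index=df.index), 10,
                      labels=False, duplicates="drop")
    return df.groupby(deciles).apply(
        lambda g: g.loc[g["treated"] == 1, "spend"].mean()
                  - g.loc[g["treated"] == 0, "spend"].mean())
```

For high-stakes targeting, pair these decile estimates with bootstrap intervals, per the note in the previous section.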
```text
.
├── Dockerfile            # Hugging Face Spaces (Docker SDK); gunicorn on port 7860
├── .dockerignore         # Smaller build context (excludes .venv, caches of dev tools)
├── app.py                # Thin entrypoint: Dash app, theme registration, layout, callback wiring
├── causal_utils.py       # Data prep, caching, and all causal estimation logic
├── dashboard/
│   ├── theme.py          # Design tokens, Plotly template, shared style dicts
│   └── data.py           # Loads cache → exposes RESULTS, DF, PSM, BAYESIAN, UPLIFT, OLS
├── layouts/
│   ├── shell.py          # Navbar + tab container; imports per-tab layouts
│   ├── components.py     # Reusable UI helpers (KPI cards, section headers, methodology collapse)
│   ├── overview.py       # Tab 1 layout
│   ├── psm.py            # Tab 2 layout
│   ├── bayesian.py       # Tab 3 layout
│   ├── uplift.py         # Tab 4 layout
│   ├── ols.py            # Tab 5 layout
│   └── comparison.py     # Tab 6 layout
├── callbacks/
│   ├── __init__.py       # register_callbacks(app)
│   ├── overview.py       # Tab 1 callbacks
│   ├── psm.py            # Tab 2 callbacks
│   ├── bayesian.py       # Tab 3 callbacks
│   ├── uplift.py         # Tab 4 callbacks
│   ├── ols.py            # Tab 5 callbacks
│   └── comparison.py     # Tab 6 callbacks
├── figures/
│   └── overview.py       # Static Plotly helpers for Overview tab
├── content/
│   └── methodology.py    # Long-form copy separated from layout code
├── assets/
│   └── style.css         # Global styles (Dash serves /assets automatically)
├── .cache/               # Precomputed outputs (e.g. results.pkl)
├── pyproject.toml
├── uv.lock
├── .python-version
└── README.md
```
- Add data ingestion wizard
MIT. See LICENSE.