[WIP] Add efficient DiD estimator (Chen, Sant'Anna & Xie 2025) #260

Draft
marcelortizv wants to merge 12 commits into bcallaway11:master from marcelortizv:feature-efficient-DiD-estimator

@marcelortizv
Contributor

Status: Work in progress — opening as draft for early feedback

This branch adds an edid() function implementing the efficient
difference-in-differences estimator from Chen, Sant'Anna & Xie (2025).
I'm sharing now for directional feedback before polishing for review.

What's implemented

  • edid() user-facing entry point with the same signature pattern as att_gt()
    (yname, idname, tname, gname, xformla, control_group,
    anticipation, etc.)
  • No-covariate path: closed-form GMM aggregation of identifying DiD
    moments under PT-All / PT-Post / PT-Pre assumptions
  • Covariate path: doubly-robust AIPW formulation with sieve (B-spline)
    estimation of propensity ratios and conditional means, K-fold cross-fitting
  • Efficient influence function (EIF) computed for both paths; SEs and CIs
    derived from EIF plug-in (sqrt(sum(eif^2)/n^2))
  • Aggregations: simple, group, dynamic, calendar (mirrors aggte())
  • print(), summary(), and glance() methods
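
The EIF plug-in formula quoted above, sqrt(sum(eif^2)/n^2), can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the branch's R code; the function names `eif_standard_error` and `wald_ci`, the toy EIF values, and the normal-approximation interval are assumptions made for this example.

```python
import math


def eif_standard_error(eif):
    """Plug-in standard error from per-unit EIF values: sqrt(sum(eif^2) / n^2)."""
    n = len(eif)
    if n == 0:
        raise ValueError("eif must contain at least one value")
    return math.sqrt(sum(v * v for v in eif) / (n * n))


def wald_ci(att, se, z=1.959963984540054):
    """Two-sided normal-approximation interval; z defaults to the 97.5% quantile."""
    return att - z * se, att + z * se


# Toy influence-function values (made up for illustration; they average to zero).
eif = [0.5, -0.5, 1.0, -1.0]
se = eif_standard_error(eif)
lower, upper = wald_ci(att=0.2, se=se)
```

Because the formula uses the raw second moment of the EIF, it relies on the EIF being mean-zero, which is the property the BUG-1 fix further down is concerned with.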

Tests

tests/testthat/ adds:

  • test-edid-nocov.R, test-edid-cov-basic.R, test-edid-cov-validation.R,
    test-edid-cov-formula.R, test-edid-cov-eif.R, test-edid-cov-variance.R
  • test-edid-pairs.R, test-edid-pairs-validation.R, test-edid-aggregate.R,
    test-edid-bootstrap.R, test-edid-inference.R, test-edid-integration.R,
    test-edid-validate.R

The suite currently reports 0 failures.

Benchmarks

benchmark/ contains:

  • edid_cov_sim.R: Monte Carlo for bias, SE, and coverage of the covariate path
  • edid_sim_original.R: Monte Carlo for the no-covariate path
  • compare_author_vs_edid.R, compare_att_gt_edid.R: head-to-head comparisons against
    the author's reference and att_gt(), respectively
  • generate_diffdiff_benchmark.py: Python reference (Sant'Anna's differences),
    with results cached to benchmark/data/*.csv

Still to do before this is review-ready

  • Vignette / ?edid examples
  • Bootstrap inference path (currently EIF plug-in only)
  • More exhaustive comparison vs. att_gt() on the canonical examples
  • Code organization pass — some of the edid-*.R split is rough
  • Commit history cleanup

marcelortizv and others added 12 commits April 11, 2026 15:19
Adds 14 new R source files implementing the Chen, Sant'Anna & Xie (2025)
Efficient DiD estimator for staggered-adoption balanced panels.

Key features:
- PT-All and PT-Post regimes via enumerate_valid_pairs_edid()
- Omega* covariance matrix and optimal inverse-covariance weights
- Cluster-robust SE via EIF sandwich formula
- Multiplier bootstrap (Rademacher, Mammen, Webb)
- WIF correction in overall/event-study/group aggregations
- edid_fit S3 class with print/summary/coef/vcov/as.data.frame methods
- Covariate and survey paths are clean stubs (stop with message)
- No new package dependencies (svd() replaces MASS::ginv)
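
As a rough illustration of the "optimal inverse-covariance weights" bullet, the sketch below uses the textbook minimum-variance combination w = Omega^{-1} 1 / (1' Omega^{-1} 1). It is written in Python for illustration only; the helper names, the toy 2x2 covariance matrix, and the plain Gaussian-elimination solver are assumptions and do not reproduce how the branch builds or inverts Omega* (the commit mentions an svd()-based pseudo-inverse).

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not mutated.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def inverse_covariance_weights(omega):
    """Weights proportional to Omega^{-1} 1, normalized to sum to one."""
    v = solve_linear(omega, [1.0] * len(omega))
    total = sum(v)
    return [vi / total for vi in v]


# Two hypothetical DiD moments for the same ATT(g,t), with variances 1 and 4.
omega = [[1.0, 0.0], [0.0, 4.0]]
moments = [1.0, 2.0]
weights = inverse_covariance_weights(omega)
estimate = sum(w * m for w, m in zip(weights, moments))
```

In this toy case the less noisy moment receives weight 0.8 and the noisier one 0.2. A near-singular Omega makes the solver above fail outright, which is the situation a pseudo-inverse is meant to handle.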

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…p field names)

- BUG-1: remove spurious `eif - att_gt` subtraction in compute_eif_nocov_edid();
  the score is already zero-mean by construction (each group contribution is demeaned)
- BUG-2: safe_inference_edid() now returns inference_valid=FALSE (with non-NA se
  but NA CIs/p-value) when att is non-finite, fixing the valid=TRUE/NA-CI inconsistency
- BUG-3: resolves automatically from BUG-1 fix
- BUG-4: rename overall_draws→overall_b, event_study_draws→event_study_b,
  group_draws→group_b in run_multiplier_bootstrap_edid() and update callers in edid.R
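
The BUG-2 contract (keep the standard error, blank out the interval and p-value, and flag the result as invalid) might look roughly like the following. This is a hypothetical Python analogue, not the R function safe_inference_edid(); the dictionary keys, the extra check on `se`, and the normal approximation are assumptions of the sketch.

```python
import math
from statistics import NormalDist


def safe_inference(att, se, alpha=0.05):
    """Return CI and p-value, or NaN placeholders when inference is not valid."""
    # Assumption: a non-finite or non-positive se is also treated as invalid.
    valid = math.isfinite(att) and math.isfinite(se) and se > 0
    if not valid:
        return {
            "att": att,
            "se": se,  # se is passed through unchanged, as in the BUG-2 description
            "ci_lower": math.nan,
            "ci_upper": math.nan,
            "p_value": math.nan,
            "inference_valid": False,
        }
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - NormalDist().cdf(abs(att / se)))
    return {
        "att": att,
        "se": se,
        "ci_lower": att - z * se,
        "ci_upper": att + z * se,
        "p_value": p_value,
        "inference_valid": True,
    }


bad = safe_inference(float("nan"), 0.1)
good = safe_inference(1.0, 0.5)
```

Keeping the flag and the NaN fields in sync in a single place is what removes the valid=TRUE/NA-CI inconsistency described above.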

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Appended edid() feature bullet to NEWS.md under the 2.3.1.904 development
header. Produced ARCHITECTURE.md and log-entry.md in the run directory
documenting the EDiD module structure, panel_obj/edid_fit schemas, (g,t)
cell loop, PT-All vs PT-Post pair enumeration, Omega* construction,
aggregation/bootstrap flows, and 4 bug resolutions (BUG-1 through BUG-4).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the efficient DiD estimator from Chen, Sant'Anna & Xie (2025)
supporting PT-All and PT-Post parallel trends assumptions.

Features (Priorities 1-6):
- Data validation and balanced-panel preprocessing
- Valid-pair enumeration for PT-All and PT-Post regimes
- Closed-form efficient DiD via inverse-covariance weights (no-cov path)
- Analytical EIF-based SEs (iid and cluster-robust)
- Overall, event-study, and group aggregation with WIF correction
- Multiplier bootstrap (Rademacher, Mammen, Webb) with cluster expansion
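
The multiplier bootstrap bullet can be sketched as follows: one weight is drawn per cluster, expanded to that cluster's units, and applied to the EIF values. This is an illustrative Python sketch under stated assumptions, not the branch's run_multiplier_bootstrap_edid(); the function names, the toy data, and the use of the plain standard deviation of the draws as the bootstrap SE are all assumptions.

```python
import math
import random
import statistics

SQRT5 = math.sqrt(5.0)
WEBB_POINTS = (-math.sqrt(1.5), -1.0, -math.sqrt(0.5),
               math.sqrt(0.5), 1.0, math.sqrt(1.5))


def draw_weight(rng, kind):
    """Draw one mean-zero, unit-variance multiplier weight."""
    if kind == "rademacher":
        return rng.choice((-1.0, 1.0))
    if kind == "mammen":
        # Two-point distribution: the negative point has probability (sqrt5+1)/(2*sqrt5).
        p_low = (SQRT5 + 1.0) / (2.0 * SQRT5)
        return -(SQRT5 - 1.0) / 2.0 if rng.random() < p_low else (SQRT5 + 1.0) / 2.0
    if kind == "webb":
        # Six equally likely points.
        return rng.choice(WEBB_POINTS)
    raise ValueError(f"unknown weight type: {kind}")


def multiplier_bootstrap(eif, clusters, biters=999, kind="rademacher", seed=1):
    """Return bootstrap draws of mean(weight * eif), one weight per cluster."""
    rng = random.Random(seed)
    cluster_ids = sorted(set(clusters))
    n = len(eif)
    draws = []
    for _ in range(biters):
        w = {c: draw_weight(rng, kind) for c in cluster_ids}
        # Cluster expansion: every unit inherits its cluster's weight.
        draws.append(sum(w[c] * e for c, e in zip(clusters, eif)) / n)
    return draws


# Toy EIF values and cluster labels (made up for illustration).
eif = [0.5, 0.3, -1.0, -0.6, 0.25, 0.55]
clusters = ["a", "a", "b", "b", "c", "c"]
draws = multiplier_bootstrap(eif, clusters, biters=200)
boot_se = statistics.pstdev(draws)
```

Drawing at the cluster level preserves within-cluster dependence, because units in the same cluster always move together across bootstrap iterations.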

Deferred to follow-up: DR covariate path, survey support, Hausman pretest.
207 new edid tests pass; 777 full-suite pass; 0 regressions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Rename edid() args: outcome->yname, unit->idname, time->tname,
  first_treat->gname, alpha->alp, cluster->clustervars,
  n_bootstrap->bstrap+biters; control_group "never_treated"->"nevertreated"
- Add G=0->Inf auto-conversion so att_gt() datasets work directly
- Rewrite print.edid_fit() to MP-style ATT(g,t) table with sig codes
- summary.edid_fit() delegates to print then appends overall/ES/group
- Add bstrap field to edid_fit object for CI label selection
- Create R/edid-aggte.R: aggte_edid(), print/summary.AGGTEobj_edid
- Update compare_att_gt_edid.R: new arg names, drop G_edid column
- Update all edid test files to use renamed arguments
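
The G=0 -> Inf auto-conversion mentioned above amounts to recoding the never-treated marker. A minimal Python illustration follows, assuming 0 is the only never-treated code in the input; the function name `normalize_first_treat` and the sample cohort values are hypothetical and not taken from the branch.

```python
import math


def normalize_first_treat(gname_values):
    """Map the att_gt() convention (0 = never treated) to Inf; keep other cohorts."""
    return [math.inf if g == 0 else float(g) for g in gname_values]


# Sample first-treatment periods (made up for illustration).
cohorts = normalize_first_treat([0, 2004, 2006, 0])
never_treated = [math.isinf(g) for g in cohorts]
```

Coding never-treated units as Inf lets comparisons such as t < g hold for them in every period, without a special case for a zero cohort.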

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rison cohort

Replaces the PT-All loop in enumerate_valid_pairs_edid() to use only
treated cohorts as comparison cohorts (never-treated appears only as the
time control inside each moment). Self-pairs (gp==target_g) include
period_1 as a valid tpre (degenerate CS DiD); cross-pairs exclude it.
This eliminates T-1 redundant gp=Inf rows, resolves near-singular Omega,
and produces a correctly-specified analytical Omega for PT-All.

Also removes dead gp=Inf branches in compute_omega_star_nocov_edid(),
compute_generated_outcomes_nocov_edid(), and compute_eif_nocov_edid()
that handled the now-impossible Inf comparison cohort case. PT-Post
paths are unchanged.

ATT estimates match author's reference to < 1e-10; pair count for
cell (g=3, t=any) on 10-period 3-cohort data is 13 (was: 10 finite +
9 redundant Inf).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace edid-covariates.R stub (stop() functions) with empty placeholder
- Add xformla argument to edid(), validate_edid_inputs(), prepare_edid_panel(), fit_edid_cells()
- Extract covariate_matrix in prepare_edid_panel() from xformla formula
- Add xformla/covariates validation in edid-validate.R (the covariates argument now raises a deprecation error)
- Dispatch to covariate EIF path (edid-cov.R + edid-cov-eif.R) when xformla is non-trivial
- Fix bs_objects NULL-sentinel bug in build_basis_matrix_edid/predict_basis_edid that broke cross-fitting

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Untrack personal working files (PDFs, audit/spec/plan markdown notes)
and move compare_att_gt_edid.R into benchmark/. Add the personal files
to .gitignore so they remain on disk locally but aren't shared, and to
.Rbuildignore (along with benchmark/) so they don't ship in the R
package build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The R CMD check failed because:

- Roxygen plain-text math like (g', t_pre), [Y_s - Y_1 | G=g', X], and
  r_{g,Inf} confused the Rd parser (apostrophes interpreted as quoted
  strings, brackets as link targets, _{...} as markdown emphasis). Wrap
  these in \eqn{} or rename g' -> gp to match the variable name in code.
- The as.data.frame.edid_fit() signature didn't match the as.data.frame()
  generic; add the row.names and optional arguments and move which after the dots (...).

Also regenerate stale Rd files for the covariate-path functions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>