[WIP] Add efficient DiD estimator (Chen, Sant'Anna & Xie 2025) #260
Draft
marcelortizv wants to merge 12 commits into
Adds 14 new R source files implementing the Chen, Sant'Anna & Xie (2025) Efficient DiD estimator for staggered-adoption balanced panels. Key features:
- PT-All and PT-Post regimes via enumerate_valid_pairs_edid()
- Omega* covariance matrix and optimal inverse-covariance weights
- Cluster-robust SE via EIF sandwich formula
- Multiplier bootstrap (Rademacher, Mammen, Webb)
- WIF correction in overall/event-study/group aggregations
- edid_fit S3 class with print/summary/coef/vcov/as.data.frame methods
- Covariate and survey paths are clean stubs (stop with message)
- No new package dependencies (svd() replaces MASS::ginv)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
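The "optimal inverse-covariance weights" item can be illustrated with a minimal sketch, assuming the familiar minimum-variance form: given K unbiased estimates of the same ATT(g,t) with covariance matrix Omega, the weights w = Omega^{-1} 1 / (1' Omega^{-1} 1) sum to one, and the combined estimate has variance 1 / (1' Omega^{-1} 1). The function names are hypothetical. The PR inverts via svd(), whereas this sketch uses a plain Gauss-Jordan solve and therefore rejects a singular Omega outright.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented copy [a | b] so the caller's matrix is left untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("Omega is singular or nearly so")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def efficient_weights(omega: Matrix) -> List[float]:
    """Minimum-variance weights w = Omega^{-1} 1 / (1' Omega^{-1} 1); they sum to 1."""
    raw = solve_linear(omega, [1.0] * len(omega))
    total = sum(raw)
    return [x / total for x in raw]


def efficient_combination(estimates: List[float], omega: Matrix) -> Tuple[float, float]:
    """Return (combined estimate, its variance 1 / (1' Omega^{-1} 1))."""
    raw = solve_linear(omega, [1.0] * len(omega))
    total = sum(raw)
    est = sum(x / total * e for x, e in zip(raw, estimates))
    return est, 1.0 / total
```

With uncorrelated estimates the weights reduce to inverse-variance weights. With positive correlation between moments, the full Omega changes the weights, which is why a near-singular Omega (such as the one the redundant gp=Inf rows produced) is a numerical problem.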
…p field names)
- BUG-1: remove spurious `eif - att_gt` subtraction in compute_eif_nocov_edid(); the score is already zero-mean by construction (each group contribution is demeaned)
- BUG-2: safe_inference_edid() now returns inference_valid=FALSE (with non-NA se but NA CIs/p-value) when att is non-finite, fixing the valid=TRUE/NA-CI inconsistency
- BUG-3: resolved automatically by the BUG-1 fix
- BUG-4: rename overall_draws→overall_b, event_study_draws→event_study_b, group_draws→group_b in run_multiplier_bootstrap_edid() and update callers in edid.R

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
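A minimal Python sketch of the inference step touched by BUG-1 and BUG-2. It assumes the EIF values are already mean-zero (per BUG-1) and uses the plug-in SE sqrt(sum(eif^2)/n^2); the clustered variant sums the EIF within each cluster first. `eif_se` and `safe_inference` are hypothetical names, and `None` stands in for R's NA. Treating a non-finite or non-positive SE as invalid is an added assumption, and no finite-cluster correction is applied.

```python
import math
from statistics import NormalDist
from typing import Dict, Optional, Sequence


def eif_se(eif: Sequence[float], clusters: Optional[Sequence[object]] = None) -> float:
    """Plug-in SE sqrt(sum(eif^2) / n^2); clustered version sums the EIF within clusters first."""
    n = len(eif)
    if clusters is None:
        return math.sqrt(sum(v * v for v in eif) / n ** 2)
    totals: Dict[object, float] = {}
    for v, c in zip(eif, clusters):
        totals[c] = totals.get(c, 0.0) + v
    return math.sqrt(sum(s * s for s in totals.values()) / n ** 2)


def safe_inference(att: float, se: float, alp: float = 0.05) -> dict:
    """Normal-approximation CI and two-sided p-value, flagged invalid when att is non-finite."""
    out = {"att": att, "se": se, "ci_lower": None, "ci_upper": None,
           "p_value": None, "inference_valid": False}
    # BUG-2 behaviour: keep se, but leave CI and p-value missing.
    if not math.isfinite(att):
        return out
    # Added assumption: an unusable se also invalidates inference.
    if not (math.isfinite(se) and se > 0):
        return out
    z = NormalDist().inv_cdf(1 - alp / 2)
    out.update(
        ci_lower=att - z * se,
        ci_upper=att + z * se,
        p_value=math.erfc(abs(att / se) / math.sqrt(2)),  # = 2 * (1 - Phi(|t|))
        inference_valid=True,
    )
    return out
```

Returning one record with an explicit validity flag keeps the se available for diagnostics while guaranteeing that a missing CI is never reported alongside inference_valid=True.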
Appended edid() feature bullet to NEWS.md under the 2.3.1.904 development header. Produced ARCHITECTURE.md and log-entry.md in the run directory documenting the EDiD module structure, panel_obj/edid_fit schemas, (g,t) cell loop, PT-All vs PT-Post pair enumeration, Omega* construction, aggregation/bootstrap flows, and 4 bug resolutions (BUG-1 through BUG-4).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the efficient DiD estimator from Chen, Sant'Anna & Xie (2025) supporting PT-All and PT-Post parallel trends assumptions.

Features (Priorities 1-6):
- Data validation and balanced-panel preprocessing
- Valid-pair enumeration for PT-All and PT-Post regimes
- Closed-form efficient DiD via inverse-covariance weights (no-cov path)
- Analytical EIF-based SEs (iid and cluster-robust)
- Overall, event-study, and group aggregation with WIF correction
- Multiplier bootstrap (Rademacher, Mammen, Webb) with cluster expansion

Deferred to follow-up: DR covariate path, survey support, Hausman pretest.

207 new edid tests pass; 777 full-suite pass; 0 regressions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
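The multiplier bootstrap with cluster expansion can be sketched as follows. The sketch assumes each draw perturbs the EIF average as (1/n) sum_i v_i eif_i with mean-zero, unit-variance multipliers v, and that "cluster expansion" means drawing one multiplier per cluster and sharing it across all of that cluster's units. The support points are the standard Rademacher, Mammen two-point, and Webb six-point distributions. The function names and the SD-of-draws scale estimate are assumptions; the package may use a different scale estimate (for example, an IQR-based one).

```python
import math
import random
import statistics
from typing import Dict, List, Optional, Sequence, Tuple

SQRT5 = math.sqrt(5.0)

# (support points, probabilities); every distribution has mean 0 and variance 1.
WEIGHT_DISTS: Dict[str, Tuple[List[float], List[float]]] = {
    "rademacher": ([-1.0, 1.0], [0.5, 0.5]),
    "mammen": (
        [-(SQRT5 - 1) / 2, (SQRT5 + 1) / 2],
        [(SQRT5 + 1) / (2 * SQRT5), (SQRT5 - 1) / (2 * SQRT5)],
    ),
    "webb": (
        [-math.sqrt(1.5), -1.0, -math.sqrt(0.5), math.sqrt(0.5), 1.0, math.sqrt(1.5)],
        [1.0 / 6] * 6,
    ),
}


def multiplier_bootstrap(
    eif: Sequence[float],
    biters: int = 999,
    kind: str = "rademacher",
    clusters: Optional[Sequence[object]] = None,
    seed: int = 1,
) -> List[float]:
    """Bootstrap draws of (1/n) * sum_i v_i * eif_i with mean-0, variance-1 multipliers v."""
    rng = random.Random(seed)
    support, probs = WEIGHT_DISTS[kind]
    n = len(eif)
    # Without clusters, each unit is its own cluster.
    labels = list(range(n)) if clusters is None else list(clusters)
    levels = list(dict.fromkeys(labels))  # unique labels, first-seen order
    position = {lab: k for k, lab in enumerate(levels)}
    draws = []
    for _ in range(biters):
        v = rng.choices(support, weights=probs, k=len(levels))
        # Cluster expansion: every unit reuses its cluster's multiplier.
        draws.append(sum(v[position[lab]] * e for lab, e in zip(labels, eif)) / n)
    return draws


def bootstrap_se(draws: Sequence[float]) -> float:
    """Standard deviation of the draws (one possible scale estimate)."""
    return statistics.pstdev(draws)
```

Because Var(v) = 1, the bootstrap SE should approach the analytical (clustered) EIF SE as biters grows. This makes it a convenient cross-check between the analytical and bootstrap paths.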
- Rename edid() args: outcome->yname, unit->idname, time->tname, first_treat->gname, alpha->alp, cluster->clustervars, n_bootstrap->bstrap+biters; control_group "never_treated"->"nevertreated"
- Add G=0->Inf auto-conversion so att_gt() datasets work directly
- Rewrite print.edid_fit() to MP-style ATT(g,t) table with sig codes
- summary.edid_fit() delegates to print then appends overall/ES/group
- Add bstrap field to edid_fit object for CI label selection
- Create R/edid-aggte.R: aggte_edid(), print/summary.AGGTEobj_edid
- Update compare_att_gt_edid.R: new arg names, drop G_edid column
- Update all edid test files to use renamed arguments

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
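The G=0 -> Inf recoding and the aggte_edid() point-estimate aggregations can be sketched as below. The sketch assumes cohort-size weights in the spirit of aggte() and covers point estimates only; it does not implement the WIF correction, SEs, or the calendar aggregation. All names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[float, int]  # (cohort g, period t)


def normalize_cohort(g: float) -> float:
    """att_gt()-style data codes never-treated units as 0; recode them as +inf."""
    return math.inf if g == 0 else g


def aggregate_dynamic(att: Dict[Cell, float], size: Dict[float, int]) -> Dict[int, float]:
    """ATT(e): cohort-size-weighted mean of ATT(g, g + e) over cohorts observed at e."""
    num: Dict[int, float] = defaultdict(float)
    den: Dict[int, float] = defaultdict(float)
    for (g, t), value in att.items():
        if math.isinf(g):
            continue  # the never-treated cohort has no ATT(g, t)
        e = int(t - g)
        num[e] += size[g] * value
        den[e] += size[g]
    return {e: num[e] / den[e] for e in sorted(num)}


def aggregate_group(att: Dict[Cell, float], size: Dict[float, int]) -> Tuple[Dict[float, float], float]:
    """Per-cohort mean of post-treatment ATT(g, t), plus a cohort-size-weighted overall."""
    post: Dict[float, List[float]] = defaultdict(list)
    for (g, t), value in att.items():
        if not math.isinf(g) and t >= g:
            post[g].append(value)
    by_group = {g: sum(v) / len(v) for g, v in sorted(post.items())}
    total = sum(size[g] for g in by_group)
    overall = sum(size[g] * a for g, a in by_group.items()) / total
    return by_group, overall
```

Recoding 0 as infinity up front means that every later comparison such as `t >= g` treats never-treated units as "not yet treated" without a special case.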
…rison cohort

Changes the PT-All loop in enumerate_valid_pairs_edid() so that only treated cohorts are used as comparison cohorts (never-treated appears only as the time control inside each moment). Self-pairs (gp==target_g) include period_1 as a valid tpre (degenerate CS DiD); cross-pairs exclude it. This eliminates T-1 redundant gp=Inf rows, resolves the near-singular Omega, and produces a correctly specified analytical Omega for PT-All.

Also removes dead gp=Inf branches in compute_omega_star_nocov_edid(), compute_generated_outcomes_nocov_edid(), and compute_eif_nocov_edid() that handled the now-impossible Inf comparison cohort case. PT-Post paths are unchanged.

ATT estimates match the author's reference to < 1e-10; the pair count for cell (g=3, t=any) on 10-period 3-cohort data is 13 (was: 10 finite + 9 redundant Inf).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
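A hypothetical reading of the new PT-All enumeration rule, as a sketch: comparison cohorts are treated cohorts only, and period 1 is a valid tpre only for the self-pair. The sketch also assumes that any tpre before gp's treatment date is otherwise admissible, with no anticipation; the commit does not state this. The function name and signature are illustrative and are not those of enumerate_valid_pairs_edid().

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (comparison cohort gp, pre-period tpre)


def enumerate_pt_all_pairs(
    target_g: int, cohorts: Sequence[float], periods: Sequence[int]
) -> List[Pair]:
    """Hypothetical PT-All (gp, tpre) enumeration for one target cohort."""
    first = min(periods)
    pairs: List[Pair] = []
    for gp in sorted(cohorts):
        if math.isinf(gp):
            continue  # never-treated is only the time control, never a comparison cohort
        for tpre in sorted(periods):
            if tpre >= gp:
                continue  # assumed: tpre must precede gp's treatment (no anticipation)
            if tpre == first and gp != target_g:
                continue  # period 1 is valid only for the self-pair
            pairs.append((int(gp), tpre))
    return pairs
```

Dropping the infinite cohort from the outer loop is what removes the redundant gp=Inf rows. Because of that, downstream code never has to special-case an infinite comparison cohort.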
- Replace edid-covariates.R stub (stop() functions) with empty placeholder
- Add xformla argument to edid(), validate_edid_inputs(), prepare_edid_panel(), fit_edid_cells()
- Extract covariate_matrix in prepare_edid_panel() from xformla formula
- Add xformla/covariates validation in edid-validate.R (the covariates argument is now deprecated and errors)
- Dispatch to covariate EIF path (edid-cov.R + edid-cov-eif.R) when xformla is non-trivial
- Fix bs_objects NULL-sentinel bug in build_basis_matrix_edid/predict_basis_edid that broke cross-fitting

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
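The covariate path relies on K-fold cross-fitting for its nuisance estimates (propensity ratios and conditional means, per the PR description). The sketch below illustrates only the cross-fitting mechanics: it assigns units to folds and predicts each unit from a model fit on the other folds. The one-covariate OLS nuisance, `assign_folds`, `cross_fit`, and the seed handling are hypothetical stand-ins, not the basis-expansion estimators built by build_basis_matrix_edid()/predict_basis_edid().

```python
import random
from typing import Callable, List, Sequence


def assign_folds(n_units: int, k: int, seed: int = 1) -> List[int]:
    """Balanced fold labels 0..k-1, one per unit, in random order."""
    labels = [i % k for i in range(n_units)]
    random.Random(seed).shuffle(labels)
    return labels


def fit_ols(x: Sequence[float], y: Sequence[float]) -> Callable[[float], float]:
    """One-covariate least squares; stands in for a real nuisance estimator."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) * (a - mx) for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def cross_fit(x: Sequence[float], y: Sequence[float], k: int = 5, seed: int = 1) -> List[float]:
    """Predict each unit from a model fit only on the other k - 1 folds."""
    folds = assign_folds(len(x), k, seed)
    preds = [0.0] * len(x)
    for fold in range(k):
        train = [i for i, f in enumerate(folds) if f != fold]
        model = fit_ols([x[i] for i in train], [y[i] for i in train])
        for i, f in enumerate(folds):
            if f == fold:
                preds[i] = model(x[i])
    return preds
```

Folds are assigned per unit rather than per observation, so all periods of a unit in a balanced panel land in the same fold. As a result, no unit's nuisance prediction uses that unit's own outcomes.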
…pass

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Untrack personal working files (PDFs, audit/spec/plan markdown notes) and move compare_att_gt_edid.R into benchmark/. Add the personal files to .gitignore so they remain on disk locally but aren't shared, and to .Rbuildignore (along with benchmark/) so they don't ship in the R package build.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The R CMD check failed because:
- Roxygen plain-text math like (g', t_pre), [Y_s - Y_1 | G=g', X], and
r_{g,Inf} confused the Rd parser (apostrophes interpreted as quoted
strings, brackets as link targets, _{...} as markdown emphasis). Wrap
these in \eqn{} or rename g' -> gp to match the variable name in code.
- as.data.frame.edid_fit() signature didn't match the as.data.frame()
generic; add the row.names and optional arguments and move the which
argument after ... .
Also regenerate stale Rd files for the covariate-path functions.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Status: Work in progress (opening as draft for early feedback)
This branch adds an `edid()` function implementing the efficient difference-in-differences estimator from Chen, Sant'Anna & Xie (2025).
I'm sharing now for directional feedback before polishing for review.
What's implemented
- `edid()` user-facing entry point with the same signature pattern as `att_gt()` (`yname`, `idname`, `tname`, `gname`, `xformla`, `control_group`, `anticipation`, etc.)
- … moments under PT-All / PT-Post / PT-Pre assumptions
- … estimation of propensity ratios and conditional means, K-fold cross-fitting
- … derived from EIF plug-in (`sqrt(sum(eif^2)/n^2)`)
- … `simple`, `group`, `dynamic`, `calendar` (mirrors `aggte()`)
- `print()`, `summary()`, and `glance()` methods

Tests
`tests/testthat/` adds:
- `test-edid-nocov.R`, `test-edid-cov-basic.R`, `test-edid-cov-validation.R`, `test-edid-cov-formula.R`, `test-edid-cov-eif.R`, `test-edid-cov-variance.R`
- `test-edid-pairs.R`, `test-edid-pairs-validation.R`, `test-edid-aggregate.R`, `test-edid-bootstrap.R`, `test-edid-inference.R`, `test-edid-integration.R`, `test-edid-validate.R`

Currently FAIL 0 across the suite.
Benchmarks
`benchmark/` contains:
- `edid_cov_sim.R`: Monte Carlo for bias / SE / coverage of the covariate path
- `edid_sim_original.R`: no-cov MC
- `compare_author_vs_edid.R`, `compare_att_gt_edid.R`: head-to-head against `att_gt()`
- `generate_diffdiff_benchmark.py`: Python reference (Sant'Anna's differences) cached to `benchmark/data/*.csv`

Still to do before this is review-ready
- `?edid` examples
- … `att_gt()` on the canonical examples
- … `edid-*.R` split is rough