TestMechs

The TestMechs package implements the methodology from the paper “Testing Mechanisms” by Soonwoo Kwon and Jonathan Roth. The package provides tests for the “sharp null of full mediation”, which conjectures that the effect of a treatment operates through a particular conjectured mechanism (or set of mechanisms) M. It also provides lower bounds on the fraction of “always-takers” who are affected by the treatment despite having the same value of M regardless of treatment status. All core functions in the package — test_sharp_null(), lb_frac_affected(), and partial_density_plot() — support conditional random assignment and accept a reg_formula argument. When you provide a formula, the package regression-adjusts the relevant partial probabilities using OLS or IV. Note that the approach in the paper requires the mediator $M$ to be discrete. If reg_formula is omitted, the functions assume that the treatment is as good as randomly assigned (as in an RCT).

Installation

You can install the development version of TestMechs from GitHub with:

# Install devtools if not already installed
install.packages("devtools")

# Install package
devtools::install_github("jonathandroth/TestMechs")

Application to Baranov et al. (2020)

We illustrate how the package can be used by walking through how the code can be applied to the application of Baranov et al. (2020) in Section 5.2 of “Testing Mechanisms”. In Baranov et al. (2020), $D$ is a treatment for depression and $Y$ is an index of outcomes for women’s financial empowerment. We are interested in whether the effect of $D$ on $Y$ can be explained by a mediator, or set of mediators, $M$. We consider three choices for $M$: (a) the presence of a grandmother in the home, (b) relationship quality with the woman’s husband, and (c) the combination of these two mechanisms.

Randomly-assigned setting

We start with loading the required packages and the data.

# Load TestMechs
library(TestMechs)

# Load other packages that are required to run the example
library(dplyr)
library(ggplot2)
library(haven)

# Load data
data("baranov_data")

# Restrict to the experimental sample
mother_data <- mother_data %>% filter(THP_sample == 1)

We begin with the case where $M$ is a binary indicator for the presence of a grandmother in the home. The outcome variable $Y$ is continuous, and for ease of transparency and for conducting inference, we discretize the outcome into 5 bins based on the unconditional quantiles of the outcome. As noted in the paper, the test still remains valid under such a discretization but potentially loses sharpness. With some abuse of notation, we from now on write $Y$ to refer to this discretized outcome.

The main functions we will be using are: 1) partial_density_plot() to plot the partial densities to visually detect potential violations of the sharp null, 2) test_sharp_null() to conduct a statistical test for the sharp null and 3) lb_frac_affected() to compute the sharp lower bound for the fraction of always-takers (or never-takers) that are affected by treatment. Each function can be adjusted for conditional random assignment by providing a reg_formula if needed (e.g., adding baseline covariates or instruments). While this example covers the basic usage of these functions, please refer to the documentation of each function (e.g., ?test_sharp_null) for a more detailed description.

Graphical Evidence

We first provide graphical evidence using a partial density plot using the function partial_density_plot(). When $M$ is binary, such (partial) density plots are often helpful to understand where the violations of the sharp null are coming from. The following snippet reproduces Figure 3 of the paper.

nt_plot <-
  partial_density_plot(df = mother_data,
                       d = "treat",
                       m = "grandmother",
                       y = "motherfinancial",
                       num_Ybins = 5,
                       plot_nts = T,
                       density_1_label = "Prob. In Treated Group (P(Y,M=0|D=1))",
                       density_0_label = "Prob. In Control Group (P(Y,M=0|D=0))") +
  ylab("Probability") +
  xlab("Event: No Grandmother Present (M=0) and Y in Stated Range")

nt_plot +
  annotate(geom = "text", 
           x = 5,y=0.15,
           label = "Higher prob. in \n treated group, \n violating sharp null", size = 3) +
  geom_segment(x=4.7,y=0.17,
               xend = 4.5, yend = 0.175,
               arrow = arrow(length = unit(0.03,"npc")),
               color = "black", show.legend = F) +
  
  geom_segment(x=5,y=0.13,
               xend = 5, yend = 0.10,
               arrow = arrow(length = unit(0.03,"npc")),
               color = "black", show.legend = F) +  
  theme(legend.position="bottom",
        legend.title = element_blank())

This figure shows estimates of $P(Y=y,M=0 \mid D=d)$ for both $d=1$ and $d=0$. Under monotonicity, as shown in Section 2 of the paper, we should have that $P(Y=y, M=0 \mid D=1) \leq P(Y=y, M=0 \mid D=0)$ for all values of $y$. In other words, there should be a negative treatment effect on the compound outcome $1[Y=y,M=0]$ (i.e. have outcome $y$ and no grandmother present). As shown in the figure, however, this inequality appears to be violated at large values of $y$, suggesting that the outcome for some treated never-takers improved when receiving the treatment.

The argument plot_nts = T tells the package to make a plot showing the inequalities corresponding to there being no treatment effect for the “never-takers” (i.e. individuals with $M=0$ under both treatments). If instead plot_nts is set to F, we get a similar plot for the always-takers, which checks whether $P(Y=y, M=1 \mid D=1) \geq P(Y=y, M=1 \mid D=0)$. In this example, these inequalities appear to be satisfied, and so we cannot reject that there is no effect of the treatment on the always-takers.

at_plot <-
  partial_density_plot(df = mother_data,
                       d = "treat",
                       m = "grandmother",
                       y = "motherfinancial",
                       num_Ybins = 5,
                       plot_nts = F,
                       density_1_label = "Prob. In Treated Group (P(Y,M=1|D=1))",
                       density_0_label = "Prob. In Control Group (P(Y,M=1|D=0))") +
  ylab("Probability") +
  xlab("Event: Grandmother Present (M=1) and Y in Stated Range")

at_plot  +
  theme(legend.position="bottom",
        legend.title = element_blank())

Testing the sharp null of full mediation

While the figure above hints at a possible violation of the sharp null, it does not come with any uncertainty quantification. The function test_sharp_null() conducts statistical inference of the sharp null using the method described in Section 4 of the paper. The following snippet runs this test based on the test proposed in Cox and Shi (2023), which is our recommended approach for most applications. The package supports using the tests provided by Andrews, Roth, and Pakes (2023) and Fang, Santos, Shaikh, and Torgovitsky (2023); these methods can be specified by changing the method argument from "CS" to "ARP" or "FSST". When M is binary, as in our example here, one can also use the test from Kitagawa (2015) by setting method = "toru".

test_result <- test_sharp_null(df = mother_data,
                               d = "treat",
                               m = "grandmother",
                               y = "motherfinancial",
                               method = "CS", #use Cox and Shi test
                               num_Ybins = 5, #discretize using 5 bins
                               cluster = "uc") #cluster SEs at uc level
#> Loading required package: lpinfer

test_result$pval
#>            [,1]
#> [1,] 0.02283916

The test gives a p-value of 0.023, and thus the sharp null is rejected at the 5% significance level. Here, the p-value corresponds to the smallest value of $\alpha$ for which the test rejects. As mentioned above, we discretize the outcome variable to 5 bins as can be seen from the argument num_Ybins = 5. Currently, the function discretizes $Y$ into 5 bins if a num_Ybins value is not provided but $Y$ takes more than $30$ distinct values in the data. Because all of the methods rely on a central limit theorem approximation, one should choose the number of bins small enough such that the central limit theorem is reasonable within cells defined by the combination of $(Y,M,D)$.

Calculating the lower bound on fraction of always-takers affected by outcome

The test above suggests that the treatment effect does not operate entirely through the presence of a grandmother in the home. There are some people (never-takers) whose outcome is affected by the treatment despite having no change in $M$. It must be that some other mechanism mattered for these people. But how prevelant are these alternative mechanisms?To give a sense, we now compute lower bounds on the fraction of never-takers whose outcome is affected by the treatment despite having the same value of $M$ under both treatments. This gives a sense of the strength of mechanisms other than $M$: it tells us what fraction of the never-takers have a direct effect of the treatment. The function lb_frac_affected computes a point estimate of this lower bound. The argument at_group = 0 corresponds to computing this lower bound for the never-takers, who are referred to as “0-always takers” in the more general notation in the paper.

lb_nts <- lb_frac_affected(df = mother_data,
                           d = "treat",
                           m = "grandmother",
                           y = "motherfinancial",
                           num_Ybins = 5,
                           at_group = 0)
lb_nts
#> [1] 0.1858912

Our estimates of the lower bound imply that at least 19 percent of never-takers are affected by the treatment. One could likewise test the fraction of never-takers affected by setting at_group = 1 (in this case, the lower bound is zero). If at_group is set to NULL, then the package calculates the fraction pooling across all types that have the same value of $M$ under both treatments (i.e. always-takers and never-takers when $M$ is binary.)

Allowing for defiers

By default, TestMechs imposes the monotonicity assumption that the treatment can only increase the value of $M$. In this setting, this means that everyone who would have a grandmother present without receiving CBT treatment would also have one present when receiving CBT treatment. We can relax this assumption by setting the max_defiers_share parameter to be non-zero, which bounds the number of “defiers” by max_defiers_share.

We rerun the test above with max_defiers_share = .01, which allows one percent of the population to be defiers.

test_result_defiers <- test_sharp_null(df = mother_data,
                                       d = "treat",
                                       m = "grandmother",
                                       y = "motherfinancial",
                                       method = "CS",
                                       num_Ybins = 5,
                                       cluster = "uc",
                                       max_defiers_share = .01)

test_result_defiers$pval
#>            [,1]
#> [1,] 0.04630939

The p-value increases to 0.046, so the test rejects the sharp null even if you allow one percent of the population to be defiers. (Allowing for larger shares of defiers will eventually lead to an insignificant result.)

Likewise, we can also calculate the lower bound on the fraction of never-takers under this relaxed monotonicity.

lb_nts_defiers <- lb_frac_affected(df = mother_data,
                                   d = "treat",
                                   m = "grandmother",
                                   y = "motherfinancial",
                                   num_Ybins = 5,
                                   at_group = 0,
                                   max_defiers_share = .01)
lb_nts_defiers
#> [1] 0.1716415

Our estimates of the lower bound imply that at least 17 percent of never-takers are affected by the treatment when we allow one percent of the population to be defiers.

Results for an alternative mechanism (relationship quality with husband)

We next turn to the setting where we are interested in testing whether the effect is mediated by relationship quality with the husband, which is measured on a 1-5 scale. We can again test the sharp null and estimate a lower bound on the fraction affected.

test_sharp_null(df = mother_data,
                d = "treat",
                m = "relationship_husb",
                y = "motherfinancial",
                num_Ybins = 5,
                method = "CS",
                cluster = "uc")$pval
#>            [,1]
#> [1,] 0.02838332

Again, we reject the sharp null that all the treatment effect goes through the relationship quality with the husband.

We can also estimate a lower bound on the fraction of always-takers:

lb_frac_affected(df = mother_data,
                 d = "treat",
                 m = "relationship_husb",
                 y = "motherfinancial",
                 num_Ybins = 5,
                 at_group = NULL,
                 allow_min_defiers = TRUE)
#> [1] 0.1002207

Here, the parameter at_group = NULL asks the function compute the lower bound on the fraction of always-takers, pooled across different $M$ values, which it calculates to be around 10 percent. (The empirical distribution suggests a small violation of monotonicity, although it is not statistically significant; the argument allow_min_defiers = TRUE calculates the lower bound allowing for the minimum number of defiers consistent with the empirical distribution — see footnote 25 of the paper for details.)

We note that while the method works with multi-valued discrete mediators (such as our 1-5 score), we generally expect the power of the test to decrease as one approaches an approximately continuous mediator. See the discussion in Remarks 2 and 3 of the paper regarding power and discretization of $M$.

Combination of both mechanisms

Next, we test the null hypothesis that the treatment effect is explained by the combination of the two mechanisms. This is done by passing a vector of variables names for the m argument.

test_result_both <- test_sharp_null(df = mother_data,
                                    d = "treat",
                                    m = c("relationship_husb",
                                          "grandmother"),
                                    y = "motherfinancial",
                                    num_Ybins = 5,
                                    method = "CS",
                                    cluster = "uc")

test_result_both$pval
#>           [,1]
#> [1,] 0.6540863

With a p-value of 0.654, we cannot reject the sharp null that the combination of presence of grandmother and relationship quality with husband fully explain the treatment effect.

Again, we can estimate a lower bound on the fraction of those affected by treatment, pooled across different $M$ values:

lb_frac_both <- lb_frac_affected(df = mother_data,
                                 d = "treat",
                                 m = c("relationship_husb",
                                       "grandmother"),
                                 y = "motherfinancial",
                                 num_Ybins = 5,
                                 allow_min_defiers = TRUE)
lb_frac_both
#> [1] 0.07251284

We estimate a lower bound of 7 percent, although this does not appear to be statistically significant given the test result above.

Non-experimental setting

The examples above focus on data from a randomized controlled trial, where treatment $D$ is randomly assigned. TestMech assumes by default that we have an RCT, and treatment effects are estimated by comparing means for the treated and control group. However, TestMechs can also be applied in settings where we have conditional randomization given covariates or an instrumental variable for the treatment. Specifically, the core functions of the package allow for the reg_formula argument, which allows the researcher to provide a regression formula (or IV formula) to estimate treatment effects after adjusting linearly for observable characteristics. The reg_formula argument is passed to the fixest package to estimate treatment effects, and thus accommodates fixest functionality, including IV and high-dimensional fixed effects. While adjusting for covariates is not necessary in our running example, which is an RCT, we can still adjust for covariates to increase precision. Below we show how this is done using the baseline covariates age_baseline, edu_mo_baseline, and wealth_baseline. For brevity, we focus on testing the sharp null using covariates using the function test_sharp_null(), although the reg_formula result can analogously be passed to the functions partial_density_plot() and lb_frac_affected().

test_result_gm_ols <- test_sharp_null(df = mother_data,
                                       d = "treat",
                                       m = "grandmother",
                                       y = "motherfinancial",
                                       reg_formula = "~ treat + age_baseline + edu_mo_baseline + wealth_baseline",   
                                       method = "CS",
                                       num_Ybins = 5,
                                       cluster = "uc")
#> Loading required package: lpinfer
test_result_gm_ols$pval
#>            [,1]
#> [1,] 0.03105106

The key new argument is reg_formula = "~ treat + age_baseline + edu_mo_baseline + wealth_baseline". This says that we should estimate the effect of the treatment (treat) on $Y$ and $M$ (or functions thereof) by running a regression with the treatment variable and baseline controls on the right-hand side.

We can also use reg_formula to estimate treatment effects using instrumental variables. To illustrate how this works, we create an instrumental variable iv equal to the true treatment variable plus noise. We now modify the reg_formula argument to use iv as an instrument for treat.

set.seed(0)
mother_data$iv <- mother_data$treat + rnorm(n = length(mother_data$treat), sd = 0.1) #iv = treat + noise

test_result_gm_iv <- test_sharp_null(df = mother_data,
                                       d = "treat",
                                       m = "grandmother",
                                       y = "motherfinancial",
                                       reg_formula = "~ age_baseline + edu_mo_baseline + wealth_baseline | treat ~ iv", #iv spec
                                       method = "CS",
                                       num_Ybins = 5,
                                       cluster = "uc")

test_result_gm_iv$pval
#>           [,1]
#> [1,] 0.0311524

Name		Name	Last commit message	Last commit date
Latest commit History 269 Commits
R		R
README_cache/gfm		README_cache/gfm
data		data
man		man
tests		tests
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TestMechs

Installation

Application to Baranov et al. (2020)

Randomly-assigned setting

Graphical Evidence

Testing the sharp null of full mediation

Calculating the lower bound on fraction of always-takers affected by outcome

Allowing for defiers

Results for an alternative mechanism (relationship quality with husband)

Combination of both mechanisms

Non-experimental setting

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

TestMechs

Installation

Application to Baranov et al. (2020)

Randomly-assigned setting

Graphical Evidence

Testing the sharp null of full mediation

Calculating the lower bound on fraction of always-takers affected by outcome

Allowing for defiers

Results for an alternative mechanism (relationship quality with husband)

Combination of both mechanisms

Non-experimental setting

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages