Error raised when calculating overall weights under some parameterizations of the dataset #6

@btleyden

First, thanks for this great package, I've been looking forward to it. I appreciate all the effort that goes into making something like this.

I've been running into an error when I run cont_did() in cases where the number of groups is smaller than the number of periods (although I now see that this issue applies more generally; see the df3 case below). For example, when I have many periods with one ever-treated group and one never-treated group, I often get the error Error in overall_weights(att_gt, ...) : something's going wrong calculating overall weights.

I think I've tracked down the issue, although my comparative advantage is not in R, so it's possible I'm wrong here. I've provided a minimal working example below, followed by some code that I used to (possibly?) diagnose the issue.

The error is thrown at the end of overall_weights() in pte_aggte.R when sum(out_weight) != 1. For the examples I've run, the weights should sum to exactly 1, but the condition sum(out_weight) != 1 still evaluates to TRUE because of a floating-point precision issue.
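
To illustrate the kind of comparison I mean, here is a generic base R sketch (not code from ptetools, and the values are stand-ins rather than the actual weights). An exact comparison of doubles can fail for values that are mathematically equal, while a tolerance-based comparison passes:

```r
# Generic illustration of exact vs. tolerance-based comparison of doubles.
# These values are stand-ins, not the weights computed inside ptetools.
x <- 0.1 + 0.2

exact_equal  <- (x == 0.3)                  # exact comparison of doubles
approx_equal <- isTRUE(all.equal(x, 0.3))   # comparison up to a small tolerance

print(exact_equal)   # FALSE
print(approx_equal)  # TRUE
print(x - 0.3)       # tiny non-zero difference
```

all.equal() compares numeric values up to a small tolerance, which is why it is wrapped in isTRUE() to get a plain logical.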

(Also, let me know if I should move this report to the ptetools repo. I wasn't entirely sure where to put it.)

First, here's a quick MWE that demonstrates the issue.

library("contdid")
set.seed(117)

# Dataset from contdid README
df1 = simulate_contdid_data(
  n = 5000,
  num_time_periods = 4,
  num_groups = 4,
  dose_linear_effect = 0,
  dose_quadratic_effect = 0
)

# Dataset where ever-treated units are all treated mid-way through the sample
df2 = simulate_contdid_data(
  n = 5000,
  num_time_periods = 10,
  num_groups = 10,
  dose_linear_effect = 0,
  dose_quadratic_effect = 0
)
df2$G = ifelse(
  df2$G != 0,
  5,
  0
)

# Dataset similar to README example, with contrived number of groups/periods
df3 = simulate_contdid_data(
  n = 5000,
  num_time_periods = 11,
  num_groups = 11,
  dose_linear_effect = 0,
  dose_quadratic_effect = 0
)

# Run cont_did for each dataset
r1 = cont_did(
  yname = "Y",
  tname = "time_period",
  idname = "id",
  dname = "D",
  data = df1,
  gname = "G",
  target_parameter = "slope",
  aggregation = "dose",
  treatment_type = "continuous",
  control_group = "nevertreated",
  biters = 100,
  cband = TRUE,
  num_knots = 0,
  degree = 1
)

r2 = cont_did(
  yname = "Y",
  tname = "time_period",
  idname = "id",
  dname = "D",
  data = df2,
  gname = "G",
  target_parameter = "slope",
  aggregation = "dose",
  treatment_type = "continuous",
  control_group = "nevertreated",
  biters = 100,
  cband = TRUE,
  num_knots = 0,
  degree = 1
)

r3 = cont_did(
  yname = "Y",
  tname = "time_period",
  idname = "id",
  dname = "D",
  data = df3,
  gname = "G",
  target_parameter = "slope",
  aggregation = "dose",
  treatment_type = "continuous",
  control_group = "nevertreated",
  biters = 100,
  cband = TRUE,
  num_knots = 0,
  degree = 1
)

Model r1 runs properly (modulo a warning about uniform vs. pointwise CIs), but r2 and r3 give the error:

Error in overall_weights(att_gt, ...) :
  something's going wrong calculating overall weights

Note that the same behavior shows up for non-linear models (e.g., num_knots = 1 and degree = 3).

Here is some code, with directions in the comments, that documents the precision issue:

library("contdid")
set.seed(117)

# Open the browser on error, and step into overall_weights() the first time it is called
options(error = browser)
debugonce(ptetools:::overall_weights)

# Dataset where ever-treated units are all treated mid-way through the sample
df2 = simulate_contdid_data(
  n = 5000,
  num_time_periods = 10,
  num_groups = 10,
  dose_linear_effect = 0,
  dose_quadratic_effect = 0
)
df2$G = ifelse(
  df2$G != 0,
  5,
  0
)

# Run model
r2 = cont_did(
  yname = "Y",
  tname = "time_period",
  idname = "id",
  dname = "D",
  data = df2,
  gname = "G",
  target_parameter = "slope",
  aggregation = "dose",
  treatment_type = "continuous",
  control_group = "nevertreated",
  biters = 100,
  cband = TRUE,
  num_knots = 1,
  degree = 3
)

# Step through the function with "n" in the browser until out_weight has been created.
print(out_weight)  # Output: three 0s and six approximations of 1/6; should sum to 1
print(sum(out_weight))  # Output: 1 (print() shows 7 significant digits by default)
print(sum(out_weight) - 1)  # Output: small, non-zero value

# Run the "quick sanity check" from overall_weights()
if (sum(out_weight) != 1) stop("something's going wrong calculating overall weights")
# Output: stop("something's going wrong calculating overall weights")

For what it's worth, it's easy to construct examples that trigger the error and examples that don't. E.g., change the 5 in df2 to 6 and the error goes away.
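
In case it's useful, here is a rough sketch of what a tolerance-based version of the sanity check might look like. check_weights_sum() and its tol argument are names I made up for illustration; this is not the actual ptetools code, and the weight vector below only mimics the shape of out_weight in the df2 case.

```r
# Hypothetical helper: the name and default tolerance are assumptions,
# not part of ptetools or contdid.
check_weights_sum <- function(w, tol = sqrt(.Machine$double.eps)) {
  # Fail only if the sum differs from 1 by more than the tolerance
  if (abs(sum(w) - 1) > tol) {
    stop("something's going wrong calculating overall weights")
  }
  invisible(TRUE)
}

# Weights shaped like the df2 case: three 0s and six copies of 1/6
w <- c(rep(0, 3), rep(1/6, 6))
check_weights_sum(w)  # passes without error

# Weights that are genuinely wrong still raise the error
bad <- tryCatch(
  check_weights_sum(c(0.5, 0.4)),
  error = function(e) conditionMessage(e)
)
print(bad)
```

The default tolerance of sqrt(.Machine$double.eps) is just one common choice; something like isTRUE(all.equal(sum(w), 1)) would be another way to express the same idea.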
