I'm trying to run a simple distributed poisson GLMM using the DPQL model. However, at initialization my bhat vector always contains NA and if I'm understanding the output correctly it is also too long. I tried to look a bit into the pda() source code and it seems that we end up with 2 Intercept terms somehow, resulting in the NA estimate due to collinearity.
I'm not sure if this is a bug or if I specified the model incorrectly.
# ############################ STEP 0: Generate testdata ###############################
set.seed(42)
n <- 3 * 6 * 20 # 3 sites, 6 wards, 20 obs each
d <- data.frame(
site = rep(paste0("site", 1:3), each = 6 * 20),
ward = rep(1:18, each = 20),
treatment = rbinom(n, 1, 0.5)
)
# Random intercepts per ward
ward_re <- rnorm(18, 0, 0.5)
lambda <- exp(0.3 + 0.4 * d$treatment + ward_re[d$ward])
d$y <- rpois(n, lambda)
ipdata_site1 <- d[d$site=="site1",]
ipdata_site2 <- d[d$site=="site2",]
ipdata_site3 <- d[d$site=="site3",]
# ############################ STEP 1: initialize ###############################
#pdaCatalog()
control <- list(project_name = 'glmm possion',
step = 'initialize',
sites = unique(d$site),
heterogeneity = TRUE,
variables_heterogeneity = "Intercept",
model = 'DPQL',
family = 'poisson',
outcome = "y",
variables = c('treatment'),
optim_maxit = 100,
lead_site = 'site1',
init_method = 'lead',
optim_method = "BFGS",
upload_date = as.character(Sys.time()),
maxround = 3)
#write control file
pda(site_id = 'site1', control = control, dir = getwd(),upload_without_confirm = T)
#initialize
pda(site_id = 'site3', ipdata = ipdata_site3, dir=getwd())
#check
config <- getCloudConfig(site_id = 'site3', dir=getwd())
site3_initial <- pdaGet(name = 'site3_initialize', config = config)
#this should have length 2 right?
site3_initial$bhat_i
I'm trying to run a simple distributed poisson GLMM using the DPQL model. However, at initialization my bhat vector always contains
NAand if I'm understanding the output correctly it is also too long. I tried to look a bit into the pda() source code and it seems that we end up with 2 Intercept terms somehow, resulting in the NA estimate due to collinearity.I'm not sure if this is a bug or if I specified the model incorrectly.
Here's the minimal working example: