Fix splines (add followup.spline.df option), rename format.time(), and use roxygen2 8.0.0 #153

Merged
remlapmot merged 9 commits into main from devel-2026-05-01
May 5, 2026

Conversation

@remlapmot
Collaborator

Apologies for the large diff; most of it comes from fixing the splines. We only found the splines weren't working because @pmadleydowd has been applying them to his real dataset and we couldn't make the cumulative incidence curves change shape. Currently no df is specified, so I don't think they're doing anything non-linear, so I have added an option to control the df.

  • Fix followup.spline = TRUE so the basis is genuinely non-linear. Splines are now built into the model formula via splines::ns() instead of being applied as a single-column transform of followup, and the new followup.spline.df option (default 4) controls the number of basis functions. The treatment-by-followup interaction now uses the same spline basis. Knots are baked from the full expanded followup once at fit time so the basis is identical at fit and prediction time across bootstraps and survival grids. Internally, formula column extraction now uses all.vars(), so user-supplied covariates may include ns(), bs(), I(), factor(), poly() etc. without breaking expansion.
  • Rename format.time() to format_time() because it wasn't an S3 method and hence was causing roxygen2 to write incorrect information in its helpfile.
  • Add a package-level help file and bump roxygen2 to 8.0.0, which handles links better and so fixes a link in the SEQOpts help file.
  • Bump JamesIves/github-pages-deploy-action to its v4 sliding tag (currently resolves to the slightly newer 4.8.0).
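
On the `format.time()` rename in the list above: any function named `<generic>.<class>` looks like an S3 method to R, and to tools that read function names. The sketch below shows only that dispatch behaviour. The function body and the `"time"` class are hypothetical stand-ins, not the package's actual helper, and it does not reproduce what roxygen2 wrote to the help file.

```r
# Hypothetical helper whose name happens to match <generic>.<class>.
# The body is made up for illustration only.
format.time <- function(x, ...) paste0(x, " s")

# Any object carrying class "time" (a hypothetical class here) ...
obj <- structure(90, class = "time")

# ... is now routed to the helper by the format() generic,
# even though the helper was never meant to be a method.
res <- format(obj)
print(res)
```

Naming the helper `format_time()` removes the ambiguity, because the underscore form can never be picked up by `format()` dispatch.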

remlapmot added 8 commits May 1, 2026 11:25
Previously `followup.spline = TRUE` called `ns(followup)` with no `df`, which returns a single-column basis equivalent to a linear term, so the spline added no non-linearity. The transform was also applied as a single data.table column assignment, so passing any `df > 1` failed with a recycling error because `ns()` then returns a multi-column matrix.
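
The single-column behaviour described above can be checked directly with `splines::ns()`. This is a minimal sketch on a made-up follow-up grid (`0:59` is an assumption, not data from the package):

```r
library(splines)

# Made-up follow-up times for illustration
followup <- as.numeric(0:59)

# No df and no knots: ns() returns a one-column basis
basis_default <- ns(followup)
n_default <- ncol(basis_default)

# That single column is a linear function of followup,
# so its correlation with followup has magnitude 1
lin_cor <- abs(cor(followup, as.numeric(basis_default)))

# With df = 4 the basis has 4 columns and 3 interior knots
basis_df4 <- ns(followup, df = 4)
n_df4     <- ncol(basis_df4)
n_knots   <- length(attr(basis_df4, "knots"))

print(c(n_default = n_default, n_df4 = n_df4, n_knots = n_knots))
```

Because the four-column result is a matrix, it cannot be assigned to a single data column, which is consistent with the recycling error mentioned above.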

Splines are now built into the model formula via `splines::ns()` rather than as a pre-fit data transform. A new `followup.spline.df` option (default 4 → 3 interior knots) controls the number of basis functions, and the treatment-by-followup interaction now uses the same spline basis so the treatment effect can flex non-linearly over time. Knots are computed once from the full expanded `followup` column and baked into the formula as explicit `knots = c(...), Boundary.knots = c(...)` arguments, so the basis is identical at fit and prediction time across bootstraps and the survival-curve grid (otherwise `ns()` would silently rebuild knots from each subset).
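
The knot-baking idea can be sketched as follows. The simulated data, the variable names, and the use of `sprintf()`/`deparse()` to build the term are all assumptions for illustration; the package's internal code may differ.

```r
library(splines)

set.seed(1)
# Simulated expanded data (hypothetical): 20 subjects x 60 follow-up times
dat <- data.frame(
  followup  = as.numeric(rep(0:59, times = 20)),
  treatment = rep(c(0, 1), each = 600)
)
dat$outcome <- rbinom(nrow(dat), 1, plogis(-3 + 0.02 * dat$followup))

# Compute knots once from the full followup column
full_basis <- ns(dat$followup, df = 4)
knots      <- unname(attr(full_basis, "knots"))
boundary   <- attr(full_basis, "Boundary.knots")

# Bake them into the formula as explicit arguments
spline_term <- sprintf(
  "ns(followup, knots = %s, Boundary.knots = %s)",
  paste(deparse(knots), collapse = ""),
  paste(deparse(boundary), collapse = "")
)
fml <- as.formula(paste("outcome ~ treatment *", spline_term))
fit <- glm(fml, family = binomial(), data = dat)

# On a subset, the baked basis matches the rows of the full basis ...
keep    <- dat$followup <= 20
baked   <- ns(dat$followup[keep], knots = knots, Boundary.knots = boundary)
same    <- isTRUE(all.equal(as.numeric(baked),
                            as.numeric(full_basis[keep, ])))

# ... whereas ns(df = 4) on the subset picks different knots
rebuilt       <- ns(dat$followup[keep], df = 4)
knots_differ  <- !isTRUE(all.equal(unname(attr(rebuilt, "knots")), knots))

print(c(same = same, knots_differ = knots_differ))
```

Using `treatment * ns(...)` gives the interaction the same basis as the main follow-up term, so the design matrix here has an intercept, one treatment column, four spline columns, and four interaction columns.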

Three call-sites that extracted column names from formula strings via `strsplit("\\+|\\*|\\:")` (`SEQexpand`, `init_formula_cache::parse_covs`, `inline.pred` fallback) now use a new `formula_vars()` helper backed by `all.vars(as.formula(...))`. Function-wrapped terms like `ns()`, `bs()`, `I()`, `factor()`, `poly()` resolve to their underlying variables instead of being treated as raw column names, so user-supplied covariates may include them without breaking expansion — this is also what unblocks the spline-in-formula approach.
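
The difference between the two extraction approaches can be seen on a small covariate string. `formula_vars_sketch()` below is a hypothetical stand-in written for this example; the PR's actual `formula_vars()` helper is only described as being backed by `all.vars(as.formula(...))`, and its exact signature is not shown here.

```r
# Hypothetical covariate string containing function-wrapped terms
covs <- "sex + ns(age, df = 3) + I(bmi^2) + factor(site) + sex:age"

# Old approach: split on +, * and : and treat each piece as a column name
split_terms <- trimws(strsplit(covs, "\\+|\\*|\\:")[[1]])
print(split_terms)  # includes pieces such as "ns(age, df = 3)"

# Sketch of the all.vars() approach (stand-in, not the package's helper)
formula_vars_sketch <- function(x) {
  all.vars(as.formula(paste("~", x)))
}

vars <- formula_vars_sketch(covs)
print(vars)
```

The split pieces are not valid column names whenever a term is wrapped in a function call, while `all.vars()` walks the parsed expression and returns only the underlying variable names, each once.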

New tests cover `formula_vars` extraction, basis column count for default and custom `df`, baked-knot invariance under row subsetting (the train/predict guarantee), user-supplied `ns()` covariates surviving expansion end-to-end, and `followup.spline.df` validation.
@remlapmot remlapmot requested a review from ryan-odea May 5, 2026 08:42
@remlapmot remlapmot force-pushed the devel-2026-05-01 branch from 3e328d2 to 25b72df Compare May 5, 2026 10:56
Collaborator

@ryan-odea ryan-odea left a comment


Looks good! We may also want to test that the outcome coefficients with splines are the same as, or close to, those from the Python version. I don't think I saw any coefficient testing in the tests you've added.

@remlapmot
Collaborator Author

Good point.

Will leave this for later. For example, comparing splines between R and Stata can be maddening because Stata uses an algorithm that's difficult to recreate in R, though I can't remember how that affects coefficients (hopefully not much).

@remlapmot remlapmot merged commit 46f7e9e into main May 5, 2026
7 checks passed
@remlapmot remlapmot deleted the devel-2026-05-01 branch May 5, 2026 12:52

2 participants