Fix splines (add followup.spline.df option), rename format.time(), and use roxygen2 8.0.0#153
Merged
Fix splines (add followup.spline.df option), rename format.time(), and use roxygen2 8.0.0#153
followup.spline.df option), rename format.time(), and use roxygen2 8.0.0#153Conversation
Previously `followup.spline = TRUE` called `ns(followup)` with no `df`, which returns a single-column basis equivalent to a linear term — the spline did almost nothing. The transform was also applied as a single data.table column assignment, so passing any `df > 1` failed with a recycling error because `ns()` returns a multi-column matrix.
Splines are now built into the model formula via `splines::ns()` rather than as a pre-fit data transform. A new `followup.spline.df` option (default 4 → 3 interior knots) controls the number of basis functions, and the treatment-by-followup interaction now uses the same spline basis so the treatment effect can flex non-linearly over time. Knots are computed once from the full expanded `followup` column and baked into the formula as explicit `knots = c(...), Boundary.knots = c(...)` arguments, so the basis is identical at fit and prediction time across bootstraps and the survival-curve grid (otherwise `ns()` would silently rebuild knots from each subset).
Three call-sites that extracted column names from formula strings via `strsplit("\\+|\\*|\\:")` (`SEQexpand`, `init_formula_cache::parse_covs`, `inline.pred` fallback) now use a new `formula_vars()` helper backed by `all.vars(as.formula(...))`. Function-wrapped terms like `ns()`, `bs()`, `I()`, `factor()`, `poly()` resolve to their underlying variables instead of being treated as raw column names, so user-supplied covariates may include them without breaking expansion — this is also what unblocks the spline-in-formula approach.
New tests cover `formula_vars` extraction, basis column count for default and custom `df`, baked-knot invariance under row subsetting (the train/predict guarantee), user-supplied `ns()` covariates surviving expansion end-to-end, and `followup.spline.df` validation.
3e328d2 to
25b72df
Compare
ryan-odea
approved these changes
May 5, 2026
Collaborator
ryan-odea
left a comment
There was a problem hiding this comment.
Looks good! We may also want to test outcome coefficients with splines are comparable to the python version or near the same. I don't think I saw any coefficient testing in the tests you've added.
Collaborator
Author
|
Good point. Will leave for later (for example comparing splines between R and Stata can be maddening because Stata uses some weird algorithm that's difficult to recreate in R - but can't remember how that affects coefficients [hopefully not much]) |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Apologies that the diff for this is large - this is because of fixing the splines. We only found the splines weren't working because @pmadleydowd has been trying to apply them to his real dataset and we couldn't make the cum inc curves change shape - currently there's no df specified so I don't think they're doing anything nonlinear, so have added an option to control the df.
followup.spline = TRUEso the basis is genuinely non-linear. Splines are now built into the model formula viasplines::ns()instead of being applied as a single-column transform offollowup, and the newfollowup.spline.dfoption (default4) controls the number of basis functions. The treatment-by-followup interaction now uses the same spline basis. Knots are baked from the full expandedfollowuponce at fit time so the basis is identical at fit and prediction time across bootstraps and survival grids. Internally, formula column extraction now usesall.vars(), so user-supplied covariates may includens(),bs(),I(),factor(),poly()etc. without breaking expansion.format.time()toformat_time()because it wasn't an S3 method and hence was causing roxygen2 to write incorrect information in its helpfile.