Add HERS dataset linear regression example with interactions, interspersed with birthweight analysis (Vittinghoff Ch. 4) #381
Open: Copilot wants to merge 35 commits into main from copilot/add-example-model-hers-dataset
Commits (35):
557fac2 Initial plan (Copilot)
6303987 Add HERS dataset linear regression example following Vittinghoff Chap… (Copilot)
a841ba7 Pre-compute coefficient values in named variables for maintainability (Copilot)
e828c2f Intersperse HERS analysis with birthweight sections; add HERS interac… (Copilot)
27418f9 Use explicit data= in geom_line instead of ggplot data replacement op… (Copilot)
ded1bb6 Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
752d970 Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
71fa75e Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
f7d07a6 Replace BMI×HT interaction with BMI×statins interaction in HERS model (Copilot)
90ae68a Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
4ae72cf Resolve merge conflicts with main: move HERS includes to _sec_linreg_… (Copilot)
3575c3e Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
8303639 Address PR review: delete unreferenced standalone HERS file and fix W… (Copilot)
0bde169 Merge remote-tracking branch 'origin/main' into copilot/add-example-m… (Copilot)
c46e981 Restructure HERS linear regression examples (Copilot)
f9d6ec1 Remove duplicate lrtest from _sec_hers_lm_gof.qmd (Copilot)
a0e362d Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
c8913d8 Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
f61a83b Fix PR review issues: include path, factor levels, diagnostics, key-v… (Copilot)
9d7eb27 Fix ggpairs alpha placement and fig-cap quote formatting (Copilot)
e78dc81 Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
1811f6a Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
7f8014a Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
238201b Add GGally to DESCRIPTION Suggests and renv.lock (Copilot)
626ab42 Merge main into branch: resolve WORDLIST and renv.lock conflicts (Copilot)
b5481a9 Merge main into branch: resolve WORDLIST conflict (Copilot)
356b524 Fix GGally::wrap conflict with pander::wrap in fig-hers-key-vars chunk (Copilot)
e0946ed Remove HT from HERS models; facet by statins; add stratified regressi… (Copilot)
d00f257 Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
ab62c70 feat: replace HERS scatter with 3D plotly plot and add regress3d surf… (Copilot)
848d6d5 fix: address code review - add derivative comparison lines and clarif… (Copilot)
105089e Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
148e28f fix: restore renv.lock from 626ab42 base (297→298 packages) + add reg… (Copilot)
c1a964f Remove multi-column divs from 3D figures; stratify by statin use usin… (Copilot)
2a986f2 Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -61,6 +61,8 @@ Suggests: | |
| sjPlot, | ||
| equatiomatic, | ||
| broom (>= 1.0.8), | ||
| GGally, | ||
| regress3d, | ||
| lmtest, | ||
| gh, | ||
| lintr, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,139 @@ | ||
| ### Motivating example: `hers` data {.smaller} | ||
|
|
||
| :::{.callout-note} | ||
| This section is based on @vittinghoff2e, Chapter 4. | ||
| ::: | ||
|
|
||
| ::: notes | ||
|
|
||
| {{< include _subfiles/shared/_sec_hers_intro.qmd >}} | ||
|
|
||
| ::: | ||
|
|
||
| ```{r} | ||
| #| eval: false | ||
| #| code-fold: show | ||
| library(haven) | ||
| hers <- haven::read_dta( | ||
| paste0( | ||
| "https://regression.ucsf.edu/sites/g/files", | ||
| "/tkssra6706/f/wysiwyg/home/data/hersdata.dta" | ||
| ) | ||
| ) | ||
| ``` | ||
|
|
||
| ```{r} | ||
| #| include: false | ||
| library(haven) | ||
| hers <- | ||
| fs::path_package("rme", "extdata/hersdata.dta") |> | ||
| read_dta() |> | ||
| dplyr::mutate( | ||
| HT = as_factor(HT) |> | ||
| relevel(ref = "placebo"), | ||
| statins = as_factor(statins) |> | ||
| relevel(ref = "no") | ||
| ) | ||
| ``` | ||
|
|
||
| ::::: {.panel-tabset} | ||
|
|
||
| #### Data as table | ||
|
|
||
| ```{r} | ||
| #| label: tbl-hers-ch4 | ||
| #| tbl-cap: "`hers` data" | ||
| hers |> head() | ||
| ``` | ||
|
|
||
| #### Data as graph | ||
|
|
||
| ```{r} | ||
| #| label: fig-hers-scatter | ||
| #| fig-cap: > | ||
| #| `hers` data (@vittinghoff2e): | ||
| #| age (years) and BMI (kg/m²) vs. baseline LDL (mg/dL), | ||
| #| colored by statin use. | ||
| library(plotly) | ||
| hers_scatter_data <- hers |> | ||
| dplyr::filter(!is.na(age), !is.na(BMI), !is.na(LDL)) | ||
| plotly::plot_ly( | ||
| x = hers_scatter_data[["age"]], | ||
| y = hers_scatter_data[["BMI"]], | ||
| z = hers_scatter_data[["LDL"]], | ||
| color = as.character(hers_scatter_data[["statins"]]), | ||
| colors = c("no" = "steelblue", "yes" = "darkorange"), | ||
| type = "scatter3d", | ||
| mode = "markers", | ||
| marker = list(size = 3, opacity = 0.5) | ||
| ) |> | ||
| plotly::layout( | ||
| scene = list( | ||
| xaxis = list(title = "Age (yr)"), | ||
| yaxis = list(title = "BMI (kg/m²)"), | ||
| zaxis = list(title = "LDL (mg/dL)") | ||
| ), | ||
| legend = list(title = list(text = "Statins")) | ||
| ) | ||
| ``` | ||
|
|
||
| #### Key variables | ||
|
|
||
| ```{r} | ||
| #| label: fig-hers-key-vars | ||
| #| fig-cap: > | ||
| #| Key variables in `hers` data: outcome (LDL), | |
| #| treatment (HT), and covariates (BMI, statins, age) | ||
| #| fig-height: 7 | ||
| #| fig-width: 8 | ||
| library(GGally) | ||
| hers |> | ||
| dplyr::select(LDL, HT, BMI, statins, age) |> | ||
| ggpairs( | ||
| mapping = aes(col = statins), | ||
| lower = list(continuous = GGally::wrap("points", alpha = 0.3)), | ||
| columnLabels = c( | ||
| "LDL (mg/dL)", | ||
| "HT", | ||
| "BMI (kg/m²)", | ||
| "Statins", | ||
| "Age (yr)" | ||
| ) | ||
| ) + | ||
| theme_bw() + | ||
| theme(legend.position = "bottom") | ||
| ``` | ||
|
|
||
| ::::: | ||
|
|
||
| --- | ||
|
|
||
| #### Data notation {.smaller} | ||
|
|
||
| ::: notes | ||
| Let's define some notation to represent this data: | ||
| ::: | ||
|
|
||
| - $Y$: LDL cholesterol (mg/dL) | ||
| - $A$: age (years) | ||
| - $B$: BMI (kg/m²) | ||
| - $T$: hormone therapy treatment assignment | ||
| ("placebo" or "hormone therapy") | ||
| - $H$: indicator variable for $T$ = "hormone therapy" | ||
| - $H = 0$ if $T$ = "placebo" | ||
| - $H = 1$ if $T$ = "hormone therapy" | ||
| - $U$: statin use ("no" or "yes") | ||
| - $V$: indicator variable for $U$ = "yes" | ||
| - $V = 0$ if $U$ = "no" | ||
| - $V = 1$ if $U$ = "yes" | ||
|
|
||
| ::: notes | ||
| "Placebo" is the **reference level** for the categorical variable $T$, | ||
| and "no" is the **reference level** for statin use $U$. | ||
| The choice of reference level is arbitrary; | ||
| it only affects the interpretation of the intercept and corresponding indicator coefficients. | ||
|
|
||
|
|
||
| Since LDL is measured at **baseline** (before the hormone therapy was administered), | ||
| $H$ is not included as a predictor in our regression models for LDL. | ||
| We instead focus on statin use $U$ (and its indicator $V$) as the key grouping variable. | ||
| ::: | ||
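Review note: the statement above that the reference level only changes how the intercept and indicator coefficients are interpreted can be checked with a small sketch. The data frame `toy` and its values are hypothetical and are not taken from the HERS data; only `relevel()`, `model.matrix()`, and `lm()` from base R are used.

```r
# Hypothetical toy data (not the HERS data): statin use and LDL in mg/dL.
# Levels are deliberately ordered so that "yes" starts as the reference.
toy <- data.frame(
  statins = factor(c("yes", "no", "yes", "no"), levels = c("yes", "no")),
  LDL = c(120, 150, 110, 160)
)

# Make "no" the reference level, as the PR does for `statins`
toy_ref_no <- transform(toy, statins = relevel(statins, ref = "no"))

# The design matrix now holds the indicator V = 1 if statins == "yes"
X_no <- model.matrix(~ statins, data = toy_ref_no)
X_no

# Fit the same model under both reference levels
fit_ref_no <- lm(LDL ~ statins, data = toy_ref_no)
fit_ref_yes <- lm(LDL ~ statins, data = toy)

# Coefficients differ (intercept moves, indicator flips sign) ...
coef(fit_ref_no)
coef(fit_ref_yes)

# ... but the fitted means are identical
same_fit <- all.equal(
  unname(fitted(fit_ref_no)),
  unname(fitted(fit_ref_yes))
)
same_fit
```

Under these assumptions, both fits describe the same two group means; only the parameterization changes.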
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,68 @@ | ||
| ### Diagnostics for `hers` parallel-planes model | ||
|
|
||
| #### Residuals vs fitted for `hers_lm1` | ||
|
|
||
| ```{r} | ||
| #| label: fig-hers-resid-fitted | ||
| #| fig-cap: "Residuals vs fitted values for `hers_lm1` (parallel planes model)" | ||
| library(ggplot2) | ||
| hers_diag <- hers |> | ||
| dplyr::mutate( | ||
| .fitted = fitted(hers_lm1), | ||
| .resid = residuals(hers_lm1) | ||
| ) | ||
|
|
||
| ggplot(hers_diag, aes(x = .fitted, y = .resid)) + | ||
| geom_point(alpha = 0.3) + | ||
| geom_hline(yintercept = 0, linetype = "dashed") + | ||
| facet_wrap(~statins, labeller = label_both) + | ||
| xlab("Fitted values") + | ||
| ylab("Residuals") + | ||
| theme_bw() | ||
| ``` | ||
|
|
||
| #### QQ plot for `hers_lm1` | ||
|
|
||
| ```{r} | ||
| #| label: fig-hers-qq | ||
| #| fig-cap: "QQ plot of residuals for `hers_lm1` (parallel planes model)" | ||
| ggplot(hers_diag, aes(sample = .resid)) + | ||
| stat_qq() + | ||
| stat_qq_line() + | ||
| facet_wrap(~statins, labeller = label_both) + | ||
| theme_bw() | ||
| ``` | ||
|
|
||
| ### Diagnostics for `hers` interaction model | ||
|
|
||
| #### Residuals vs fitted for `hers_lm2` | ||
|
|
||
| ```{r} | ||
| #| label: fig-hers-resid-fitted-lm2 | ||
| #| fig-cap: "Residuals vs fitted values for `hers_lm2` (interaction model)" | ||
| hers_diag2 <- hers |> | ||
| dplyr::mutate( | ||
| .fitted = fitted(hers_lm2), | ||
| .resid = residuals(hers_lm2) | ||
| ) | ||
|
|
||
| ggplot(hers_diag2, aes(x = .fitted, y = .resid)) + | ||
| geom_point(alpha = 0.3) + | ||
| geom_hline(yintercept = 0, linetype = "dashed") + | ||
| facet_wrap(~statins, labeller = label_both) + | ||
| xlab("Fitted values") + | ||
| ylab("Residuals") + | ||
| theme_bw() | ||
| ``` | ||
|
|
||
| #### QQ plot for `hers_lm2` | ||
|
|
||
| ```{r} | ||
| #| label: fig-hers-qq-lm2 | ||
| #| fig-cap: "QQ plot of residuals for `hers_lm2` (interaction model)" | ||
| ggplot(hers_diag2, aes(sample = .resid)) + | ||
| stat_qq() + | ||
| stat_qq_line() + | ||
| facet_wrap(~statins, labeller = label_both) + | ||
| theme_bw() | ||
| ``` |
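Review note: the diagnostic chunks above attach `fitted()` and `residuals()` directly to `hers` with `dplyr::mutate()`, which only works if those vectors have one entry per row of `hers`. In this diff `hers_lm2` is fit with `na.action = na.exclude`; `hers_lm1` is defined outside the files shown here, so whether it does the same is an assumption. A minimal base-R sketch of the difference, using a hypothetical data frame `toy`:

```r
# Hypothetical toy data with one missing outcome (not the HERS data)
toy <- data.frame(
  x = 1:5,
  y = c(2.1, NA, 6.2, 7.9, 10.1)
)

# na.omit drops the incomplete row and returns shorter vectors
fit_omit <- lm(y ~ x, data = toy, na.action = na.omit)
resid_omit <- residuals(fit_omit)

# na.exclude also drops the row when fitting, but pads the results with NA
fit_excl <- lm(y ~ x, data = toy, na.action = na.exclude)
resid_excl <- residuals(fit_excl)

length(resid_omit)
length(resid_excl)

# Only the padded version lines up row by row with the original data
toy$.resid <- resid_excl
toy$.fitted <- fitted(fit_excl)
toy
```

If `hers_lm1` were fit with the default `na.omit` and `hers` had missing values in the model variables, the `mutate()` calls above would fail with a length mismatch.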
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| ### Goodness of fit for `hers` models | ||
|
|
||
| #### AIC and BIC for `hers` models | ||
|
|
||
| ```{r} | ||
| AIC(hers_lm1, hers_lm2) | ||
| BIC(hers_lm1, hers_lm2) | ||
| ``` | ||
|
|
||
| #### Deviance for `hers` models | ||
|
|
||
| ```{r} | ||
| deviance(hers_lm1) | ||
| deviance(hers_lm2) | ||
| ``` |
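Review note: for readers of the rendered section, it may help to show what these three functions return for an `lm` fit. The sketch below uses the built-in `mtcars` data as a stand-in, since `hers_lm1` and the HERS data are not available in this diff; the model formula is purely illustrative.

```r
# Stand-in model on built-in data (not one of the HERS models)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# For a Gaussian linear model, deviance() is the residual sum of squares
rss <- sum(residuals(fit)^2)

# AIC and BIC are penalized versions of -2 * log-likelihood
ll <- logLik(fit)
k <- attr(ll, "df") # regression coefficients plus the error variance
n <- nobs(fit)

aic_by_hand <- -2 * as.numeric(ll) + 2 * k
bic_by_hand <- -2 * as.numeric(ll) + log(n) * k

c(deviance = deviance(fit), rss = rss)
c(AIC = AIC(fit), by_hand = aic_by_hand)
c(BIC = BIC(fit), by_hand = bic_by_hand)
```

Because deviance is just the residual sum of squares here, it can only decrease when terms are added to a nested model, whereas AIC and BIC penalize the extra parameters.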
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,123 @@ | ||
| ### Interactions in `hers` data {.smaller} | ||
|
|
||
| ::: notes | ||
| What if the slope of LDL with respect to BMI | ||
| differs depending on age? | ||
| Then we need an "interaction" between age $A$ and BMI $B$: | ||
| ::: | ||
|
|
||
| $$ | ||
| \ba | ||
| Y|A,B &\sciid N(\mu(A,B), \sigma^2)\\ | ||
| \mu(a,b) &= \beta_0 + \beta_A a + \beta_B b + \beta_{AB}(a \cdot b) | ||
| \ea | ||
| $$ {#eq-hers-interact} | ||
|
|
||
| ::: notes | ||
| Now the slope of mean LDL with respect to BMI $B$ | ||
| depends on age $A$: | ||
|
|
||
| $$ | ||
| \ba | ||
| \deriv{b}\mu(A=\red{0}, B=b) &= \beta_B + \beta_{AB} \cdot \red{0} = \beta_B \\ | ||
| \deriv{b}\mu(A=\red{a}, B=b) &= \beta_B + \beta_{AB} \red{a} | ||
| \ea | ||
| $$ | ||
|
|
||
| So the slope of mean LDL with respect to BMI | |
| changes by $\beta_{AB}$ for each one-year increase in age. | ||
| ::: | ||
|
|
||
| ```{r} | ||
| #| label: tbl-hers-lm2 | ||
| #| tbl-cap: "HERS interaction model" | ||
| hers_lm2 <- lm( | ||
| LDL ~ age + BMI + age:BMI, | ||
| data = hers, | ||
| na.action = na.exclude | ||
| ) | ||
| hers_plot_data <- hers |> | ||
| dplyr::filter(!is.na(age), !is.na(BMI), !is.na(LDL)) | ||
|
|
||
| hers_lm2 |> | ||
| parameters::parameters() |> | ||
| parameters::print_md( | ||
| select = "{estimate}" | ||
| ) | ||
|
|
||
| ``` | ||
|
|
||
| ::::: {.panel-tabset} | ||
|
|
||
| #### Statins: No | ||
|
|
||
| :::{#fig-hers-interact-fit-no} | ||
|
|
||
| ```{r} | ||
| #| code-fold: true | ||
| hers_no <- hers_plot_data |> dplyr::filter(statins == "no") | ||
|
|
||
| plotly::plot_ly( | ||
| x = hers_no[["age"]], | ||
| y = hers_no[["BMI"]], | ||
| z = hers_no[["LDL"]], | ||
| type = "scatter3d", | ||
| mode = "markers", | ||
| name = "No statins", | ||
| marker = list(size = 3, opacity = 0.3, color = "steelblue") | ||
| ) |> | ||
| regress3d::add_3d_surface( | ||
| model = hers_lm2, | ||
| data = hers_plot_data, | ||
| showlegend = TRUE | ||
| ) |> | ||
| plotly::layout( | ||
| scene = list( | ||
| xaxis = list(title = "Age (yr)"), | ||
| yaxis = list(title = "BMI (kg/m²)"), | ||
| zaxis = list(title = "LDL (mg/dL)") | ||
| ) | ||
| ) | ||
| ``` | ||
|
|
||
| Interaction model regression surface for `hers` data | ||
| (patients not taking statins) | ||
|
|
||
| ::: | ||
|
|
||
| #### Statins: Yes | ||
|
|
||
| :::{#fig-hers-interact-fit-yes} | ||
|
|
||
| ```{r} | ||
| #| code-fold: true | ||
| hers_yes <- hers_plot_data |> dplyr::filter(statins == "yes") | ||
|
|
||
| plotly::plot_ly( | ||
| x = hers_yes[["age"]], | ||
| y = hers_yes[["BMI"]], | ||
| z = hers_yes[["LDL"]], | ||
| type = "scatter3d", | ||
| mode = "markers", | ||
| name = "Statins", | |
| marker = list(size = 3, opacity = 0.3, color = "darkorange") | ||
| ) |> | ||
| regress3d::add_3d_surface( | ||
| model = hers_lm2, | ||
| data = hers_plot_data, | ||
| showlegend = TRUE | ||
| ) |> | ||
| plotly::layout( | ||
| scene = list( | ||
| xaxis = list(title = "Age (yr)"), | ||
| yaxis = list(title = "BMI (kg/m²)"), | ||
| zaxis = list(title = "LDL (mg/dL)") | ||
| ) | ||
| ) | ||
| ``` | ||
|
|
||
| Interaction model regression surface for `hers` data | ||
| (patients taking statins) | ||
|
|
||
| ::: | ||
|
|
||
| ::::: | ||
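Review note: the derivative in the notes above, $\beta_B + \beta_{AB} a$, can be checked numerically from fitted coefficients. The sketch below uses the built-in `mtcars` data as a stand-in, with `wt` playing the role of age $A$ and `hp` the role of BMI $B$; the helper `slope_hp_at()` is hypothetical and not part of the PR.

```r
# Stand-in interaction model on built-in data (not hers_lm2)
fit <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)
b <- coef(fit)

# Slope of the mean with respect to hp at a given wt:
# beta_hp + beta_{wt:hp} * wt
slope_hp_at <- function(wt) {
  unname(b["hp"] + b["wt:hp"] * wt)
}

# Numerical check: the mean is linear in hp for fixed wt,
# so a one-unit difference in hp equals the slope exactly
wt0 <- 3
pred <- predict(fit, newdata = data.frame(wt = wt0, hp = c(100, 101)))
slope_numeric <- unname(pred[2] - pred[1])

c(formula = slope_hp_at(wt0), numeric = slope_numeric)

# The slope changes by the interaction coefficient per unit of wt
slope_hp_at(wt0 + 1) - slope_hp_at(wt0)
unname(b["wt:hp"])
```

The same calculation with `coef(hers_lm2)` and the names `"BMI"` and `"age:BMI"` should give the BMI slope at any chosen age, assuming the model is fit as shown in the chunk above.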
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| ### Model selection for `hers` data | ||
|
|
||
| #### Comparing HERS models using LRT | ||
|
|
||
| ```{r} | ||
| library(lmtest) | ||
| lrtest(hers_lm1, hers_lm2) | ||
| ``` |
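Review note: a likelihood ratio test is only interpretable when one model is nested in the other. `hers_lm1` is defined outside the files shown in this diff, so whether it is nested in `hers_lm2` (`LDL ~ age + BMI + age:BMI`) cannot be confirmed here. The sketch below shows the computation by hand for two nested stand-in models on the built-in `mtcars` data; the model names `small` and `big` are hypothetical.

```r
# Two nested stand-in models (not the HERS models):
# `small` is `big` with the interaction coefficient fixed at 0
small <- lm(mpg ~ wt + hp, data = mtcars)
big <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)

# Likelihood ratio statistic: twice the gain in log-likelihood
lr_stat <- 2 * (as.numeric(logLik(big)) - as.numeric(logLik(small)))

# Degrees of freedom: number of extra parameters in the larger model
df_diff <- attr(logLik(big), "df") - attr(logLik(small), "df")

# Upper-tail chi-squared p-value
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

c(statistic = lr_stat, df = df_diff, p_value = p_value)
```

For nested `lm` fits on the same rows, `lmtest::lrtest(small, big)` should report the same statistic and degrees of freedom.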