35 commits
- `557fac2` Initial plan (Copilot, Apr 7, 2026)
- `6303987` Add HERS dataset linear regression example following Vittinghoff Chap… (Copilot, Apr 7, 2026)
- `a841ba7` Pre-compute coefficient values in named variables for maintainability (Copilot, Apr 7, 2026)
- `e828c2f` Intersperse HERS analysis with birthweight sections; add HERS interac… (Copilot, Apr 7, 2026)
- `27418f9` Use explicit data= in geom_line instead of ggplot data replacement op… (Copilot, Apr 7, 2026)
- `ded1bb6` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 8, 2026)
- `752d970` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 8, 2026)
- `71fa75e` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 9, 2026)
- `f7d07a6` Replace BMI×HT interaction with BMI×statins interaction in HERS model (Copilot, Apr 9, 2026)
- `90ae68a` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 10, 2026)
- `4ae72cf` Resolve merge conflicts with main: move HERS includes to _sec_linreg_… (Copilot, Apr 10, 2026)
- `3575c3e` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 14, 2026)
- `8303639` Address PR review: delete unreferenced standalone HERS file and fix W… (Copilot, Apr 14, 2026)
- `0bde169` Merge remote-tracking branch 'origin/main' into copilot/add-example-m… (Copilot, Apr 16, 2026)
- `c46e981` Restructure HERS linear regression examples (Copilot, Apr 16, 2026)
- `f9d6ec1` Remove duplicate lrtest from _sec_hers_lm_gof.qmd (Copilot, Apr 16, 2026)
- `a0e362d` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 16, 2026)
- `c8913d8` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 16, 2026)
- `f61a83b` Fix PR review issues: include path, factor levels, diagnostics, key-v… (Copilot, Apr 16, 2026)
- `9d7eb27` Fix ggpairs alpha placement and fig-cap quote formatting (Copilot, Apr 16, 2026)
- `e78dc81` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 19, 2026)
- `1811f6a` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 21, 2026)
- `7f8014a` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 21, 2026)
- `238201b` Add GGally to DESCRIPTION Suggests and renv.lock (Copilot, Apr 21, 2026)
- `626ab42` Merge main into branch: resolve WORDLIST and renv.lock conflicts (Copilot, Apr 28, 2026)
- `b5481a9` Merge main into branch: resolve WORDLIST conflict (Copilot, Apr 28, 2026)
- `356b524` Fix GGally::wrap conflict with pander::wrap in fig-hers-key-vars chunk (Copilot, Apr 28, 2026)
- `e0946ed` Remove HT from HERS models; facet by statins; add stratified regressi… (Copilot, Apr 28, 2026)
- `d00f257` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 28, 2026)
- `ab62c70` feat: replace HERS scatter with 3D plotly plot and add regress3d surf… (Copilot, Apr 28, 2026)
- `848d6d5` fix: address code review - add derivative comparison lines and clarif… (Copilot, Apr 28, 2026)
- `105089e` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 28, 2026)
- `148e28f` fix: restore renv.lock from 626ab42 base (297→298 packages) + add reg… (Copilot, Apr 28, 2026)
- `c1a964f` Remove multi-column divs from 3D figures; stratify by statin use usin… (Copilot, Apr 28, 2026)
- `2a986f2` Merge branch 'main' into copilot/add-example-model-hers-dataset (d-morrison, Apr 29, 2026)
2 changes: 2 additions & 0 deletions DESCRIPTION
@@ -61,6 +61,8 @@ Suggests:
    sjPlot,
    equatiomatic,
    broom (>= 1.0.8),
    GGally,
    regress3d,
    lmtest,
    gh,
    lintr,
139 changes: 139 additions & 0 deletions _sec_hers_data.qmd
@@ -0,0 +1,139 @@
### Motivating example: `hers` data {.smaller}

:::{.callout-note}
This section is based on @vittinghoff2e, Chapter 4.
:::

::: notes

{{< include _subfiles/shared/_sec_hers_intro.qmd >}}

:::

```{r}
#| eval: false
#| code-fold: show
library(haven)
hers <- haven::read_dta(
  paste0(
    "https://regression.ucsf.edu/sites/g/files",
    "/tkssra6706/f/wysiwyg/home/data/hersdata.dta"
  )
)
```

```{r}
#| include: false
library(haven)
hers <-
  fs::path_package("rme", "extdata/hersdata.dta") |>
  read_dta() |>
  dplyr::mutate(
    HT = as_factor(HT) |>
      relevel(ref = "placebo"),
    statins = as_factor(statins) |>
      relevel(ref = "no")
  )
```

::::: {.panel-tabset}

#### Data as table

```{r}
#| label: tbl-hers-ch4
#| tbl-cap: "`hers` data"
hers |> head()
```

#### Data as graph

```{r}
#| label: fig-hers-scatter
#| fig-cap: >
#|   `hers` data (@vittinghoff2e):
#|   age (years) and BMI (kg/m²) vs. baseline LDL (mg/dL),
#|   colored by statin use.
library(plotly)
hers_scatter_data <- hers |>
  dplyr::filter(!is.na(age), !is.na(BMI), !is.na(LDL))
plotly::plot_ly(
  x = hers_scatter_data[["age"]],
  y = hers_scatter_data[["BMI"]],
  z = hers_scatter_data[["LDL"]],
  color = as.character(hers_scatter_data[["statins"]]),
  colors = c("no" = "steelblue", "yes" = "darkorange"),
  type = "scatter3d",
  mode = "markers",
  marker = list(size = 3, opacity = 0.5)
) |>
  plotly::layout(
    scene = list(
      xaxis = list(title = "Age (yr)"),
      yaxis = list(title = "BMI (kg/m²)"),
      zaxis = list(title = "LDL (mg/dL)")
    ),
    legend = list(title = list(text = "Statins"))
  )
```

#### Key variables

```{r}
#| label: fig-hers-key-vars
#| fig-cap: >
#|   Key variables in `hers` data: outcome (LDL),
#|   treatment (HT), and covariates (BMI, statins, age)
#| fig-height: 7
#| fig-width: 8
library(GGally)
hers |>
  dplyr::select(LDL, HT, BMI, statins, age) |>
  ggpairs(
    mapping = aes(col = statins),
    lower = list(continuous = GGally::wrap("points", alpha = 0.3)),
    columnLabels = c(
      "LDL (mg/dL)",
      "HT",
      "BMI (kg/m²)",
      "Statins",
      "Age (yr)"
    )
  ) +
  theme_bw() +
  theme(legend.position = "bottom")
```

:::::

---

#### Data notation {.smaller}

::: notes
Let's define some notation to represent this data:
:::

- $Y$: LDL cholesterol (mg/dL)
- $A$: age (years)
- $B$: BMI (kg/m²)
- $T$: hormone therapy treatment assignment
    ("placebo" or "hormone therapy")
- $H$: indicator variable for $T$ = "hormone therapy"
    - $H = 0$ if $T$ = "placebo"
    - $H = 1$ if $T$ = "hormone therapy"
- $U$: statin use ("no" or "yes")
- $V$: indicator variable for $U$ = "yes"
    - $V = 0$ if $U$ = "no"
    - $V = 1$ if $U$ = "yes"

::: notes
"Placebo" is the **reference level** for the categorical variable $T$,
and "no" is the **reference level** for statin use $U$.
The choice of reference level is arbitrary;
it affects only the interpretation of the intercept and of the corresponding indicator coefficients,
not the model's fitted values.

Since LDL is measured at **baseline** (before the hormone therapy was administered),
$H$ is not included as a predictor in our regression models for LDL.
We instead focus on statin use $U$ (and its indicator $V$) as the key grouping variable.
:::
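
The indicator coding and the role of the reference level can be sketched on a toy example. The five values below are made up for illustration and are not taken from the `hers` data; the object names (`U_ref_no`, `U_ref_yes`, `fit_no`, `fit_yes`) are hypothetical.

```r
# Toy statin-use variable and outcome (hypothetical values, not HERS data)
U <- factor(c("yes", "no", "no", "yes", "no"))
y <- c(120, 150, 145, 118, 160)

# With "no" as the reference level, the design-matrix column is V = 1{U = "yes"}
U_ref_no <- relevel(U, ref = "no")
V <- model.matrix(~ U_ref_no)[, "U_ref_noyes"]
V

# Switching the reference level flips the indicator ...
U_ref_yes <- relevel(U, ref = "yes")
V_flipped <- model.matrix(~ U_ref_yes)[, "U_ref_yesno"]

# ... and changes the intercept and indicator coefficient,
# but not the fitted values
fit_no <- lm(y ~ U_ref_no)
fit_yes <- lm(y ~ U_ref_yes)
coef(fit_no)
coef(fit_yes)
all.equal(unname(fitted(fit_no)), unname(fitted(fit_yes)))
```

In this sketch the intercept is the mean outcome in whichever group is the reference, and the indicator coefficient is the difference between the two group means, with its sign depending on which group is the reference.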
68 changes: 68 additions & 0 deletions _sec_hers_lm_diagnostics.qmd
@@ -0,0 +1,68 @@
### Diagnostics for `hers` parallel-planes model

#### Residuals vs fitted for `hers_lm1`

```{r}
#| label: fig-hers-resid-fitted
#| fig-cap: "Residuals vs fitted values for `hers_lm1` (parallel planes model)"
library(ggplot2)
hers_diag <- hers |>
  dplyr::mutate(
    .fitted = fitted(hers_lm1),
    .resid = residuals(hers_lm1)
  )

ggplot(hers_diag, aes(x = .fitted, y = .resid)) +
  geom_point(alpha = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  facet_wrap(~statins, labeller = label_both) +
  xlab("Fitted values") +
  ylab("Residuals") +
  theme_bw()
```

#### QQ plot for `hers_lm1`

```{r}
#| label: fig-hers-qq
#| fig-cap: "QQ plot of residuals for `hers_lm1` (parallel planes model)"
ggplot(hers_diag, aes(sample = .resid)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~statins, labeller = label_both) +
  theme_bw()
```

### Diagnostics for `hers` interaction model

#### Residuals vs fitted for `hers_lm2`

```{r}
#| label: fig-hers-resid-fitted-lm2
#| fig-cap: "Residuals vs fitted values for `hers_lm2` (interaction model)"
hers_diag2 <- hers |>
  dplyr::mutate(
    .fitted = fitted(hers_lm2),
    .resid = residuals(hers_lm2)
  )

ggplot(hers_diag2, aes(x = .fitted, y = .resid)) +
  geom_point(alpha = 0.3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  facet_wrap(~statins, labeller = label_both) +
  xlab("Fitted values") +
  ylab("Residuals") +
  theme_bw()
```

#### QQ plot for `hers_lm2`

```{r}
#| label: fig-hers-qq-lm2
#| fig-cap: "QQ plot of residuals for `hers_lm2` (interaction model)"
ggplot(hers_diag2, aes(sample = .resid)) +
  stat_qq() +
  stat_qq_line() +
  facet_wrap(~statins, labeller = label_both) +
  theme_bw()
```
15 changes: 15 additions & 0 deletions _sec_hers_lm_gof.qmd
@@ -0,0 +1,15 @@
### Goodness of fit for `hers` models

#### AIC and BIC for `hers` models

```{r}
AIC(hers_lm1, hers_lm2)
BIC(hers_lm1, hers_lm2)
```
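
As a rough illustration of what `AIC()` and `BIC()` report for a linear model, the sketch below recomputes both from the log-likelihood. It uses simulated data rather than the `hers` models; the names `sim`, `fit_big`, `aic_manual`, and `bic_manual` are hypothetical.

```r
# Simulated data (illustrative only; not the HERS data)
set.seed(2)
n <- 200
sim <- data.frame(x = rnorm(n), z = rnorm(n))
sim$y <- 1 + 2 * sim$x + rnorm(n)

fit_big <- lm(y ~ x + z + x:z, data = sim)

# logLik() carries the number of estimated parameters in its "df" attribute:
# the regression coefficients plus the residual variance
ll <- logLik(fit_big)
k <- attr(ll, "df")

# AIC = -2 * log-likelihood + 2 * k
aic_manual <- -2 * as.numeric(ll) + 2 * k
# BIC = -2 * log-likelihood + log(n) * k
bic_manual <- -2 * as.numeric(ll) + log(nobs(fit_big)) * k

all.equal(aic_manual, AIC(fit_big))
all.equal(bic_manual, BIC(fit_big))
```

Both criteria are comparable across models only when the models are fit to the same observations, so it may be worth checking `nobs()` for each model when covariates have missing values.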

#### Deviance for `hers` models

```{r}
deviance(hers_lm1)
deviance(hers_lm2)
```
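
For a model fit with `lm()`, `deviance()` returns the residual sum of squares. A minimal check, using R's built-in `cars` data set instead of the `hers` models (the names `fit_line`, `fit_quad`, and `rss` are hypothetical):

```r
# Built-in `cars` data (stopping distance vs. speed); illustrative only
fit_line <- lm(dist ~ speed, data = cars)
fit_quad <- lm(dist ~ speed + I(speed^2), data = cars)

# Deviance of a linear model is its residual sum of squares
rss <- sum(residuals(fit_line)^2)
all.equal(deviance(fit_line), rss)

# Adding a term to a nested model can never increase the residual sum of squares
deviance(fit_quad) <= deviance(fit_line)
```

Because the deviance cannot increase when terms are added, it is not by itself a fair basis for choosing between nested models; criteria such as AIC and BIC add a penalty for the extra parameters.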
123 changes: 123 additions & 0 deletions _sec_hers_lm_interact.qmd
@@ -0,0 +1,123 @@
### Interactions in `hers` data {.smaller}

::: notes
What if the slope of mean LDL with respect to BMI
differs depending on age?
Then we need an "interaction" term between age $A$ and BMI $B$:

$$
\ba
Y|A,B &\sciid N(\mu(A,B), \sigma^2)\\
\mu(a,b) &= \beta_0 + \beta_A a + \beta_B b + \beta_{AB}(a \cdot b)
\ea
$$ {#eq-hers-interact}

::: notes
Now the slope of mean LDL with respect to BMI $B$
depends on age $A$:

$$
\ba
\deriv{b}\mu(A=\red{0}, B=b) &= \beta_B + \beta_{AB} \cdot \red{0} = \beta_B \\
\deriv{b}\mu(A=\red{a}, B=b) &= \beta_B + \beta_{AB} \red{a}
\ea
$$

So the slope of mean LDL with respect to BMI
changes by $\beta_{AB}$ for each one-year increase in age.
:::
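
The derivative calculation above can be checked numerically. The sketch below fits the same model form to simulated data; the "true" coefficients are arbitrary values chosen for illustration, and `sim`, `fit`, and `bmi_slope()` are hypothetical names, not objects from this chapter.

```r
# Simulated data with an age-by-BMI interaction (illustrative values only)
set.seed(1)
n <- 500
sim <- data.frame(
  age = runif(n, 45, 80),
  BMI = runif(n, 18, 40)
)
sim$LDL <- 150 - 0.5 * sim$age + 1.0 * sim$BMI -
  0.01 * sim$age * sim$BMI + rnorm(n, sd = 30)

fit <- lm(LDL ~ age + BMI + age:BMI, data = sim)
b <- coef(fit)

# Estimated slope of mean LDL with respect to BMI at age a:
# beta_B + beta_AB * a
bmi_slope <- function(a) unname(b["BMI"] + b["age:BMI"] * a)

# At age 0 the slope is just the BMI coefficient
bmi_slope(0)

# A one-year increase in age changes the slope by the interaction coefficient
bmi_slope(61) - bmi_slope(60)

# Same slope recovered from predictions one BMI unit apart at age 60
p <- predict(fit, newdata = data.frame(age = c(60, 60), BMI = c(30, 31)))
unname(p[2] - p[1])
```

Since age 0 lies far outside the observed age range, the BMI coefficient on its own is an extrapolation; evaluating `bmi_slope()` at ages within the data is usually easier to interpret.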

```{r}
#| label: tbl-hers-lm2
#| tbl-cap: "HERS interaction model"
hers_lm2 <- lm(
  LDL ~ age + BMI + age:BMI,
  data = hers,
  na.action = na.exclude
)
hers_plot_data <- hers |>
  dplyr::filter(!is.na(age), !is.na(BMI), !is.na(LDL))

hers_lm2 |>
  parameters::parameters() |>
  parameters::print_md(
    select = "{estimate}"
  )
```
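
The `na.action = na.exclude` argument matters when fitted values or residuals are later attached to the original data frame. A small sketch on a made-up data frame (the names `d`, `fit_omit`, and `fit_excl` are hypothetical) shows the difference from `na.omit`:

```r
# Toy data with missing values in rows 3 and 4 (hypothetical values)
d <- data.frame(
  x = c(1, 2, NA, 4, 5),
  y = c(2.1, 3.9, 6.2, NA, 10.1)
)

fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

# na.omit: fitted values only for the complete rows
length(fitted(fit_omit))

# na.exclude: one fitted value per original row, NA where the row was dropped
length(fitted(fit_excl))
fitted(fit_excl)

# The estimated coefficients are the same either way
all.equal(coef(fit_omit), coef(fit_excl))
```

With `na.exclude`, `fitted()` and `residuals()` line up row by row with the original data, which is what lets them be added as new columns with `dplyr::mutate()`.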

::::: {.panel-tabset}

#### Statins: No

:::{#fig-hers-interact-fit-no}

```{r}
#| code-fold: true
hers_no <- hers_plot_data |> dplyr::filter(statins == "no")

plotly::plot_ly(
  x = hers_no[["age"]],
  y = hers_no[["BMI"]],
  z = hers_no[["LDL"]],
  type = "scatter3d",
  mode = "markers",
  name = "No statins",
  marker = list(size = 3, opacity = 0.3, color = "steelblue")
) |>
  regress3d::add_3d_surface(
    model = hers_lm2,
    data = hers_plot_data,
    showlegend = TRUE
  ) |>
  plotly::layout(
    scene = list(
      xaxis = list(title = "Age (yr)"),
      yaxis = list(title = "BMI (kg/m²)"),
      zaxis = list(title = "LDL (mg/dL)")
    )
  )
```

Interaction model regression surface for `hers` data
(patients not taking statins)

:::

#### Statins: Yes

:::{#fig-hers-interact-fit-yes}

```{r}
#| code-fold: true
hers_yes <- hers_plot_data |> dplyr::filter(statins == "yes")

plotly::plot_ly(
  x = hers_yes[["age"]],
  y = hers_yes[["BMI"]],
  z = hers_yes[["LDL"]],
  type = "scatter3d",
  mode = "markers",
  name = "Yes statins",
  marker = list(size = 3, opacity = 0.3, color = "darkorange")
) |>
  regress3d::add_3d_surface(
    model = hers_lm2,
    data = hers_plot_data,
    showlegend = TRUE
  ) |>
  plotly::layout(
    scene = list(
      xaxis = list(title = "Age (yr)"),
      yaxis = list(title = "BMI (kg/m²)"),
      zaxis = list(title = "LDL (mg/dL)")
    )
  )
```

Interaction model regression surface for `hers` data
(patients taking statins)

:::

:::::
8 changes: 8 additions & 0 deletions _sec_hers_lm_model_selection.qmd
@@ -0,0 +1,8 @@
### Model selection for `hers` data

#### Comparing HERS models using LRT

```{r}
library(lmtest)
lrtest(hers_lm1, hers_lm2)
```
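
A likelihood ratio test compares two nested models. The sketch below computes the test statistic by hand using R's built-in `cars` data rather than the `hers` models; `fit0`, `fit1`, `lr_stat`, `df_diff`, and `p_value` are hypothetical names. The definition of `hers_lm1` is not shown in this section, so whether it is nested within `hers_lm2` should be confirmed before interpreting the test above.

```r
# Built-in `cars` data; fit0 is nested within fit1 (illustrative only)
fit0 <- lm(dist ~ speed, data = cars)
fit1 <- lm(dist ~ speed + I(speed^2), data = cars)

# Test statistic: twice the gain in maximized log-likelihood
lr_stat <- 2 * (as.numeric(logLik(fit1)) - as.numeric(logLik(fit0)))

# Degrees of freedom: number of extra parameters in the larger model
df_diff <- attr(logLik(fit1), "df") - attr(logLik(fit0), "df")

# Reference distribution: chi-squared with df_diff degrees of freedom
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

c(statistic = lr_stat, df = df_diff, p.value = p_value)
```

The chi-squared reference distribution is justified only when the smaller model is a special case of the larger one and both are fit to the same observations.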