Merged
20 commits
c474a72
Initial plan
Copilot Apr 16, 2026
ed796ce
Add summaries for Fligner, Levene, and Shapiro-Wilk diagnostics
Copilot Apr 16, 2026
5711f15
Address review feedback for diagnostic test summaries
Copilot Apr 16, 2026
5fb2107
Merge branch 'main' into copilot/add-summaries-of-diagnostic-tests
d-morrison Apr 16, 2026
5b85b50
Compare formal tests with visual diagnostics
Copilot Apr 16, 2026
92bfe0c
Clarify comparison with visual diagnostics
Copilot Apr 16, 2026
69323b2
Add Brown-Forsythe test details from Kutner
Copilot Apr 16, 2026
6a227bb
Standardize Shapiro-Wilk label punctuation
Copilot Apr 16, 2026
ef48cf3
Normalize Brown-Forsythe formula subscripts
Copilot Apr 16, 2026
736fc7b
Define Brown-Forsythe formula terms explicitly
Copilot Apr 16, 2026
54d25f2
Define z_ij notation for Brown-Forsythe test
Copilot Apr 16, 2026
54d3575
Clarify Brown-Forsythe notation and robustness wording
Copilot Apr 16, 2026
f60ebbc
Merge branch 'main' into copilot/add-summaries-of-diagnostic-tests
d-morrison Apr 16, 2026
0a37a6d
Cross-reference visual diagnostics figures in test summary
Copilot Apr 16, 2026
ac37cf7
Update _subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
d-morrison Apr 20, 2026
183613f
Merge branch 'main' into copilot/add-summaries-of-diagnostic-tests
d-morrison Apr 20, 2026
0e5aaba
Apply suggestions from code review
d-morrison Apr 28, 2026
ccd7ade
Merge branch 'main' into copilot/add-summaries-of-diagnostic-tests
d-morrison Apr 28, 2026
8e6eb71
Apply diagnostics review thread fixes
Copilot Apr 28, 2026
14b240d
Update _subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
d-morrison Apr 28, 2026
184 changes: 184 additions & 0 deletions _subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
@@ -1096,6 +1096,190 @@ All three plots show the same data and reference line.

---

### Formal diagnostic tests for linear regression assumptions

Graphical diagnostics are usually the first step,
but formal tests can provide numerical summaries.

For linear regression residuals,
three common tests are:

- `fligner.test()` for equal variances across groups
(the Fligner--Killeen test).
- Levene / Brown--Forsythe test
(a median-centered Levene variant,
where standard Levene centers on group means,
and Brown--Forsythe centers on group medians for more robustness;
e.g., via `car::leveneTest(..., center = median)` or equivalent code).
- `shapiro.test()` /
[Shapiro--Wilk test](https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test)
for normality.

---

#### Fligner--Killeen test (homoskedasticity across groups)

Suppose residuals are split into groups
($g = 1, \ldots, G$),
for example by a categorical predictor.

The test starts from absolute deviations from each group median:
$$
d_{gi} = |e_{gi} - \text{median}(e_{g1}, \ldots, e_{g,n_g})|.
$$

After ranking the pooled $d_{gi}$ values,
the Fligner--Killeen statistic is built from normal scores of those ranks.

Under the null hypothesis of equal variances,
the test statistic is approximately $\chi^2_{G-1}$.
Small p-values suggest heteroskedasticity.
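
The steps above can be sketched on simulated data. This is a minimal illustration, not the document's own analysis: the data frame `sim`, its group labels, and the standard deviations are made up, and the manual statistic follows the usual median-centered normal-scores form, which is assumed here to match what `fligner.test()` computes.

```r
# Simulated residual-like values in G = 3 groups (hypothetical data).
set.seed(1)
sim <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 40)),
  e = c(rnorm(40, sd = 1), rnorm(40, sd = 1), rnorm(40, sd = 3))
)

# Absolute deviations from each group's median (the d_gi above).
sim$d <- abs(sim$e - ave(sim$e, sim$group, FUN = median))

# Normal scores of the pooled ranks of d_gi.
n <- nrow(sim)
a <- qnorm(0.5 + rank(sim$d) / (2 * (n + 1)))

# Between-group variation of the scores, scaled by their overall variance.
a_bar_g <- tapply(a, sim$group, mean)
n_g <- tapply(a, sim$group, length)
fk_manual <- sum(n_g * (a_bar_g - mean(a))^2) / var(a)

# Built-in test for comparison; its reference distribution has G - 1 = 2 df.
fk <- fligner.test(e ~ group, data = sim)
all.equal(fk_manual, unname(fk$statistic))
fk$parameter
fk$p.value
```

Comparing `fk_manual` with `fk$statistic` is a quick way to check that the hand computation and the built-in function agree on a given data set.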

---

#### Levene / Brown--Forsythe test (homoskedasticity across groups)

Levene's test transforms residuals to within-group absolute deviations:
$$
z_{gi} = |e_{gi} - c_g|,
$$
where $c_g$ is the group center.

Review comment (Copilot AI, Apr 28, 2026), on the line `z_{gi} = |e_{gi} - c_g|,`:
This section introduces residual notation as $e_{gi}$ / $e_{ij}$, but elsewhere in the same file residuals are denoted with the project macros (e.g., $\resid_i$ and $\stdresid_i$ at around line 1285). To keep notation consistent and leverage the existing macros, consider switching these formulas to use $\resid_{gi}$ (and, if needed, $\stdresid$) instead of introducing a new symbol $e$.
Suggested change: replace `z_{gi} = |e_{gi} - c_g|,` with `z_{gi} = |\resid_{gi} - c_g|,`.

Classical Levene uses the group mean for $c_g$.
Brown--Forsythe uses the group median,
which is more robust.

Then run a one-way ANOVA on $z_{gi}$ by group:
$$
F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}}
\sim F_{G-1, N-G}
\quad\text{(approximately) under }H_0,
$$
where $N = \sum_{g=1}^{G} n_g$ is the total number of residuals.

Small p-values suggest unequal residual variance.
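
To make the role of the center $c_g$ concrete, here is a small sketch that runs the same ANOVA with either choice of center. The helper name `levene_F()` and the simulated data are hypothetical and not part of the document's analysis.

```r
# Simulated residual-like values in G = 3 groups of unequal size (hypothetical).
set.seed(2)
sizes <- c(30, 40, 50)
sim <- data.frame(
  group = factor(rep(c("a", "b", "c"), times = sizes)),
  e = rnorm(sum(sizes), sd = rep(c(1, 1.5, 2), times = sizes))
)

# Hypothetical helper: one-way ANOVA F test on z = |e - center| by group.
levene_F <- function(e, group, center = median) {
  z <- abs(e - ave(e, group, FUN = center))
  tab <- anova(lm(z ~ group))
  c(
    F = tab[1, "F value"],
    df1 = tab[1, "Df"],   # G - 1
    df2 = tab[2, "Df"],   # N - G
    p = tab[1, "Pr(>F)"]
  )
}

lev_mean <- levene_F(sim$e, sim$group, center = mean)      # classical Levene
lev_median <- levene_F(sim$e, sim$group, center = median)  # Brown--Forsythe
rbind(lev_mean, lev_median)
```

Both rows share the degrees of freedom $G-1$ and $N-G$; only the centering of the absolute deviations differs.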

For simple linear regression,
@kutner2005applied [pp. 116--117] describes
the Brown--Forsythe test
by splitting observations into two $X$-level groups
(low versus high),
computing absolute deviations from each group median,
and applying a two-sample pooled-variance t test:
let
$$
z_{ij} = |e_{ij} - \tilde e_i|,
$$
where
$j$ indexes observations within group $i$,
and $\tilde e_i$ is the median residual in group $i$.
Then:
$$
t_{\text{BF}} =
\frac{\bar z_{1} - \bar z_{2}}
{s_p \sqrt{1/n_{1} + 1/n_{2}}},
\quad
t_{\text{BF}} \approx t_{n_{1}+n_{2}-2}
\text{ under }H_0.
$$
Here,
$\bar z_{1}$ and $\bar z_{2}$
are the means of the $z_{ij}$ values
in groups $i=1$ and $i=2$,
$s_p$ is their pooled standard deviation,
and $n_{1}, n_{2}$ are the two group sample sizes.
Large $|t_{\text{BF}}|$
indicates nonconstant residual variance.
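
The two-group version can be sketched as follows. The simulated regression and the choice to split at the median of $X$ are assumptions made for illustration; the text above only specifies a low-versus-high split.

```r
# Simulated simple linear regression whose error SD grows with x (hypothetical).
set.seed(3)
n <- 60
x <- runif(n, 0, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 + 0.3 * x)
e <- resid(lm(y ~ x))

# Split into low/high x groups at the median of x (an illustrative cut point).
grp <- factor(ifelse(x <= median(x), "low", "high"), levels = c("low", "high"))

# Absolute deviations from each group's median residual (the z_ij above).
z <- abs(e - ave(e, grp, FUN = median))
z1 <- z[grp == "low"]
z2 <- z[grp == "high"]
n1 <- length(z1)
n2 <- length(z2)

# Pooled standard deviation and the Brown--Forsythe t statistic.
s_p <- sqrt((sum((z1 - mean(z1))^2) + sum((z2 - mean(z2))^2)) / (n1 + n2 - 2))
t_bf <- (mean(z1) - mean(z2)) / (s_p * sqrt(1 / n1 + 1 / n2))

# The built-in pooled-variance t test gives the same statistic and df.
tt <- t.test(z1, z2, var.equal = TRUE)
all.equal(t_bf, unname(tt$statistic))
tt$parameter  # n1 + n2 - 2
tt$p.value
```

With only two groups, the square of `t_bf` equals the one-way ANOVA $F$ statistic on the same $z_{ij}$ values, so this is the $G = 2$ special case of the test above.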

---

#### Shapiro--Wilk test (normality of standardized residuals)

For ordered standardized residuals
$r_{(1)} \le \cdots \le r_{(n)}$,
the Shapiro--Wilk statistic is:
$$
W =
\frac{\left(\sum_{i=1}^n a_i r_{(i)}\right)^2}
{\sum_{i=1}^n (r_i - \bar r)^2},
$$
where $a_i$ are constants from normal-order-statistic moments.
The numerator uses ordered residuals $r_{(i)}$,
while the denominator uses the original (unordered) residuals.

If residuals are Gaussian,
$W$ tends to be close to 1.
Small $W$ (and small p-value)
indicates departure from normality.
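
As a rough illustration, the sketch below applies `shapiro.test()` to standardized residuals from two simulated fits, one with Gaussian errors and one with skewed (exponential) errors. The data and object names are hypothetical.

```r
# Two simulated regressions: Gaussian errors versus skewed errors (hypothetical).
set.seed(4)
x <- runif(80)
y_norm <- 1 + x + rnorm(80)
y_skew <- 1 + x + rexp(80)

r_norm <- rstandard(lm(y_norm ~ x))
r_skew <- rstandard(lm(y_skew ~ x))

sw_norm <- shapiro.test(r_norm)
sw_skew <- shapiro.test(r_skew)

# W statistics and p-values side by side.
data.frame(
  errors = c("Gaussian", "exponential"),
  W = c(unname(sw_norm$statistic), unname(sw_skew$statistic)),
  p_value = c(sw_norm$p.value, sw_skew$p.value)
)

# The denominator of W does not depend on ordering:
# the sum of squared deviations is the same for sorted and unsorted residuals.
all.equal(
  sum((sort(r_norm) - mean(r_norm))^2),
  sum((r_norm - mean(r_norm))^2)
)
```

Note that `shapiro.test()` accepts between 3 and 5000 non-missing values, so very large residual vectors would need subsampling or a different test.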

---

#### Numerical example (`birthweight` interaction model)
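```{r}
# Assumes dplyr (transmute, group_by, mutate, tibble) is attached earlier
# in the document, and that `bw_lm2` and `bw` are defined above.
diag_bw <-
  broom::augment(bw_lm2, data = bw) |>
  transmute(
    sex,
    resid_lm2 = .resid,
    std_resid_lm2 = .std.resid
  )

# Fligner--Killeen test of equal residual variance across sex
fligner_bw <- fligner.test(resid_lm2 ~ sex, data = diag_bw)

# Brown--Forsythe: absolute deviations from each group's median residual
levene_bw <-
  diag_bw |>
  group_by(sex) |>
  mutate(
    med_resid = median(resid_lm2),
    abs_dev = abs(resid_lm2 - med_resid)
  ) |>
  ungroup()

# One-way ANOVA on the absolute deviations gives the F statistic and p-value
levene_fit <- aov(abs_dev ~ sex, data = levene_bw)
levene_tab <- summary(levene_fit)[[1]]
levene_F <- unname(levene_tab[1, "F value"])
levene_p <- unname(levene_tab[1, "Pr(>F)"])

# Shapiro--Wilk test of normality for the standardized residuals
shapiro_bw <- shapiro.test(diag_bw$std_resid_lm2)

# Collect the three results in one table
tibble(
  test = c(
    "Fligner--Killeen: equal variance by sex",
    "Levene/Brown--Forsythe: equal variance by sex",
    "Shapiro--Wilk: normality of standardized residuals"
  ),
  statistic = c(
    unname(fligner_bw$statistic),
    levene_F,
    unname(shapiro_bw$statistic)
  ),
  p_value = c(
    fligner_bw$p.value,
    levene_p,
    shapiro_bw$p.value
  )
) |>
  mutate(
    statistic = signif(statistic, 4),
    p_value = signif(p_value, 4)
  )
```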

Interpretation rule:
for all three tests,
a small p-value is evidence against the corresponding model assumption.
Comment thread
d-morrison marked this conversation as resolved.

Compared with visual diagnostics:

- Fligner–Killeen / Levene summarizes the same heteroskedasticity signal
that we inspect in residuals-vs-fitted (@fig-bw_lm2-resid-vs-fitted)
and scale-location (@fig-bw-scale-loc) plots.
- Shapiro–Wilk summarizes the same normality signal
that we inspect in QQ plots (@fig-qqplot-autoplot)
and standardized-residual histograms (@fig-marg-stresd).
- Use tests and plots together:
the tests provide a single numerical summary,
while the plots show the shape and practical size of departures.

Review comment (Copilot AI, Apr 28, 2026), on the line `- Fligner–Killeen / Levene summarizes the same heteroskedasticity signal`:
“Fligner–Killeen / Levene summarizes …” is grammatically inconsistent because it refers to two tests. Consider changing to “Fligner–Killeen and Levene summarize …” (plural) or rephrasing to a singular subject (e.g., “A Fligner–Killeen/Levene test summarizes …”).
Suggested change: replace `- Fligner–Killeen / Levene summarizes the same heteroskedasticity signal` with `- Fligner–Killeen and Levene summarize the same heteroskedasticity signal`.

---

### Conditional distributions of residuals

If our Gaussian linear regression model is correct, the residuals $\resid_i$ and standardized residuals $\stdresid_i$ should have: