-
Notifications
You must be signed in to change notification settings - Fork 14
Add formal linear-regression diagnostic test summaries (Fligner–Killeen, Levene/Brown–Forsythe, Shapiro–Wilk) #538
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
20 commits
Select commit
Hold shift + click to select a range
c474a72
Initial plan
Copilot ed796ce
Add summaries for Fligner, Levene, and Shapiro-Wilk diagnostics
Copilot 5711f15
Address review feedback for diagnostic test summaries
Copilot 5fb2107
Merge branch 'main' into copilot/add-summaries-of-diagnostic-tests
d-morrison 5b85b50
Compare formal tests with visual diagnostics
Copilot 92bfe0c
Clarify comparison with visual diagnostics
Copilot 69323b2
Add Brown-Forsythe test details from Kutner
Copilot 6a227bb
Standardize Shapiro-Wilk label punctuation
Copilot ef48cf3
Normalize Brown-Forsythe formula subscripts
Copilot 736fc7b
Define Brown-Forsythe formula terms explicitly
Copilot 54d25f2
Define z_ij notation for Brown-Forsythe test
Copilot 54d3575
Clarify Brown-Forsythe notation and robustness wording
Copilot f60ebbc
Merge branch 'main' into copilot/add-summaries-of-diagnostic-tests
d-morrison 0a37a6d
Cross-reference visual diagnostics figures in test summary
Copilot ac37cf7
Update _subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
d-morrison 183613f
Merge branch 'main' into copilot/add-summaries-of-diagnostic-tests
d-morrison 0e5aaba
Apply suggestions from code review
d-morrison ccd7ade
Merge branch 'main' into copilot/add-summaries-of-diagnostic-tests
d-morrison 8e6eb71
Apply diagnostics review thread fixes
Copilot 14b240d
Update _subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
d-morrison File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
|
|
@@ -1096,6 +1096,190 @@ All three plots show the same data and reference line. | |||||
|
|
||||||
| --- | ||||||
|
|
||||||
| ### Formal diagnostic tests for linear regression assumptions | ||||||
|
|
||||||
| Graphical diagnostics are usually the first step, | ||||||
| but formal tests can provide numerical summaries. | ||||||
|
|
||||||
| For linear regression residuals, | ||||||
| three common tests are: | ||||||
|
|
||||||
| - `fligner.test()` for equal variances across groups | ||||||
| (the Fligner--Killeen test). | ||||||
| - Levene / Brown--Forsythe test | ||||||
| (a median-centered Levene variant, | ||||||
| where standard Levene centers on group means, | ||||||
| and Brown--Forsythe centers on group medians for more robustness; | ||||||
| e.g., via `car::leveneTest(..., center = median)` or equivalent code). | ||||||
| - `shapiro.test()` / | ||||||
| [Shapiro--Wilk test](https://en.wikipedia.org/wiki/Shapiro%E2%80%93Wilk_test) | ||||||
| for normality. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| #### Fligner--Killeen test (homoskedasticity across groups) | ||||||
|
|
||||||
| Suppose residuals are split into groups | ||||||
| ($g = 1, \ldots, G$), | ||||||
| for example by a categorical predictor. | ||||||
|
|
||||||
| The test starts from absolute deviations from each group median: | ||||||
| $$ | ||||||
| d_{gi} = |e_{gi} - \text{median}(e_{g1}, \ldots, e_{g,n_g})|. | ||||||
| $$ | ||||||
|
|
||||||
| After ranking the pooled $d_{gi}$ values, | ||||||
| the Fligner--Killeen statistic is built from normal scores of those ranks. | ||||||
|
|
||||||
| Under the null hypothesis of equal variances, | ||||||
| the test statistic is approximately $\chi^2_{G-1}$. | ||||||
| Small p-values suggest heteroskedasticity. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| #### Levene / Brown--Forsythe test (homoskedasticity across groups) | ||||||
|
|
||||||
| Levene's test transforms residuals to within-group absolute deviations: | ||||||
| $$ | ||||||
| z_{gi} = |e_{gi} - c_g|, | ||||||
| $$ | ||||||
| where $c_g$ is the group center. | ||||||
|
|
||||||
| Classical Levene uses the group mean for $c_g$. | ||||||
| Brown--Forsythe uses the group median, | ||||||
| which is more robust. | ||||||
|
|
||||||
| Then run a one-way ANOVA on $z_{gi}$ by group: | ||||||
| $$ | ||||||
| F = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} | ||||||
| \sim F_{G-1, N-G} | ||||||
| \quad\text{under }H_0. | ||||||
| $$ | ||||||
|
|
||||||
| Small p-values suggest unequal residual variance. | ||||||
|
|
||||||
| For simple linear regression, | ||||||
| @kutner2005applied [pp. 116--117] describes | ||||||
| the Brown--Forsythe test | ||||||
| by splitting observations into two $X$-level groups | ||||||
| (low versus high), | ||||||
| computing absolute deviations from each group median, | ||||||
| and applying a two-sample pooled-variance t test: | ||||||
| let | ||||||
| $$ | ||||||
| z_{ij} = |e_{ij} - \tilde e_i|, | ||||||
| $$ | ||||||
| where | ||||||
| $j$ indexes observations within group $i$, | ||||||
| and $\tilde e_i$ is the median residual in group $i$. | ||||||
| Then: | ||||||
| $$ | ||||||
| t_{\text{BF}} = | ||||||
| \frac{\bar z_{1} - \bar z_{2}} | ||||||
| {s_p \sqrt{1/n_{1} + 1/n_{2}}}, | ||||||
| \quad | ||||||
| t_{\text{BF}} \approx t_{n_{1}+n_{2}-2} | ||||||
| \text{ under }H_0. | ||||||
| $$ | ||||||
| Here, | ||||||
| $\bar z_{1}$ and $\bar z_{2}$ | ||||||
| are the means of the $z_{ij}$ values | ||||||
| in groups $i=1$ and $i=2$, | ||||||
| $s_p$ is their pooled standard deviation, | ||||||
| and $n_{1}, n_{2}$ are the two group sample sizes. | ||||||
| Large $|t_{\text{BF}}|$ | ||||||
| indicates nonconstant residual variance. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| #### Shapiro--Wilk test (normality of standardized residuals) | ||||||
|
|
||||||
| For ordered standardized residuals | ||||||
| $r_{(1)} \le \cdots \le r_{(n)}$, | ||||||
| the Shapiro--Wilk statistic is: | ||||||
| $$ | ||||||
| W = | ||||||
| \frac{\left(\sum_{i=1}^n a_i r_{(i)}\right)^2} | ||||||
| {\sum_{i=1}^n (r_i - \bar r)^2}, | ||||||
| $$ | ||||||
| where $a_i$ are constants from normal-order-statistic moments. | ||||||
| The numerator uses ordered residuals $r_{(i)}$, | ||||||
| while the denominator uses the original (unordered) residuals. | ||||||
|
|
||||||
| If residuals are Gaussian, | ||||||
| $W$ tends to be close to 1. | ||||||
| Small $W$ (and small p-value) | ||||||
| indicates departure from normality. | ||||||
|
|
||||||
| --- | ||||||
|
|
||||||
| #### Numerical example (`birthweight` interaction model) | ||||||
| broom::augment(bw_lm2, data = bw) |> | ||||||
| transmute( | ||||||
| sex, | ||||||
| resid_lm2 = .resid, | ||||||
| std_resid_lm2 = .std.resid | ||||||
| ) | ||||||
|
|
||||||
| fligner_bw <- fligner.test(resid_lm2 ~ sex, data = diag_bw) | ||||||
|
|
||||||
| levene_bw <- | ||||||
| diag_bw |> | ||||||
| group_by(sex) |> | ||||||
| mutate( | ||||||
| med_resid = median(resid_lm2), | ||||||
| abs_dev = abs(resid_lm2 - med_resid) | ||||||
| ) |> | ||||||
| ungroup() | ||||||
|
|
||||||
| levene_fit <- aov(abs_dev ~ sex, data = levene_bw) | ||||||
| levene_tab <- summary(levene_fit)[[1]] | ||||||
| levene_F <- unname(levene_tab[1, "F value"]) | ||||||
| levene_p <- unname(levene_tab[1, "Pr(>F)"]) | ||||||
|
|
||||||
| shapiro_bw <- shapiro.test(diag_bw$std_resid_lm2) | ||||||
|
|
||||||
| tibble( | ||||||
| test = c( | ||||||
| "Fligner--Killeen: equal variance by sex", | ||||||
| "Levene/Brown--Forsythe: equal variance by sex", | ||||||
| "Shapiro--Wilk: normality of standardized residuals" | ||||||
| ), | ||||||
| statistic = c( | ||||||
| unname(fligner_bw$statistic), | ||||||
| levene_F, | ||||||
| unname(shapiro_bw$statistic) | ||||||
| ), | ||||||
| p_value = c( | ||||||
| fligner_bw$p.value, | ||||||
| levene_p, | ||||||
| shapiro_bw$p.value | ||||||
| ) | ||||||
| ) |> | ||||||
| mutate( | ||||||
| statistic = signif(statistic, 4), | ||||||
| p_value = signif(p_value, 4) | ||||||
| ) | ||||||
| ``` | ||||||
|
|
||||||
| Interpretation rule: | ||||||
| for all three tests, | ||||||
| a small p-value is evidence against the corresponding model assumption. | ||||||
|
d-morrison marked this conversation as resolved.
|
||||||
|
|
||||||
| Compared with visual diagnostics: | ||||||
|
|
||||||
| - Fligner–Killeen / Levene summarizes the same heteroskedasticity signal | ||||||
|
||||||
| - Fligner–Killeen / Levene summarizes the same heteroskedasticity signal | |
| - Fligner–Killeen and Levene summarize the same heteroskedasticity signal |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section introduces residual notation as$e_{gi}$ / $e_{ij}$ , but elsewhere in the same file residuals are denoted with the project macros (e.g., $\resid_i$ and $\stdresid_i$ at around line 1285). To keep notation consistent and leverage the existing macros, consider switching these formulas to use $\resid_{gi}$ (and, if needed, $\stdresid$ ) instead of introducing a new symbol $e$ .