-
Notifications
You must be signed in to change notification settings - Fork 14
Expand independence-assumption diagnostics in linear regression with mathematical details and worked examples #563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
d-morrison
merged 16 commits into
main
from
copilot/add-diagnostics-section-linear-regression
Apr 28, 2026
+243
−0
Merged
Changes from all commits
Commits
Show all changes
16 commits
Select commit
Hold shift + click to select a range
1b65542
Initial plan
Copilot d3b27d3
docs: add independence diagnostics section for linear regression
Copilot 67607dd
docs: add independence diagnostics math and worked examples
Copilot 5969608
docs: use subfiles for independence test math and examples
Copilot 360d4eb
docs: clarify AR error notation and dependent-example naming
Copilot 017b84a
docs: tighten independence diagnostics explanations
Copilot 3661078
docs: make correlogram par reset robust
Copilot e5083cd
docs: rename simulated AR error variable for clarity
Copilot b00d5d3
docs: preserve full graphics state in correlogram example
Copilot a6673ae
docs: simplify time index setup in independence example
Copilot f0b4667
Merge branch 'main' into copilot/add-diagnostics-section-linear-regre…
d-morrison 8143f6e
docs: add real-data independence diagnostics examples
Copilot f6b5e5f
docs: replace wikipedia primary citations with textbook sources
Copilot d98ff69
docs: apply remaining review-thread fixes for independence diagnostics
Copilot 26c428f
docs: use theorem-variant divs in independence diagnostics section
Copilot dde5b89
Merge branch 'main' into copilot/add-diagnostics-section-linear-regre…
d-morrison File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
138 changes: 138 additions & 0 deletions
138
_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,138 @@ | ||
| :::{#exm-independence-diagnostics-simulated} | ||
| #### Simulated example for independence diagnostics | ||
|
|
||
| The example below simulates ordered data with positively autocorrelated errors. | ||
| That creates a setting where independence should fail. | ||
| We generate the error term from an AR(1) process with coefficient 0.7, | ||
| matching the notation in the mathematical details section. | ||
|
|
||
| ```{r} | ||
| #| label: exm-independence-sim-data | ||
| #| code-fold: false | ||
| set.seed(204) | ||
|
|
||
| n_obs <- 120 | ||
| x <- seq(-1, 1, length.out = n_obs) | ||
| ar_errors <- as.numeric(arima.sim(model = list(ar = 0.7), n = n_obs, sd = 1)) | ||
| y <- 2 + 1.5 * x + ar_errors | ||
|
|
||
| serial_dep_exm <- tibble::tibble( | ||
| time_index = seq_len(n_obs), | ||
| x = x, | ||
| y = y | ||
| ) | ||
|
|
||
| serial_dep_exm_lm <- lm(y ~ x, data = serial_dep_exm) | ||
|
|
||
| summary(serial_dep_exm_lm) | ||
| ``` | ||
|
|
||
| Residuals versus order: | ||
|
|
||
| ```{r} | ||
| #| label: fig-independence-resid-order | ||
| #| fig-cap: "Residuals versus observation order for simulated data" | ||
| serial_dep_exm |> | ||
| dplyr::mutate(resid = resid(serial_dep_exm_lm)) |> | ||
| ggplot(aes(x = time_index, y = resid)) + | ||
| geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") + | ||
| geom_line(color = "steelblue") + | ||
| theme_classic() + | ||
| labs( | ||
| x = "Observation order", | ||
| y = "Residual" | ||
| ) | ||
| ``` | ||
|
|
||
| Correlogram diagnostics: | ||
|
|
||
| ```{r} | ||
| #| label: fig-independence-correlogram | ||
| #| fig-cap: "Residual ACF and PACF for simulated data" | ||
| local({ | ||
| op <- par(no.readonly = TRUE) | ||
| on.exit(par(op), add = TRUE) | ||
| par(mfrow = c(1, 2)) | ||
| acf( | ||
| resid(serial_dep_exm_lm), | ||
| main = "Residual ACF" | ||
| ) | ||
| pacf( | ||
| resid(serial_dep_exm_lm), | ||
| main = "Residual PACF" | ||
| ) | ||
| }) | ||
| ``` | ||
|
|
||
| Durbin-Watson and Breusch-Godfrey tests: | ||
|
|
||
| ```{r} | ||
| #| label: exm-independence-formal-tests | ||
| #| code-fold: false | ||
| lmtest::dwtest(serial_dep_exm_lm, alternative = "two.sided") | ||
| lmtest::bgtest(serial_dep_exm_lm, order = 4) | ||
| ``` | ||
|
|
||
| In this example, | ||
| the residual-order plot and correlogram suggest serial dependence, | ||
| and both formal tests reject independence. | ||
| ::: | ||
|
|
||
| :::{#exm-independence-diagnostics-real-data} | ||
| #### Real-data examples for independence diagnostics | ||
|
|
||
| We can also run the same diagnostics on real ordered data sets. | ||
|
|
||
| ##### Nile annual river flow | ||
|
|
||
| ```{r} | ||
| #| label: exm-independence-real-nile | ||
| #| code-fold: false | ||
| nile_df <- tibble::tibble( | ||
| year = as.numeric(time(datasets::Nile)), | ||
| flow = as.numeric(datasets::Nile) | ||
| ) | ||
|
|
||
| nile_lm <- lm(flow ~ year, data = nile_df) | ||
|
|
||
| summary(nile_lm) | ||
| lmtest::dwtest(nile_lm, alternative = "two.sided") | ||
| lmtest::bgtest(nile_lm, order = 4) | ||
| ``` | ||
|
|
||
| ```{r} | ||
| #| label: fig-independence-real-nile-resid-order | ||
| #| fig-cap: "Residuals versus year for Nile flow model" | ||
| nile_df |> | ||
| dplyr::mutate(resid = resid(nile_lm)) |> | ||
| ggplot(aes(x = year, y = resid)) + | ||
| geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") + | ||
| geom_line(color = "steelblue") + | ||
| theme_classic() + | ||
| labs( | ||
| x = "Year", | ||
| y = "Residual" | ||
| ) | ||
| ``` | ||
|
|
||
| ##### Mauna Loa atmospheric carbon dioxide (`co2`) | ||
|
|
||
| ```{r} | ||
| #| label: exm-independence-real-co2 | ||
| #| code-fold: false | ||
| co2_df <- tibble::tibble( | ||
| decimal_year = as.numeric(time(datasets::co2)), | ||
| co2_ppm = as.numeric(datasets::co2) | ||
| ) | ||
|
|
||
| co2_lm <- lm(co2_ppm ~ decimal_year, data = co2_df) | ||
|
|
||
| summary(co2_lm) | ||
| lmtest::dwtest(co2_lm, alternative = "two.sided") | ||
| lmtest::bgtest(co2_lm, order = 12) | ||
| ``` | ||
|
|
||
| For both real-data examples, | ||
| the diagnostics indicate residual dependence, | ||
| consistent with their time-ordered structure. | ||
| ::: |
59 changes: 59 additions & 0 deletions
59
_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,59 @@ | ||
| #### Mathematical details for Durbin-Watson and Breusch-Godfrey | ||
|
|
||
| Suppose the observations have a meaningful order, | ||
| indexed by $t = 1, \ldots, n$. | ||
| Let $e_t$ denote the OLS residual from a fitted regression model. | ||
|
|
||
| :::{#def-durbin-watson-diagnostic} | ||
| #### Durbin-Watson diagnostic | ||
|
|
||
| The test statistic is | ||
| $$ | ||
| \ba | ||
| d | ||
| &= \frac{\sum_{t = 2}^{n} (e_t - e_{t-1})^2} | ||
| {\sum_{t = 1}^{n} e_t^2} | ||
| \ea | ||
| $$ | ||
| which is approximately | ||
| $2(1 - \hat r_1)$, | ||
| where $\hat r_1$ is the sample lag-1 residual autocorrelation | ||
| [@kutner2005applied, Chapter 12; @chatterjee2015regression, Chapter 6]. | ||
| This approximation is most useful in standard large-sample linear-model settings, | ||
| and is less straightforward in models with lagged outcomes | ||
| [@kutner2005applied, Chapter 12]. | ||
|
|
||
| The null and alternatives are typically framed through | ||
| an AR(1) error model: | ||
| $$ | ||
| \eps_t = \rho \eps_{t-1} + u_t. | ||
| $$ | ||
| Here, | ||
| $\eps_t$ is the autocorrelated regression error process, | ||
| and $u_t$ is a white-noise innovation term. | ||
| The null is $H_0: \rho = 0$, | ||
| and alternatives can be one-sided or two-sided, | ||
| depending on whether we suspect positive, | ||
| negative, | ||
| or any serial correlation | ||
| [@kutner2005applied, Chapter 12]. | ||
| ::: | ||
|
|
||
| :::{#def-breusch-godfrey-diagnostic} | ||
| #### Breusch-Godfrey diagnostic | ||
|
|
||
| For a test of order $p$, | ||
| we run an auxiliary regression of residuals on: | ||
| the original regressors, | ||
| and lagged residuals $e_{t-1}, \ldots, e_{t-p}$. | ||
| If $R^2_{\text{aux}}$ is from that auxiliary model, | ||
| the LM statistic is | ||
| $$ | ||
| \text{LM} = (n - p)R^2_{\text{aux}}, | ||
| $$ | ||
| and under | ||
| $H_0: \rho_1 = \cdots = \rho_p = 0$, | ||
| it is asymptotically | ||
| $\chi^2_p$ | ||
| [@chatterjee2015regression, Chapter 6; @draper2014applied, Chapter 11]. | ||
| ::: |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.