From 1b65542fa228cf636d4ea510f6e16f9640a57f1c Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 00:48:08 +0000 Subject: [PATCH 01/14] Initial plan From d3b27d3d441ed9384e75bce61fbf602afab09e84 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 00:57:44 +0000 Subject: [PATCH 02/14] docs: add independence diagnostics section for linear regression Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/77f22306-4128-4852-8346-93455180c8bb Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_diagnostics.qmd | 37 +++++++++++++++++++ references.bib | 32 ++++++++++++++++ 2 files changed, 69 insertions(+) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd index 3543712f0..26672f0b0 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd @@ -8,6 +8,43 @@ This section is adapted from @dobson4e [§6.2-6.3] and {{< include _subfiles/Linear-models-overview/_sec-linreg-assumptions.qmd >}} +### Diagnostics for the independence assumption + +The independence assumption means that residual noise terms are not correlated across observations, +after conditioning on the predictors in the model. + +For data with a natural ordering +(for example, time, spatial location, or clinic visit sequence), +we usually assess independence with both plots and formal tests +[@kutner2005applied, Chapter 12; @chatterjee2015regression, Chapter 6; @wp:regvalidation]. + +Common diagnostics include: + +1. Residuals versus observation order (or time) plots: +patterns, runs, or drifts can indicate dependence +[@kutner2005applied, Chapter 12]. + +2. 
Correlograms: +plot the sample autocorrelation function (ACF), +and often the partial autocorrelation function (PACF), +to look for serial structure +[@wp:correlogram; @chatterjee2015regression, Chapter 6]. + +3. Durbin-Watson test: +tests for first-order serial correlation in regression residuals +[@wp:durbinwatson]. + +4. Breusch-Godfrey test: +tests for higher-order serial correlation, +and is more flexible than Durbin-Watson in many regression settings +[@wp:breuschgodfrey]. + +No single diagnostic is definitive. +In practice, +we combine visual and formal diagnostics, +and interpret them in the context of study design +[@kutner2005applied, Chapter 12; @draper2014applied, Chapter 11]. + ### Direct visualization ::: notes diff --git a/references.bib b/references.bib index 3bb731149..bf67fa998 100644 --- a/references.bib +++ b/references.bib @@ -1365,6 +1365,38 @@ @misc{wp:statinference note = "[Online; accessed 7-September-2025]" } + @misc{wp:regvalidation, + author = "{Wikipedia contributors}", + title = "Regression validation --- {Wikipedia}{,} The Free Encyclopedia", + year = "2026", + url = "https://en.wikipedia.org/wiki/Regression_validation", + note = "[Online; accessed 18-April-2026]" + } + + @misc{wp:durbinwatson, + author = "{Wikipedia contributors}", + title = "Durbin--Watson statistic --- {Wikipedia}{,} The Free Encyclopedia", + year = "2026", + url = "https://en.wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic", + note = "[Online; accessed 18-April-2026]" + } + + @misc{wp:correlogram, + author = "{Wikipedia contributors}", + title = "Correlogram --- {Wikipedia}{,} The Free Encyclopedia", + year = "2026", + url = "https://en.wikipedia.org/wiki/Correlogram", + note = "[Online; accessed 18-April-2026]" + } + + @misc{wp:breuschgodfrey, + author = "{Wikipedia contributors}", + title = "Breusch--Godfrey test --- {Wikipedia}{,} The Free Encyclopedia", + year = "2026", + url = "https://en.wikipedia.org/wiki/Breusch%E2%80%93Godfrey_test", + note = "[Online; 
accessed 18-April-2026]" + } + @article{heinze2018variable, title={Variable selection -- {A} review and recommendations for the practicing statistician}, author={Heinze, Georg and Wallisch, Christine and Dunkler, Daniela}, From 67607dd12a431413e564c8eecb1d508feff4fe44 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 02:58:41 +0000 Subject: [PATCH 03/14] docs: add independence diagnostics math and worked examples Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/6bcdf40e-f97e-4fde-a0d3-49f15177becd Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_diagnostics.qmd | 4 + .../_sec_linreg_independence_examples.qmd | 76 +++++++++++++++++++ .../_sec_linreg_independence_math.qmd | 47 ++++++++++++ 3 files changed, 127 insertions(+) create mode 100644 _subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd create mode 100644 _subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd index 26672f0b0..fafbc5b22 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd @@ -39,6 +39,10 @@ tests for higher-order serial correlation, and is more flexible than Durbin-Watson in many regression settings [@wp:breuschgodfrey]. +{{< include _subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd >}} + +{{< include _subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd >}} + No single diagnostic is definitive. 
In practice, we combine visual and formal diagnostics, diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd new file mode 100644 index 000000000..285705754 --- /dev/null +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd @@ -0,0 +1,76 @@ +#### Example diagnostics with code and output + +The example below simulates ordered data with positively autocorrelated errors. +That creates a setting where independence should fail. + +```{r} +#| label: exm-independence-sim-data +#| code-fold: false +set.seed(204) + +n_obs <- 120 +time_index <- seq_len(n_obs) +x <- seq(-1, 1, length.out = n_obs) +err <- as.numeric(arima.sim(model = list(ar = 0.7), n = n_obs, sd = 1)) +y <- 2 + 1.5 * x + err + +indep_exm <- tibble::tibble( + time_index = time_index, + x = x, + y = y +) + +indep_exm_lm <- lm(y ~ x, data = indep_exm) + +summary(indep_exm_lm) +``` + +Residuals versus order: + +```{r} +#| label: fig-independence-resid-order +#| fig-cap: "Residuals versus observation order for simulated data" +indep_exm |> + dplyr::mutate(resid = resid(indep_exm_lm)) |> + ggplot(aes(x = time_index, y = resid)) + + geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") + + geom_line(color = "steelblue") + + theme_classic() + + labs( + x = "Observation order", + y = "Residual" + ) +``` + +Correlogram diagnostics: + +```{r} +#| label: fig-independence-correlogram +#| fig-cap: "Residual ACF and PACF for simulated data" +op <- par(mfrow = c(1, 2)) +acf( + resid(indep_exm_lm), + main = "Residual ACF" +) +pacf( + resid(indep_exm_lm), + main = "Residual PACF" +) +par(op) +``` + +Durbin-Watson and Breusch-Godfrey tests: + +```{r} +#| label: exm-independence-formal-tests +#| code-fold: false +library(lmtest) + +dwtest(indep_exm_lm, alternative = "two.sided") +bgtest(indep_exm_lm, order = 4) +``` + +In this example, +the residual-order plot and correlogram suggest serial 
dependence, +and both formal tests typically reject independence. + diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd new file mode 100644 index 000000000..ebb2c0cd7 --- /dev/null +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd @@ -0,0 +1,47 @@ +#### Mathematical details for Durbin-Watson and Breusch-Godfrey + +Suppose the observations have a meaningful order, +indexed by $t = 1, \ldots, n$. +Let $e_t$ denote the OLS residual from a fitted regression model. + +For the Durbin-Watson diagnostic, +the test statistic is +$$ +\ba +d +&= \frac{\sum_{t = 2}^{n} (e_t - e_{t-1})^2} +{\sum_{t = 1}^{n} e_t^2} +\ea +$$ +which is approximately +$2(1 - \hat r_1)$, +where $\hat r_1$ is the sample lag-1 residual autocorrelation +[@kutner2005applied, Chapter 12; @chatterjee2015regression, Chapter 6]. + +The Durbin-Watson null and alternatives are typically framed through +an AR(1) error model: +$$ +\err_t = \rho \err_{t-1} + u_t. +$$ +The null is $H_0: \rho = 0$, +and alternatives can be one-sided or two-sided, +depending on whether we suspect positive, +negative, +or any serial correlation +[@kutner2005applied, Chapter 12]. + +For the Breusch-Godfrey test of order $p$, +we run an auxiliary regression of residuals on: +the original regressors, +and lagged residuals $e_{t-1}, \ldots, e_{t-p}$. +If $R^2_{\text{aux}}$ is from that auxiliary model, +the LM statistic is +$$ +\text{LM} = (n - p)R^2_{\text{aux}}, +$$ +and under +$H_0: \rho_1 = \cdots = \rho_p = 0$, +it is asymptotically +$\chi^2_p$ +[@chatterjee2015regression, Chapter 6; @draper2014applied, Chapter 11]. 
+ From 5969608cbe4c12913632a8c190fe81e82d2a8c13 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 03:00:38 +0000 Subject: [PATCH 04/14] docs: use subfiles for independence test math and examples Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/6bcdf40e-f97e-4fde-a0d3-49f15177becd Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_independence_examples.qmd | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd index 285705754..a72879784 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd @@ -64,13 +64,10 @@ Durbin-Watson and Breusch-Godfrey tests: ```{r} #| label: exm-independence-formal-tests #| code-fold: false -library(lmtest) - -dwtest(indep_exm_lm, alternative = "two.sided") -bgtest(indep_exm_lm, order = 4) +lmtest::dwtest(indep_exm_lm, alternative = "two.sided") +lmtest::bgtest(indep_exm_lm, order = 4) ``` In this example, the residual-order plot and correlogram suggest serial dependence, and both formal tests typically reject independence. 
- From 360d4eb86208e1f00329c2933499596497087b25 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 03:02:42 +0000 Subject: [PATCH 05/14] docs: clarify AR error notation and dependent-example naming Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/6bcdf40e-f97e-4fde-a0d3-49f15177becd Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_independence_examples.qmd | 18 +++++++++--------- .../_sec_linreg_independence_math.qmd | 6 ++++-- 2 files changed, 13 insertions(+), 11 deletions(-) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd index a72879784..1bcef6e9c 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd @@ -14,15 +14,15 @@ x <- seq(-1, 1, length.out = n_obs) err <- as.numeric(arima.sim(model = list(ar = 0.7), n = n_obs, sd = 1)) y <- 2 + 1.5 * x + err -indep_exm <- tibble::tibble( +serial_dep_exm <- tibble::tibble( time_index = time_index, x = x, y = y ) -indep_exm_lm <- lm(y ~ x, data = indep_exm) +serial_dep_exm_lm <- lm(y ~ x, data = serial_dep_exm) -summary(indep_exm_lm) +summary(serial_dep_exm_lm) ``` Residuals versus order: @@ -30,8 +30,8 @@ Residuals versus order: ```{r} #| label: fig-independence-resid-order #| fig-cap: "Residuals versus observation order for simulated data" -indep_exm |> - dplyr::mutate(resid = resid(indep_exm_lm)) |> +serial_dep_exm |> + dplyr::mutate(resid = resid(serial_dep_exm_lm)) |> ggplot(aes(x = time_index, y = resid)) + geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") + geom_line(color = "steelblue") + @@ -49,11 +49,11 @@ Correlogram diagnostics: #| fig-cap: "Residual ACF and PACF for simulated data" op <- par(mfrow = c(1, 2)) acf( - resid(indep_exm_lm), + 
resid(serial_dep_exm_lm), main = "Residual ACF" ) pacf( - resid(indep_exm_lm), + resid(serial_dep_exm_lm), main = "Residual PACF" ) par(op) @@ -64,8 +64,8 @@ Durbin-Watson and Breusch-Godfrey tests: ```{r} #| label: exm-independence-formal-tests #| code-fold: false -lmtest::dwtest(indep_exm_lm, alternative = "two.sided") -lmtest::bgtest(indep_exm_lm, order = 4) +lmtest::dwtest(serial_dep_exm_lm, alternative = "two.sided") +lmtest::bgtest(serial_dep_exm_lm, order = 4) ``` In this example, diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd index ebb2c0cd7..bb771839b 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd @@ -21,8 +21,11 @@ where $\hat r_1$ is the sample lag-1 residual autocorrelation The Durbin-Watson null and alternatives are typically framed through an AR(1) error model: $$ -\err_t = \rho \err_{t-1} + u_t. +\eps_t = \rho \eps_{t-1} + u_t. $$ +Here, +$\eps_t$ is the autocorrelated regression error process, +and $u_t$ is a white-noise innovation term. The null is $H_0: \rho = 0$, and alternatives can be one-sided or two-sided, depending on whether we suspect positive, @@ -44,4 +47,3 @@ $H_0: \rho_1 = \cdots = \rho_p = 0$, it is asymptotically $\chi^2_p$ [@chatterjee2015regression, Chapter 6; @draper2014applied, Chapter 11]. 
- From 017b84acceedd98bf758e903d6fe71c7ee47a4b3 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 03:04:37 +0000 Subject: [PATCH 06/14] docs: tighten independence diagnostics explanations Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/6bcdf40e-f97e-4fde-a0d3-49f15177becd Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_independence_examples.qmd | 4 +++- .../Linear-models-overview/_sec_linreg_independence_math.qmd | 3 +++ 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd index 1bcef6e9c..0bf0d4aff 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd @@ -2,6 +2,8 @@ The example below simulates ordered data with positively autocorrelated errors. That creates a setting where independence should fail. +We generate the error term from an AR(1) process with coefficient 0.7, +matching the notation in the mathematical details section. ```{r} #| label: exm-independence-sim-data @@ -70,4 +72,4 @@ lmtest::bgtest(serial_dep_exm_lm, order = 4) In this example, the residual-order plot and correlogram suggest serial dependence, -and both formal tests typically reject independence. +and both formal tests reject independence. 
diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd index bb771839b..09e605c67 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd @@ -17,6 +17,9 @@ which is approximately $2(1 - \hat r_1)$, where $\hat r_1$ is the sample lag-1 residual autocorrelation [@kutner2005applied, Chapter 12; @chatterjee2015regression, Chapter 6]. +This approximation is most useful in standard large-sample linear-model settings, +and is less straightforward in models with lagged outcomes +[@kutner2005applied, Chapter 12]. The Durbin-Watson null and alternatives are typically framed through an AR(1) error model: From 3661078c288ecf2f8c1dd1a04267be7626a3e433 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 03:06:27 +0000 Subject: [PATCH 07/14] docs: make correlogram par reset robust Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/6bcdf40e-f97e-4fde-a0d3-49f15177becd Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_independence_examples.qmd | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd index 0bf0d4aff..ea61832a4 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd @@ -50,6 +50,7 @@ Correlogram diagnostics: #| label: fig-independence-correlogram #| fig-cap: "Residual ACF and PACF for simulated data" op <- par(mfrow = c(1, 2)) +on.exit(par(op), add = TRUE) acf( resid(serial_dep_exm_lm), main = "Residual ACF" @@ -58,7 +59,6 @@ pacf( resid(serial_dep_exm_lm), main = "Residual PACF" ) -par(op) ``` 
Durbin-Watson and Breusch-Godfrey tests: From e5083cdce65d4cbbbe9466f77d74c37864dd1aa9 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 03:08:20 +0000 Subject: [PATCH 08/14] docs: rename simulated AR error variable for clarity Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/6bcdf40e-f97e-4fde-a0d3-49f15177becd Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_independence_examples.qmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd index ea61832a4..935b2ec6b 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd @@ -13,8 +13,8 @@ set.seed(204) n_obs <- 120 time_index <- seq_len(n_obs) x <- seq(-1, 1, length.out = n_obs) -err <- as.numeric(arima.sim(model = list(ar = 0.7), n = n_obs, sd = 1)) -y <- 2 + 1.5 * x + err +ar_errors <- as.numeric(arima.sim(model = list(ar = 0.7), n = n_obs, sd = 1)) +y <- 2 + 1.5 * x + ar_errors serial_dep_exm <- tibble::tibble( time_index = time_index, From b00d5d37d99ee71609602170d392c897d3d773ba Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 03:10:06 +0000 Subject: [PATCH 09/14] docs: preserve full graphics state in correlogram example Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/6bcdf40e-f97e-4fde-a0d3-49f15177becd Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_independence_examples.qmd | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd index 
935b2ec6b..0ed3cd7a1 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd @@ -49,7 +49,8 @@ Correlogram diagnostics: ```{r} #| label: fig-independence-correlogram #| fig-cap: "Residual ACF and PACF for simulated data" -op <- par(mfrow = c(1, 2)) +op <- par(no.readonly = TRUE) +par(mfrow = c(1, 2)) on.exit(par(op), add = TRUE) acf( resid(serial_dep_exm_lm), From a6673aea0e8eef026ae99ff4a1e3534a16d5a9a3 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 03:11:58 +0000 Subject: [PATCH 10/14] docs: simplify time index setup in independence example Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/6bcdf40e-f97e-4fde-a0d3-49f15177becd Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_independence_examples.qmd | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd index 0ed3cd7a1..182150a66 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd @@ -11,13 +11,12 @@ matching the notation in the mathematical details section. 
set.seed(204) n_obs <- 120 -time_index <- seq_len(n_obs) x <- seq(-1, 1, length.out = n_obs) ar_errors <- as.numeric(arima.sim(model = list(ar = 0.7), n = n_obs, sd = 1)) y <- 2 + 1.5 * x + ar_errors serial_dep_exm <- tibble::tibble( - time_index = time_index, + time_index = seq_len(n_obs), x = x, y = y ) @@ -50,8 +49,8 @@ Correlogram diagnostics: #| label: fig-independence-correlogram #| fig-cap: "Residual ACF and PACF for simulated data" op <- par(no.readonly = TRUE) -par(mfrow = c(1, 2)) on.exit(par(op), add = TRUE) +par(mfrow = c(1, 2)) acf( resid(serial_dep_exm_lm), main = "Residual ACF" From 8143f6e37df99e91734e709baf8d764e70b16037 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 23:36:24 +0000 Subject: [PATCH 11/14] docs: add real-data independence diagnostics examples Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/66346382-6b8d-4b3b-8ec2-71de2d6d2c90 Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_independence_examples.qmd | 57 +++++++++++++++++++ 1 file changed, 57 insertions(+) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd index 182150a66..70342423f 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd @@ -73,3 +73,60 @@ lmtest::bgtest(serial_dep_exm_lm, order = 4) In this example, the residual-order plot and correlogram suggest serial dependence, and both formal tests reject independence. + +#### Real-data examples + +We can also run the same diagnostics on real ordered data sets. 
+ +##### Nile annual river flow + +```{r} +#| label: exm-independence-real-nile +#| code-fold: false +nile_df <- tibble::tibble( + year = as.numeric(time(datasets::Nile)), + flow = as.numeric(datasets::Nile) +) + +nile_lm <- lm(flow ~ year, data = nile_df) + +summary(nile_lm) +lmtest::dwtest(nile_lm, alternative = "two.sided") +lmtest::bgtest(nile_lm, order = 4) +``` + +```{r} +#| label: fig-independence-real-nile-resid-order +#| fig-cap: "Residuals versus year for Nile flow model" +nile_df |> + dplyr::mutate(resid = resid(nile_lm)) |> + ggplot(aes(x = year, y = resid)) + + geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") + + geom_line(color = "steelblue") + + theme_classic() + + labs( + x = "Year", + y = "Residual" + ) +``` + +##### Mauna Loa atmospheric carbon dioxide (`co2`) + +```{r} +#| label: exm-independence-real-co2 +#| code-fold: false +co2_df <- tibble::tibble( + decimal_year = as.numeric(time(datasets::co2)), + co2_ppm = as.numeric(datasets::co2) +) + +co2_lm <- lm(co2_ppm ~ decimal_year, data = co2_df) + +summary(co2_lm) +lmtest::dwtest(co2_lm, alternative = "two.sided") +lmtest::bgtest(co2_lm, order = 12) +``` + +For both real-data examples, +the diagnostics indicate residual dependence, +consistent with their time-ordered structure. 
From f6b5e5fa28bd99810c23813d5ef4789c9d232cc7 Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sat, 18 Apr 2026 23:55:13 +0000 Subject: [PATCH 12/14] docs: replace wikipedia primary citations with textbook sources Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/e54eadb3-a6c1-44ac-a1b5-e645fdfc17ed Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_diagnostics.qmd | 8 ++--- references.bib | 32 ------------------- 2 files changed, 4 insertions(+), 36 deletions(-) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd index fafbc5b22..31a85e859 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd @@ -16,7 +16,7 @@ after conditioning on the predictors in the model. For data with a natural ordering (for example, time, spatial location, or clinic visit sequence), we usually assess independence with both plots and formal tests -[@kutner2005applied, Chapter 12; @chatterjee2015regression, Chapter 6; @wp:regvalidation]. +[@kutner2005applied, Chapter 12; @chatterjee2015regression, Chapter 6]. Common diagnostics include: @@ -28,16 +28,16 @@ patterns, runs, or drifts can indicate dependence plot the sample autocorrelation function (ACF), and often the partial autocorrelation function (PACF), to look for serial structure -[@wp:correlogram; @chatterjee2015regression, Chapter 6]. +[@chatterjee2015regression, Chapter 6]. 3. Durbin-Watson test: tests for first-order serial correlation in regression residuals -[@wp:durbinwatson]. +[@chatterjee2015regression, Chapter 9; @kutner2005applied, Chapter 12]. 4. Breusch-Godfrey test: tests for higher-order serial correlation, and is more flexible than Durbin-Watson in many regression settings -[@wp:breuschgodfrey]. 
+[@chatterjee2015regression, Chapter 9; @kutner2005applied, Chapter 12]. {{< include _subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd >}} diff --git a/references.bib b/references.bib index bf67fa998..3bb731149 100644 --- a/references.bib +++ b/references.bib @@ -1365,38 +1365,6 @@ @misc{wp:statinference note = "[Online; accessed 7-September-2025]" } - @misc{wp:regvalidation, - author = "{Wikipedia contributors}", - title = "Regression validation --- {Wikipedia}{,} The Free Encyclopedia", - year = "2026", - url = "https://en.wikipedia.org/wiki/Regression_validation", - note = "[Online; accessed 18-April-2026]" - } - - @misc{wp:durbinwatson, - author = "{Wikipedia contributors}", - title = "Durbin--Watson statistic --- {Wikipedia}{,} The Free Encyclopedia", - year = "2026", - url = "https://en.wikipedia.org/wiki/Durbin%E2%80%93Watson_statistic", - note = "[Online; accessed 18-April-2026]" - } - - @misc{wp:correlogram, - author = "{Wikipedia contributors}", - title = "Correlogram --- {Wikipedia}{,} The Free Encyclopedia", - year = "2026", - url = "https://en.wikipedia.org/wiki/Correlogram", - note = "[Online; accessed 18-April-2026]" - } - - @misc{wp:breuschgodfrey, - author = "{Wikipedia contributors}", - title = "Breusch--Godfrey test --- {Wikipedia}{,} The Free Encyclopedia", - year = "2026", - url = "https://en.wikipedia.org/wiki/Breusch%E2%80%93Godfrey_test", - note = "[Online; accessed 18-April-2026]" - } - @article{heinze2018variable, title={Variable selection -- {A} review and recommendations for the practicing statistician}, author={Heinze, Georg and Wallisch, Christine and Dunkler, Daniela}, From d98ff694ed8007a752a7ce686877f85820af612c Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Sun, 19 Apr 2026 00:28:53 +0000 Subject: [PATCH 13/14] docs: apply remaining review-thread fixes for independence diagnostics Agent-Logs-Url: 
https://github.com/d-morrison/rme/sessions/8530f689-5c1f-420b-805b-a3dfc5943256 Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_diagnostics.qmd | 7 ++++-- .../_sec_linreg_independence_examples.qmd | 24 ++++++++++--------- 2 files changed, 18 insertions(+), 13 deletions(-) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd index 31a85e859..1c2e1012f 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd @@ -10,8 +10,11 @@ This section is adapted from @dobson4e [§6.2-6.3] and ### Diagnostics for the independence assumption -The independence assumption means that residual noise terms are not correlated across observations, -after conditioning on the predictors in the model. +The independence assumption means that the model errors are independent, +or at least uncorrelated, +across observations after conditioning on the predictors in the model. +Because those errors are unobserved, +we usually assess this assumption using residual-based plots and formal tests. 
For data with a natural ordering (for example, time, spatial location, or clinic visit sequence), diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd index 70342423f..fbe8574e7 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd @@ -48,17 +48,19 @@ Correlogram diagnostics: ```{r} #| label: fig-independence-correlogram #| fig-cap: "Residual ACF and PACF for simulated data" -op <- par(no.readonly = TRUE) -on.exit(par(op), add = TRUE) -par(mfrow = c(1, 2)) -acf( - resid(serial_dep_exm_lm), - main = "Residual ACF" -) -pacf( - resid(serial_dep_exm_lm), - main = "Residual PACF" -) +local({ + op <- par(no.readonly = TRUE) + on.exit(par(op), add = TRUE) + par(mfrow = c(1, 2)) + acf( + resid(serial_dep_exm_lm), + main = "Residual ACF" + ) + pacf( + resid(serial_dep_exm_lm), + main = "Residual PACF" + ) +}) ``` Durbin-Watson and Breusch-Godfrey tests: From 26c428fedd9f2deff35afe6e9aa265da58e47a2d Mon Sep 17 00:00:00 2001 From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com> Date: Mon, 20 Apr 2026 07:54:50 +0000 Subject: [PATCH 14/14] docs: use theorem-variant divs in independence diagnostics section Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/af609840-ccc3-42e1-b230-46a09d70378b Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com> --- .../_sec_linreg_diagnostics.qmd | 4 +++- .../_sec_linreg_independence_examples.qmd | 8 ++++++-- .../_sec_linreg_independence_math.qmd | 15 +++++++++++---- 3 files changed, 20 insertions(+), 7 deletions(-) diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd index 1c2e1012f..a51beb2b9 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd +++ 
b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd @@ -21,7 +21,8 @@ For data with a natural ordering we usually assess independence with both plots and formal tests [@kutner2005applied, Chapter 12; @chatterjee2015regression, Chapter 6]. -Common diagnostics include: +:::{#def-independence-diagnostics} +#### Common diagnostics for independence 1. Residuals versus observation order (or time) plots: patterns, runs, or drifts can indicate dependence @@ -41,6 +42,7 @@ tests for first-order serial correlation in regression residuals tests for higher-order serial correlation, and is more flexible than Durbin-Watson in many regression settings [@chatterjee2015regression, Chapter 9; @kutner2005applied, Chapter 12]. +::: {{< include _subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd >}} diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd index fbe8574e7..15e053d3f 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd @@ -1,4 +1,5 @@ -#### Example diagnostics with code and output +:::{#exm-independence-diagnostics-simulated} +#### Simulated example for independence diagnostics The example below simulates ordered data with positively autocorrelated errors. That creates a setting where independence should fail. @@ -75,8 +76,10 @@ lmtest::bgtest(serial_dep_exm_lm, order = 4) In this example, the residual-order plot and correlogram suggest serial dependence, and both formal tests reject independence. +::: -#### Real-data examples +:::{#exm-independence-diagnostics-real-data} +#### Real-data examples for independence diagnostics We can also run the same diagnostics on real ordered data sets. 
@@ -132,3 +135,4 @@ lmtest::bgtest(co2_lm, order = 12) For both real-data examples, the diagnostics indicate residual dependence, consistent with their time-ordered structure. +::: diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd index 09e605c67..100b73e01 100644 --- a/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd +++ b/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd @@ -4,8 +4,10 @@ Suppose the observations have a meaningful order, indexed by $t = 1, \ldots, n$. Let $e_t$ denote the OLS residual from a fitted regression model. -For the Durbin-Watson diagnostic, -the test statistic is +:::{#def-durbin-watson-diagnostic} +#### Durbin-Watson diagnostic + +The test statistic is $$ \ba d @@ -21,7 +23,7 @@ This approximation is most useful in standard large-sample linear-model settings and is less straightforward in models with lagged outcomes [@kutner2005applied, Chapter 12]. -The Durbin-Watson null and alternatives are typically framed through +The null and alternatives are typically framed through an AR(1) error model: $$ \eps_t = \rho \eps_{t-1} + u_t. @@ -35,8 +37,12 @@ depending on whether we suspect positive, negative, or any serial correlation [@kutner2005applied, Chapter 12]. +::: + +:::{#def-breusch-godfrey-diagnostic} +#### Breusch-Godfrey diagnostic -For the Breusch-Godfrey test of order $p$, +For a test of order $p$, we run an auxiliary regression of residuals on: the original regressors, and lagged residuals $e_{t-1}, \ldots, e_{t-p}$. @@ -50,3 +56,4 @@ $H_0: \rho_1 = \cdots = \rho_p = 0$, it is asymptotically $\chi^2_p$ [@chatterjee2015regression, Chapter 6; @draper2014applied, Chapter 11]. +:::
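
The Durbin-Watson and Breusch-Godfrey definitions added in `_sec_linreg_independence_math.qmd` can be checked by hand. The block below is a minimal sketch, not part of the patch series: it assumes base R only, reuses the simulation settings from the simulated example (seed 204, AR(1) coefficient 0.7), and introduces illustrative object names (`fit`, `d_manual`, `r1`, `end_term`, `lagged`, `aux_fit`, `lm_stat`) that do not appear in the patches. The Breusch-Godfrey part drops the first $p$ observations so that the statistic matches the $(n - p)R^2_{\text{aux}}$ form in the text; packaged implementations may treat the initial lagged residuals differently, so their values need not agree exactly with this sketch.

```r
# Minimal sketch (assumption: base R only; all object names are illustrative).
set.seed(204)

n_obs <- 120
x <- seq(-1, 1, length.out = n_obs)
# AR(1) errors with coefficient 0.7, as in the simulated example
ar_errors <- as.numeric(arima.sim(model = list(ar = 0.7), n = n_obs, sd = 1))
y <- 2 + 1.5 * x + ar_errors

fit <- lm(y ~ x)
e <- unname(resid(fit))

# Durbin-Watson statistic:
# sum of squared successive differences divided by the residual sum of squares
d_manual <- sum(diff(e)^2) / sum(e^2)

# Sample lag-1 residual autocorrelation
# (the residuals have mean zero because the model includes an intercept)
r1 <- sum(e[-1] * e[-n_obs]) / sum(e^2)
d_approx <- 2 * (1 - r1)

# The gap between d and 2 * (1 - r1) is exactly this end-point term
end_term <- (e[1]^2 + e[n_obs]^2) / sum(e^2)

# Breusch-Godfrey LM statistic of order p, dropping the first p observations
p <- 4
# Column 1 holds e_t; columns 2..(p + 1) hold e_{t-1}, ..., e_{t-p}
lagged <- embed(e, p + 1)
aux_fit <- lm(lagged[, 1] ~ x[(p + 1):n_obs] + lagged[, -1])
r2_aux <- summary(aux_fit)$r.squared
lm_stat <- (n_obs - p) * r2_aux
lm_p_value <- pchisq(lm_stat, df = p, lower.tail = FALSE)

print(c(d = d_manual, d_approx = d_approx, end_term = end_term))
print(c(LM = lm_stat, p_value = lm_p_value))
```

Expanding the squared differences in the numerator of $d$ gives $d = 2(1 - \hat r_1) - (e_1^2 + e_n^2) / \sum_{t=1}^{n} e_t^2$, so the approximation $d \approx 2(1 - \hat r_1)$ is off only by the two end-point residuals, a term that shrinks as $n$ grows.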