d-morrison · d-morrison · Apr 28, 2026 · Apr 18, 2026 · Apr 18, 2026 · Apr 18, 2026
diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
@@ -8,6 +8,52 @@ This section is adapted from @dobson4e [§6.2-6.3] and
 
 {{< include _subfiles/Linear-models-overview/_sec-linreg-assumptions.qmd >}}
 
+### Diagnostics for the independence assumption
+
+The independence assumption means that the model errors are independent,
+or at least uncorrelated,
+across observations after conditioning on the predictors in the model.
+Because those errors are unobserved,
+we usually assess this assumption using residual-based plots and formal tests.
+
+For data with a natural ordering
+(for example, time, spatial location, or clinic visit sequence),
+we usually assess independence with both plots and formal tests
+[@kutner2005applied, Chapter 12; @chatterjee2015regression, Chapter 6].
+
+:::{#def-independence-diagnostics}
+#### Common diagnostics for independence
+
+1. Residuals versus observation order (or time) plots:
+patterns, runs, or drifts can indicate dependence
+[@kutner2005applied, Chapter 12].
+
+2. Correlograms:
+plot the sample autocorrelation function (ACF),
+and often the partial autocorrelation function (PACF),
+to look for serial structure
+[@chatterjee2015regression, Chapter 6].
+
+3. Durbin-Watson test:
+tests for first-order serial correlation in regression residuals
+[@chatterjee2015regression, Chapter 9; @kutner2005applied, Chapter 12].
+
+4. Breusch-Godfrey test:
+tests for higher-order serial correlation,
+and is more flexible than Durbin-Watson in many regression settings
+[@chatterjee2015regression, Chapter 9; @kutner2005applied, Chapter 12].
+:::
+
+{{< include _subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd >}}
+
+{{< include _subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd >}}
+
+No single diagnostic is definitive.
+In practice,
+we combine visual and formal diagnostics,
+and interpret them in the context of study design
+[@kutner2005applied, Chapter 12; @draper2014applied, Chapter 11].
+
 ### Direct visualization
 
 ::: notes

diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_examples.qmd
@@ -0,0 +1,138 @@
+:::{#exm-independence-diagnostics-simulated}
+#### Simulated example for independence diagnostics
+
+The example below simulates ordered data with positively autocorrelated errors.
+That creates a setting where independence should fail.
+We generate the error term from an AR(1) process with coefficient 0.7,
+matching the notation in the mathematical details section.
+
+```{r}
+#| label: exm-independence-sim-data
+#| code-fold: false
+set.seed(204)
+
+n_obs <- 120
+x <- seq(-1, 1, length.out = n_obs)
+ar_errors <- as.numeric(arima.sim(model = list(ar = 0.7), n = n_obs, sd = 1))
+y <- 2 + 1.5 * x + ar_errors
+
+serial_dep_exm <- tibble::tibble(
+  time_index = seq_len(n_obs),
+  x = x,
+  y = y
+)
+
+serial_dep_exm_lm <- lm(y ~ x, data = serial_dep_exm)
+
+summary(serial_dep_exm_lm)
+```
+
+Residuals versus order:
+
+```{r}
+#| label: fig-independence-resid-order
+#| fig-cap: "Residuals versus observation order for simulated data"
+serial_dep_exm |>
+  dplyr::mutate(resid = resid(serial_dep_exm_lm)) |>
+  ggplot(aes(x = time_index, y = resid)) +
+  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
+  geom_line(color = "steelblue") +
+  theme_classic() +
+  labs(
+    x = "Observation order",
+    y = "Residual"
+  )
+```
+
+Correlogram diagnostics:
+
+```{r}
+#| label: fig-independence-correlogram
+#| fig-cap: "Residual ACF and PACF for simulated data"
+local({
+  op <- par(no.readonly = TRUE)
+  on.exit(par(op), add = TRUE)
+  par(mfrow = c(1, 2))
+  acf(
+    resid(serial_dep_exm_lm),
+    main = "Residual ACF"
+  )
+  pacf(
+    resid(serial_dep_exm_lm),
+    main = "Residual PACF"
+  )
+})
+```
+
+Durbin-Watson and Breusch-Godfrey tests:
+
+```{r}
+#| label: exm-independence-formal-tests
+#| code-fold: false
+lmtest::dwtest(serial_dep_exm_lm, alternative = "two.sided")
+lmtest::bgtest(serial_dep_exm_lm, order = 4)
+```
+
+In this example,
+the residual-order plot and correlogram suggest serial dependence,
+and both formal tests reject independence.
+:::
+
+:::{#exm-independence-diagnostics-real-data}
+#### Real-data examples for independence diagnostics
+
+We can also run the same diagnostics on real ordered data sets.
+
+##### Nile annual river flow
+
+```{r}
+#| label: exm-independence-real-nile
+#| code-fold: false
+nile_df <- tibble::tibble(
+  year = as.numeric(time(datasets::Nile)),
+  flow = as.numeric(datasets::Nile)
+)
+
+nile_lm <- lm(flow ~ year, data = nile_df)
+
+summary(nile_lm)
+lmtest::dwtest(nile_lm, alternative = "two.sided")
+lmtest::bgtest(nile_lm, order = 4)
+```
+
+```{r}
+#| label: fig-independence-real-nile-resid-order
+#| fig-cap: "Residuals versus year for Nile flow model"
+nile_df |>
+  dplyr::mutate(resid = resid(nile_lm)) |>
+  ggplot(aes(x = year, y = resid)) +
+  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
+  geom_line(color = "steelblue") +
+  theme_classic() +
+  labs(
+    x = "Year",
+    y = "Residual"
+  )
+```
+
+##### Mauna Loa atmospheric carbon dioxide (`co2`)
+
+```{r}
+#| label: exm-independence-real-co2
+#| code-fold: false
+co2_df <- tibble::tibble(
+  decimal_year = as.numeric(time(datasets::co2)),
+  co2_ppm = as.numeric(datasets::co2)
+)
+
+co2_lm <- lm(co2_ppm ~ decimal_year, data = co2_df)
+
+summary(co2_lm)
+lmtest::dwtest(co2_lm, alternative = "two.sided")
+lmtest::bgtest(co2_lm, order = 12)
+```
+
+For both real-data examples,
+the diagnostics indicate residual dependence,
+consistent with their time-ordered structure.
+:::
diff --git a/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd b/_subfiles/Linear-models-overview/_sec_linreg_independence_math.qmd
@@ -0,0 +1,59 @@
+#### Mathematical details for Durbin-Watson and Breusch-Godfrey
+
+Suppose the observations have a meaningful order,
+indexed by $t = 1, \ldots, n$.
+Let $e_t$ denote the OLS residual from a fitted regression model.
+
+:::{#def-durbin-watson-diagnostic}
+#### Durbin-Watson diagnostic
+
+The test statistic is
+$$
+\ba
+d
+&= \frac{\sum_{t = 2}^{n} (e_t - e_{t-1})^2}
+{\sum_{t = 1}^{n} e_t^2}
+\ea
+$$
+which is approximately
+$2(1 - \hat r_1)$,
+where $\hat r_1$ is the sample lag-1 residual autocorrelation
+[@kutner2005applied, Chapter 12; @chatterjee2015regression, Chapter 6].
+This approximation is most useful in standard large-sample linear-model settings,
+and is less straightforward in models with lagged outcomes
+[@kutner2005applied, Chapter 12].
+
+The null and alternatives are typically framed through
+an AR(1) error model:
+$$
+\eps_t = \rho \eps_{t-1} + u_t.
+$$
+Here,
+$\eps_t$ is the autocorrelated regression error process,
+and $u_t$ is a white-noise innovation term.
+The null is $H_0: \rho = 0$,
+and alternatives can be one-sided or two-sided,
+depending on whether we suspect positive,
+negative,
+or any serial correlation
+[@kutner2005applied, Chapter 12].
+:::
+
+:::{#def-breusch-godfrey-diagnostic}
+#### Breusch-Godfrey diagnostic
+
+For a test of order $p$,
+we run an auxiliary regression of residuals on:
+the original regressors,
+and lagged residuals $e_{t-1}, \ldots, e_{t-p}$.
+If $R^2_{\text{aux}}$ is from that auxiliary model,
+the LM statistic is
+$$
+\text{LM} = (n - p)R^2_{\text{aux}},
+$$
+and under
+$H_0: \rho_1 = \cdots = \rho_p = 0$,
+it is asymptotically
+$\chi^2_p$
+[@chatterjee2015regression, Chapter 6; @draper2014applied, Chapter 11].
+:::