From 5078c8f2b4450251e28793fc22215d0fe6214611 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 16 Apr 2026 00:04:36 +0000
Subject: [PATCH 01/76] Initial plan

From 704f50d7c09f5f2804d474499a69c3c592cd4502 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 16 Apr 2026 00:28:28 +0000
Subject: [PATCH 02/76] Clarify deviation, error/noise, and residual terminology

Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/b8287ff9-c2a9-44fd-a68d-9100617481b2

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
---
 .../_def-residual-deviation.qmd | 12 ++--
 .../_sec_linreg_diagnostics.qmd | 2 +-
 estimation.qmd | 58 +++++++++++++++++--
 probability.qmd | 52 +++++++++++++++++
 4 files changed, 111 insertions(+), 13 deletions(-)

diff --git a/_subfiles/Linear-models-overview/_def-residual-deviation.qmd b/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
index e11cd46ed..2dc9d0ac1 100644
--- a/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
+++ b/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
@@ -1,17 +1,17 @@
 :::{#def-resid-noise}
-#### Residual noise/deviation from the population mean
+#### Model error/noise (deviation from the population mean)

-The **residual noise** in a probabilistic model $p(Y)$,
-also known as the
-**residual deviation of an observation from its population mean**
-or **residual** for short,
+The **model error/noise** in a probabilistic model $p(Y)$,
+also known as the
+**deviation of an observation from its population mean**,
 is the difference between an observed value $y$ and its population mean:

 $$\devn(y) \eqdef y - \Expf{Y}$$ {#eq-def-resid}
 :::

 :::{.notes}
-We use the same notation for residual noise that we used for [errors](estimation.qmd#def-error).
+We use the same notation for model error/noise
+that we used for [errors](probability.qmd#def-error).

 $\Expf{Y}$ can be viewed as an estimate of $Y$, before $y$ is observed.
 Conversely, each observation $y$ can be viewed as an estimate of $\Expf{Y}$
diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
index 50f1ed3a0..ce4d2d849 100644
--- a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
+++ b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
@@ -195,7 +195,7 @@ Left to the reader.
 :::{#def-resid-fitted}
 #### Residuals of a fitted model value

-The **residual of a fitted value $\hat y$** (shorthand: "residual") is its [error](estimation.qmd#def-error) relative to the observed data:
+The **residual of a fitted value $\hat y$** (shorthand: "residual") is its [error](probability.qmd#def-error) relative to the observed data:
 $$
 \ba
 e(\hat y) &\eqdef \erf{\hat y}
diff --git a/estimation.qmd b/estimation.qmd
index a8938e933..3fcbd248a 100644
--- a/estimation.qmd
+++ b/estimation.qmd
@@ -185,17 +185,63 @@ to the distributions of the *errors* made by the resulting estimates.

 :::

-## Error
+## Estimation error

-::: {#def-error}
-#### Error
+::: {#def-estimation-error}
+#### Estimation error

-The **error** of an estimate $\hth$ of a true value $\th$, often denoted $\eps(\hth)$, or more completely $\eps(\hth, \th)$, is the
-difference between the estimate and its estimand $\theta$; that is:
+The **estimation error** of an estimate $\hth$ of a true value $\th$
+is the [error](probability.qmd#def-error)
+obtained when the value is $\hth$
+and the reference value is the estimand $\th$:

 $$\eps(\hth) \eqdef \hth - \th$$
 :::

+## Residuals
+
+::: {#def-residual}
+#### Residual
+
+A **residual** is the difference between an observed value
+and its fitted value:
+
+$$e_i \eqdef y_i - \hat y_i$$
+
+The fitted value $\hat y_i$ is often a sample mean or fitted conditional mean,
+but not always.
+:::
+
+## Relationship between residuals and errors
+
+Let $\mu_i \eqdef \E{Y_i \mid X_i}$ denote the model-implied mean.
+Then:
+
+$$
+\ba
+e_i
+&= y_i - \hat y_i\\
+&= \paren{y_i - \mu_i} - \paren{\hat y_i - \mu_i}
+\ea
+$$
+
+So a residual is an observed approximation to an unobserved model error:
+it equals the error term $(y_i-\mu_i)$,
+minus fitted-value estimation error $(\hat y_i-\mu_i)$.
+
+Different sources are not fully consistent about these terms.
+In this course, we use:
+
+- **deviation** for a generic difference from a reference value;
+- **error/noise** for deviation from a population mean;
+- **residual** for deviation from a fitted value.
+
+See:
+
+- [Wikipedia: Errors and residuals](https://en.wikipedia.org/wiki/Errors_and_residuals)
+- [Wikipedia: Deviation (statistics)](https://en.wikipedia.org/wiki/Deviation_(statistics))
+- [Wikipedia: Linear regression — Notation and terminology](https://en.wikipedia.org/wiki/Linear_regression#Notation_and_terminology)
+
 Some frequently-used measures of accuracy include:

 ## Mean squared error
@@ -351,7 +397,7 @@ $$\SE{\hth} \eqdef \SD{\hth}$$

 :::

-"Standard error" is a confusing concept in a few ways. First of all, it isn't even defined as a characteristic of the [error](#def-error), $\eps(\hth)$! Moreover, it is just a synonym for standard deviation, so it seems like a redundant concept. However, standard errors help us construct p-values and confidence intervals, so they come up a lot - often enough to give them their own name.
+"Standard error" is a confusing concept in a few ways. First of all, it isn't even defined as a characteristic of the [estimation error](#def-estimation-error), $\eps(\hth)$! Moreover, it is just a synonym for standard deviation, so it seems like a redundant concept. However, standard errors help us construct p-values and confidence intervals, so they come up a lot - often enough to give them their own name.

 We can relate standard error to actual error, by rearranging the result from @thm-mse-bias-variance:

diff --git a/probability.qmd b/probability.qmd
index bd9b2ed0c..88e6d5080 100644
--- a/probability.qmd
+++ b/probability.qmd
@@ -643,6 +643,58 @@ $$

 ---

+## Deviation, error, and noise
+
+:::{#def-deviation}
+### Deviation
+
+A **deviation** is the difference between a value and a reference value.
+For any quantity $z$ and reference value $r$:
+
+$$z - r$$
+
+In probability and statistics,
+"deviation" often means deviation from a population mean.
+For a random variable $Y$:
+
+$$Y - \E{Y}$$
+
+:::
+
+---
+
+:::{#def-error}
+### Error/noise in probabilistic models
+
+In probabilistic models,
+an **error** (or **noise term**)
+is a deviation of an observation from its model-implied population mean.
+For an observation $y$:
+
+$$\err(y) \eqdef y - \E{Y}$$
+
+In regression settings,
+the reference mean is often conditional on covariates:
+$\err_i \eqdef y_i - \E{Y_i \mid X_i}$.
+
+The terms "error" and "noise" are common,
+but terminology is not fully consistent across sources.
+In this course,
+we use "residual" for deviations from fitted values
+(defined in the
+[Estimation chapter](estimation.qmd#def-residual)),
+not for deviations from population means.
+
+See:
+
+- [Wikipedia: Errors and residuals](https://en.wikipedia.org/wiki/Errors_and_residuals)
+- [Wikipedia: Deviation (statistics)](https://en.wikipedia.org/wiki/Deviation_(statistics))
+- [Wikipedia: Linear regression — Notation and terminology](https://en.wikipedia.org/wiki/Linear_regression#Notation_and_terminology)
+
+:::
+
+---
+
 ## Variance and related characteristics

 :::{#def-variance}

From bad929de126fd8300501a01ffdd60ae05282916d Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 16 Apr 2026 00:51:09 +0000
Subject: [PATCH 03/76] Refine terminology links and consistency for error vs residual definitions

Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/b8287ff9-c2a9-44fd-a68d-9100617481b2

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
---
 .../_def-residual-deviation.qmd | 11 ++++++-----
 .../_sec_linreg_diagnostics.qmd | 5 +++--
 estimation.qmd | 12 ++++++------
 probability.qmd | 6 +++---
 4 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/_subfiles/Linear-models-overview/_def-residual-deviation.qmd b/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
index 2dc9d0ac1..210b277a9 100644
--- a/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
+++ b/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
@@ -1,4 +1,4 @@
-:::{#def-resid-noise}
+:::{#def-model-error-noise}
 #### Model error/noise (deviation from the population mean)

 The **model error/noise** in a probabilistic model $p(Y)$,
@@ -6,12 +6,13 @@ also known as the
 **deviation of an observation from its population mean**,
 is the difference between an observed value $y$ and its population mean:

-$$\devn(y) \eqdef y - \Expf{Y}$$ {#eq-def-resid}
+$$\devn(y) \eqdef y - \Expf{Y}$$ {#eq-def-model-error}
 :::

 :::{.notes}
 We use the same notation for model error/noise
-that we used for [errors](probability.qmd#def-error).
+that we used for [errors](probability.qmd#def-error):
+$\devn(y)$.

 $\Expf{Y}$ can be viewed as an estimate of $Y$, before $y$ is observed.
 Conversely, each observation $y$ can be viewed as an estimate of $\Expf{Y}$
@@ -19,8 +20,8 @@ Conversely, each observation $y$ can be viewed as an estimate of $\Expf{Y}$

 :::

-We can rearrange @eq-def-resid
+We can rearrange @eq-def-model-error
 to view $y$ as
-the sum of its mean plus the residual noise:
+the sum of its mean plus the model error/noise:

 $$y = \Exp{Y} + \epsf{y}$$
diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
index ce4d2d849..4b7eb0106 100644
--- a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
+++ b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
@@ -195,7 +195,9 @@ Left to the reader.
 :::{#def-resid-fitted}
 #### Residuals of a fitted model value

-The **residual of a fitted value $\hat y$** (shorthand: "residual") is its [error](probability.qmd#def-error) relative to the observed data:
+The **residual of a fitted value $\hat y$**
+(shorthand: "residual")
+is its deviation from the observed data:
 $$
 \ba
 e(\hat y) &\eqdef \erf{\hat y}
@@ -1062,4 +1064,3 @@ bw |>
 theme_classic() +
 geom_hline(yintercept = 0)
 ```
-
diff --git a/estimation.qmd b/estimation.qmd
index 3fcbd248a..c91bbb11e 100644
--- a/estimation.qmd
+++ b/estimation.qmd
@@ -191,9 +191,8 @@ to the distributions of the *errors* made by the resulting estimates.
 #### Estimation error

 The **estimation error** of an estimate $\hth$ of a true value $\th$
-is the [error](probability.qmd#def-error)
-obtained when the value is $\hth$
-and the reference value is the estimand $\th$:
+is the difference between
+the estimate and the estimand $\th$:

 $$\eps(\hth) \eqdef \hth - \th$$
 :::
@@ -226,14 +225,15 @@ e_i
 $$

 So a residual is an observed approximation to an unobserved model error:
-it equals the error term $(y_i-\mu_i)$,
-minus fitted-value estimation error $(\hat y_i-\mu_i)$.
+it equals the model error $(y_i-\mu_i)$,
+minus the estimation error of the fitted value $(\hat y_i-\mu_i)$.

 Different sources are not fully consistent about these terms.
-In this course, we use:
+For terminology in this course, we use:

 - **deviation** for a generic difference from a reference value;
 - **error/noise** for deviation from a population mean;
+- **estimation error** for deviation of an estimate from its estimand;
 - **residual** for deviation from a fitted value.

 See:
diff --git a/probability.qmd b/probability.qmd
index 88e6d5080..819634cc2 100644
--- a/probability.qmd
+++ b/probability.qmd
@@ -680,9 +680,9 @@ $\err_i \eqdef y_i - \E{Y_i \mid X_i}$.
 The terms "error" and "noise" are common,
 but terminology is not fully consistent across sources.
 In this course,
-we use "residual" for deviations from fitted values
-(defined in the
-[Estimation chapter](estimation.qmd#def-residual)),
+we use "residual"
+(defined in the [Estimation chapter](estimation.qmd#def-residual))
+for deviations from fitted values,
 not for deviations from population means.

 See:

From 70dfc40a6d95fed100ee32acfacc0eab9bc9fb26 Mon Sep 17 00:00:00 2001
From: Douglas Ezra Morrison
Date: Wed, 15 Apr 2026 21:25:03 -0700
Subject: [PATCH 04/76] Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 _subfiles/Linear-models-overview/_def-residual-deviation.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_subfiles/Linear-models-overview/_def-residual-deviation.qmd b/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
index 210b277a9..92f8f2443 100644
--- a/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
+++ b/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
@@ -24,4 +24,4 @@ We can rearrange @eq-def-model-error
 to view $y$ as
 the sum of its mean plus the model error/noise:

-$$y = \Exp{Y} + \epsf{y}$$
+$$y = \Expf{Y} + \devn(y)$$

From 7d348a1ab2ec30dc82683df45c0de9c25cf3de00 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 16 Apr 2026 04:48:42 +0000
Subject: [PATCH 05/76] Apply erf vs devn notation distinction across estimation and probability chapters

Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/d70c07dd-b06c-4360-ada3-9db26633fda8

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
---
 .github/copilot-instructions.md | 1 +
 .../_def-residual-deviation.qmd | 4 +++
 estimation.qmd | 25 ++++++++++---------
 probability.qmd | 8 ++++--
 4 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index d7a868c91..5a3055226 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -115,6 +115,7 @@ Key macros to use:
 - **Aligned equations**: Use `\ba` / `\ea` for `\begin{aligned}` / `\end{aligned}`
 - **Greek letters**: Use `\b` for $\beta$, `\g` for $\gamma$, `\a` for $\alpha$
 - **Formatting**: Use `\red{...}` and `\blue{...}` for colored text in math
+-
**Deviation/error notation**: Use `\erf{...}` for deviations of estimates/estimators from their estimands; use `\devn(...)` for all other deviations (e.g., observations from population means)

 Always check `latex-macros/macros.qmd` for available macros before writing raw LaTeX.

diff --git a/_subfiles/Linear-models-overview/_def-residual-deviation.qmd b/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
index 92f8f2443..791550383 100644
--- a/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
+++ b/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
@@ -13,6 +13,10 @@ $$\devn(y) \eqdef y - \Expf{Y}$$ {#eq-def-model-error}
 We use the same notation for model error/noise
 that we used for [errors](probability.qmd#def-error):
 $\devn(y)$.
+In this course,
+we reserve $\erf{\cdot}$ for deviations of estimates
+from their estimands
+([estimation error](estimation.qmd#def-estimation-error)).

 $\Expf{Y}$ can be viewed as an estimate of $Y$, before $y$ is observed.
 Conversely, each observation $y$ can be viewed as an estimate of $\Expf{Y}$
diff --git a/estimation.qmd b/estimation.qmd
index c91bbb11e..5bf2a3f74 100644
--- a/estimation.qmd
+++ b/estimation.qmd
@@ -194,7 +194,7 @@ The **estimation error** of an estimate $\hth$ of a true value $\th$
 is the difference between
 the estimate and the estimand $\th$:

-$$\eps(\hth) \eqdef \hth - \th$$
+$$\erf{\hth} \eqdef \hth - \th$$
 :::

 ## Residuals
@@ -224,9 +224,10 @@ e_i
 \ea
 $$

-So a residual is an observed approximation to an unobserved model error:
-it equals the model error $(y_i-\mu_i)$,
-minus the estimation error of the fitted value $(\hat y_i-\mu_i)$.
+So a residual is an observed approximation to an unobserved model deviation:
+it equals the model deviation $(y_i-\mu_i)$,
+minus the estimation error of the fitted value
+$\erf{\hat y_i} \eqdef \hat y_i-\mu_i$.

 Different sources are not fully consistent about these terms.
 For terminology in this course, we use:
@@ -252,7 +253,7 @@ Some frequently-used measures of accuracy include:
 The **mean squared error** of an estimator $\hth$, denoted $\mselr{\hth}$, is the
 expectation of the square of the error[^1]:

-$$\mselr{\hth} \eqdef \E{(\err(\hth))^2}$$
+$$\mselr{\hth} \eqdef \E{\sqf{\erf{\hth}}}$$
 :::

 ## Mean absolute error
@@ -264,7 +265,7 @@ The **mean absolute error** of an estimator is the
 expectation of the absolute value of the error:

 $$
-\maelr{\hth} \eqdef \E{\abs{\err(\hth)}}
+\maelr{\hth} \eqdef \E{\abs{\erf{\hth}}}
 $$
 :::

@@ -276,7 +277,7 @@ $$
 The **bias** of an estimator $\hth$ for an estimand $\theta$ is the
 expected value of the error:

-$$\bias{\hth} \eqdef \E{\err(\hth)}$$ {#eq-bias-def}
+$$\bias{\hth} \eqdef \E{\erf{\hth}}$$ {#eq-bias-def}
 :::

 ---
@@ -293,7 +294,7 @@ $$\bias{\hth} =\E{\hth} - \theta$$
 $$
 \begin{aligned}
 \bias{\hth}
-&\eqdef \E{\err(\hth)}\\
+&\eqdef \E{\erf{\hth}}\\
 &= \E{\hth - \theta}\\
 &=\E{\hth} - \E{\theta}\\
 &=\E{\hth} - \theta
@@ -361,7 +362,7 @@ $$
 $$
 \ba
 \mselr{\hth}
-&\eqdef \E{\sqf{\eps(\hth)}}\\
+&\eqdef \E{\sqf{\erf{\hth}}}\\
 &= \E{\sqf{\hth - \th}}\\
 &= \E{\sqf{\hth - \E{\hth}}}\\
 &\eqdef \Var{\hth}
@@ -397,19 +398,19 @@ $$\SE{\hth} \eqdef \SD{\hth}$$

 :::

-"Standard error" is a confusing concept in a few ways. First of all, it isn't even defined as a characteristic of the [estimation error](#def-estimation-error), $\eps(\hth)$! Moreover, it is just a synonym for standard deviation, so it seems like a redundant concept.
However, standard errors help us construct p-values and confidence intervals, so they come up a lot - often enough to give them their own name.
+"Standard error" is a confusing concept in a few ways. First of all, it isn't even defined as a characteristic of the [estimation error](#def-estimation-error), $\erf{\hth}$! Moreover, it is just a synonym for standard deviation, so it seems like a redundant concept. However, standard errors help us construct p-values and confidence intervals, so they come up a lot - often enough to give them their own name.

 We can relate standard error to actual error, by rearranging the result from @thm-mse-bias-variance:

 $$
 \ba
 \Var{\hth} &= \Var{\hth - \th}\\
-&= \Var{\eps(\hth)}\\
+&= \Var{\erf{\hth}}\\
 \ea
 $$

 So the variance of the estimator is equal to the variance of the error, and the standard error is equal to the standard deviation of the error:

-$$\SE{\hth} = \SD{\eps(\hth)}$$
+$$\SE{\hth} = \SD{\erf{\hth}}$$

 ---

diff --git a/probability.qmd b/probability.qmd
index 819634cc2..15e895391 100644
--- a/probability.qmd
+++ b/probability.qmd
@@ -671,11 +671,11 @@ an **error** (or **noise term**)
 is a deviation of an observation from its model-implied population mean.
 For an observation $y$:

-$$\err(y) \eqdef y - \E{Y}$$
+$$\devn(y) \eqdef y - \E{Y}$$

 In regression settings,
 the reference mean is often conditional on covariates:
-$\err_i \eqdef y_i - \E{Y_i \mid X_i}$.
+$\devn(y_i) \eqdef y_i - \E{Y_i \mid X_i}$.

 The terms "error" and "noise" are common,
 but terminology is not fully consistent across sources.
@@ -684,6 +684,10 @@ we use "residual"
 (defined in the [Estimation chapter](estimation.qmd#def-residual))
 for deviations from fitted values,
 not for deviations from population means.
+For notation in this course,
+we use $\devn(\cdot)$ for these model/data deviations,
+and reserve $\erf{\cdot}$ for estimator-to-estimand deviations
+(see [Estimation](estimation.qmd#def-estimation-error)).

 See:

From 17ea9a8d656fdf43f1fcf040ea425d1a4f7302cb Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 16 Apr 2026 06:21:44 +0000
Subject: [PATCH 06/76] Define statistical signal and refine residual wording in linear-model diagnostics

Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/68d0b09d-31b1-48fd-b6d7-605b356270c3

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
---
 .../Linear-models-overview/_def-residual-deviation.qmd | 10 ++++++++++
 .../Linear-models-overview/_sec_linreg_diagnostics.qmd | 9 ++++-----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/_subfiles/Linear-models-overview/_def-residual-deviation.qmd b/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
index 791550383..2369e5147 100644
--- a/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
+++ b/_subfiles/Linear-models-overview/_def-residual-deviation.qmd
@@ -29,3 +29,13 @@ to view $y$ as
 the sum of its mean plus the model error/noise:

 $$y = \Expf{Y} + \devn(y)$$
+
+:::{#def-signal}
+#### Signal (statistical sense)
+
+In statistical modeling,
+the **signal** is the deterministic part of the model.
+For mean-based models,
+the signal is the model-implied mean function,
+for example $\E{Y \mid X}$ (or $\E{Y}$ when there are no covariates).
+:::
diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
index 4b7eb0106..fccb703da 100644
--- a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
+++ b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
@@ -195,13 +195,12 @@ Left to the reader.
 :::{#def-resid-fitted}
 #### Residuals of a fitted model value

-The **residual of a fitted value $\hat y$**
-(shorthand: "residual")
-is its deviation from the observed data:
+A **residual** is the deviation
+between an observed value
+and its corresponding fitted value, $\hat y$:
 $$
 \ba
-e(\hat y) &\eqdef \erf{\hat y}
-\\&= y - \hat y
+e(\hat y) &\eqdef y - \hat y
 \ea
 $$
 :::

From 2b246e597e23667230c5201bbf4b037772c98b87 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 16 Apr 2026 07:56:45 +0000
Subject: [PATCH 07/76] Address review thread on error terminology and residual notation consistency

Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/7d49d900-f92e-467c-8087-8b9ed97603dc

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
---
 .../_sec_linreg_diagnostics.qmd | 10 +++++-----
 estimation.qmd | 13 +++++++++++--
 2 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
index fccb703da..d9ac8c9d4 100644
--- a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
+++ b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
@@ -183,7 +183,7 @@ to make it easier to inspect.
 #### Residuals in Gaussian models

 If $Y$ has a Gaussian distribution,
-then $\err(Y)$ also has a Gaussian distribution, and vice versa.
+then $\devn(Y)$ also has a Gaussian distribution, and vice versa.
 :::

 :::{.proof}
@@ -256,8 +256,8 @@ $$
 If $\hExpf{Y}$ is an [unbiased](estimation.qmd#sec-unbiased-estimators) estimator of the mean $\Expf{Y}$,
 then:

-$$\E{e(y)} = 0$$ {#eq-mean-resid-unbiased}
-$$\Var{e(y)} \approx \ss$$ {#eq-var-resid-unbiased}
+$$\E{e(\hat y)} = 0$$ {#eq-mean-resid-unbiased}
+$$\Var{e(\hat y)} \approx \ss$$ {#eq-var-resid-unbiased}

 :::

@@ -270,7 +270,7 @@ $$\Var{e(y)} \approx \ss$$ {#eq-var-resid-unbiased}

 $$
 \ba
-\Ef{e(y)} &= \Ef{y - \hat y}
+\Ef{e(\hat y)} &= \Ef{y - \hat y}
 \\ &= \Ef{y} - \Ef{\hat y}
 \\ &= \Ef{y} - \Ef{y}
 \\ &= 0
@@ -281,7 +281,7 @@ $$

 $$
 \ba
-\Var{e(y)} &= \Var{y - \hy}
+\Var{e(\hat y)} &= \Var{y - \hy}
 \\ &= \Var{y} + \Var{\hy} - 2 \Cov{y, \hy}
 \\ &{\dot{\approx}} \Var{y} + 0 - 2 \cdot 0
 \\ &= \Var{y}
diff --git a/estimation.qmd b/estimation.qmd
index 5bf2a3f74..8b6c465f5 100644
--- a/estimation.qmd
+++ b/estimation.qmd
@@ -211,7 +211,7 @@ The fitted value $\hat y_i$ is often a sample mean or fitted conditional mean,
 but not always.
 :::

-## Relationship between residuals and errors
+## Relationship between residuals, model deviations, and estimation error

 Let $\mu_i \eqdef \E{Y_i \mid X_i}$ denote the model-implied mean.
 Then:
@@ -398,7 +398,16 @@ $$\SE{\hth} \eqdef \SD{\hth}$$

 :::

-"Standard error" is a confusing concept in a few ways. First of all, it isn't even defined as a characteristic of the [estimation error](#def-estimation-error), $\erf{\hth}$! Moreover, it is just a synonym for standard deviation, so it seems like a redundant concept. However, standard errors help us construct p-values and confidence intervals, so they come up a lot - often enough to give them their own name.
+"Standard error" is a confusing concept in a few ways.
+First of all,
+it isn't even defined as a characteristic of the [estimation error](#def-estimation-error),
+$\erf{\hth}$!
+Moreover,
+it is just a synonym for standard deviation,
+so it seems like a redundant concept.
+However,
+standard errors help us construct p-values and confidence intervals,
+so they come up a lot - often enough to give them their own name.

 We can relate standard error to actual error, by rearranging the result from @thm-mse-bias-variance:

From e239721cbff2aa37097c08d5865fb3e27030a7f1 Mon Sep 17 00:00:00 2001
From: Douglas Ezra Morrison
Date: Thu, 16 Apr 2026 01:13:34 -0700
Subject: [PATCH 08/76] Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 _subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
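b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
index d9ac8c9d4..16456bc39 100644
--- a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
+++ b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
@@ -200,7 +200,7 @@ between an observed value
 and its corresponding fitted value, $\hat y$:
 $$
 \ba
-e(\hat y) &\eqdef y - \hat y
+\devn(\hat y) &\eqdef y - \hat y
 \ea
 $$
 :::

From 2f3c412145777ebdd2bd95202f4c52049b380de2 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Thu, 16 Apr 2026 09:32:07 +0000
Subject: [PATCH 09/76] Use resid notation and consolidate residual definitions into shared subfile

Agent-Logs-Url: https://github.com/d-morrison/rme/sessions/012f4b05-70d4-4920-8004-45fc1778381b

Co-authored-by: d-morrison <2474437+d-morrison@users.noreply.github.com>
---
 .../_sec_linreg_diagnostics.qmd | 31 +++++++------------
 .../_def-residual-observed-minus-fitted.qmd | 18 +++++++++++
 estimation.qmd | 11 ++-----
 local-macros.qmd | 1 +
 shared-config.qmd | 1 +
 5 files changed, 35 insertions(+), 27 deletions(-)
 create mode 100644 _subfiles/shared/_def-residual-observed-minus-fitted.qmd
 create mode 100644 local-macros.qmd

diff --git a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
index 86cbdaa9f..f1f61522f 100644
--- a/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
+++ b/_subfiles/Linear-models-overview/_sec_linreg_diagnostics.qmd
@@ -203,14 +203,7 @@ Left to the reader.
 :::{#def-resid-fitted}
 #### Residuals of a fitted model value

-A **residual** is the deviation
-between an observed value
-and its corresponding fitted value, $\hat y$:
-$$
-\ba
-\devn(\hat y) &\eqdef y - \hat y
-\ea
-$$
+{{< include _subfiles/shared/_def-residual-observed-minus-fitted.qmd >}}
 :::

 ---
@@ -247,11 +240,11 @@ Fitted values and residuals for interaction model for `birthweight` data

 #### Residuals of fitted values vs residual noise

-$e(\hat y)$ can be seen as the maximum likelihood estimate of the residual noise:
+$\resid(\hat y)$ can be seen as the maximum likelihood estimate of the residual noise:

 $$
 \ba
-e(\hy) &= y - \hat y
+\resid(\hy) &= y - \hat y
 \\ &= \hat\eps_{ML}
 \ea
 $$
@@ -264,8 +257,8 @@ $$
 If $\hExpf{Y}$ is an [unbiased](estimation.qmd#sec-unbiased-estimators) estimator of the mean $\Expf{Y}$,
 then:

-$$\E{e(\hat y)} = 0$$ {#eq-mean-resid-unbiased}
-$$\Var{e(\hat y)} \approx \ss$$ {#eq-var-resid-unbiased}
+$$\E{\resid(\hat y)} = 0$$ {#eq-mean-resid-unbiased}
+$$\Var{\resid(\hat y)} \approx \ss$$ {#eq-var-resid-unbiased}

 :::

@@ -278,7 +271,7 @@ $$\Var{e(\hat y)} \approx \ss$$ {#eq-var-resid-unbiased}

 $$
 \ba
-\Ef{e(\hat y)} &= \Ef{y - \hat y}
+\Ef{\resid(\hat y)} &= \Ef{y - \hat y}
 \\ &= \Ef{y} - \Ef{\hat y}
 \\ &= \Ef{y} - \Ef{y}
 \\ &= 0
@@ -289,7 +282,7 @@ $$

 $$
 \ba
-\Var{e(\hat y)} &= \Var{y - \hy}
+\Var{\resid(\hat y)} &= \Var{y - \hy}
 \\ &= \Var{y} + \Var{\hy} - 2 \Cov{y, \hy}
 \\ &{\dot{\approx}} \Var{y} + 0 - 2 \cdot 0
 \\ &= \Var{y}
@@ -309,7 +302,7 @@ which we can estimate using $\hat\sigma^2$; that is:

 $$
-e_i \siid N(0, \hat\sigma^2)
+\resid_i \siid N(0, \hat\sigma^2)
 $$

 ---
@@ -555,7 +548,7 @@ Residuals of interaction model for `hers` data, with and without intercept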
term

 #### Standardized residuals

-$$r_i = \frac{e_i}{\widehat{SD}(e_i)}$$
+$$r_i = \frac{\resid_i}{\widehat{SD}(\resid_i)}$$

 :::

@@ -567,7 +560,7 @@ $$

 ### Marginal distributions of residuals

-To look for problems with our model, we can check whether the residuals $e_i$ and standardized residuals $r_i$ look like they have the distributions that they are supposed to have, according to the model.
+To look for problems with our model, we can check whether the residuals $\resid_i$ and standardized residuals $r_i$ look like they have the distributions that they are supposed to have, according to the model.

 ---

@@ -914,7 +907,7 @@ All three plots show the same data and reference line.

 ### Conditional distributions of residuals

-If our Gaussian linear regression model is correct, the residuals $e_i$ and standardized residuals $r_i$ should have:
+If our Gaussian linear regression model is correct, the residuals $\resid_i$ and standardized residuals $r_i$ should have:

 - an approximately Gaussian distribution, with:
 - a mean of 0
@@ -936,7 +929,7 @@ Regardless of whether we guessed the mean function correctly, ther the variance
 #### Residuals versus fitted values

 ::: notes
-To look for these issues, we can plot the residuals $e_i$ against the fitted values $\hat y_i$ (@fig-bw_lm2-resid-vs-fitted).
+To look for these issues, we can plot the residuals $\resid_i$ against the fitted values $\hat y_i$ (@fig-bw_lm2-resid-vs-fitted).
 :::

 :::{#fig-bw_lm2-resid-vs-fitted}
diff --git a/_subfiles/shared/_def-residual-observed-minus-fitted.qmd b/_subfiles/shared/_def-residual-observed-minus-fitted.qmd
new file mode 100644
index 000000000..e3fbb9a19
--- /dev/null
+++ b/_subfiles/shared/_def-residual-observed-minus-fitted.qmd
@@ -0,0 +1,18 @@
+A **residual** is the deviation
+between an observed value
+and its corresponding fitted value, $\hat y$:
+$$
+\ba
+\resid(\hat y) &\eqdef y - \hat y
+\ea
+$$
+
+For indexed observations,
+this is equivalently:
+$$
+\resid_i \eqdef y_i - \hat y_i
+$$
+
+The fitted value $\hat y_i$ is often
+a sample mean or fitted conditional mean,
+but not always.
diff --git a/estimation.qmd b/estimation.qmd
index 8b6c465f5..a9c8537e4 100644
--- a/estimation.qmd
+++ b/estimation.qmd
@@ -11,6 +11,7 @@ format:
 # Probabilistic models {.scrollable}

 {{< include latex-macros/macros.qmd >}}
+{{< include local-macros.qmd >}}

 ---

@@ -202,13 +203,7 @@ $$\erf{\hth} \eqdef \hth - \th$$
 ::: {#def-residual}
 #### Residual

-A **residual** is the difference between an observed value
-and its fitted value:
-
-$$e_i \eqdef y_i - \hat y_i$$
-
-The fitted value $\hat y_i$ is often a sample mean or fitted conditional mean,
-but not always.
+{{< include _subfiles/shared/_def-residual-observed-minus-fitted.qmd >}}
 :::

 ## Relationship between residuals, model deviations, and estimation error
@@ -218,7 +213,7 @@ Then:

 $$
 \ba
-e_i
+\resid_i
 &= y_i - \hat y_i\\
 &= \paren{y_i - \mu_i} - \paren{\hat y_i - \mu_i}
 \ea
diff --git a/local-macros.qmd b/local-macros.qmd
new file mode 100644
index 000000000..27fbb73cc
--- /dev/null
+++ b/local-macros.qmd
@@ -0,0 +1 @@
+\providecommand{\resid}{e}
diff --git a/shared-config.qmd b/shared-config.qmd
index e2153a87d..63487f730 100644
--- a/shared-config.qmd
+++ b/shared-config.qmd
@@ -1,5 +1,6 @@
 {{< include r-config.qmd >}}
 {{< include latex-macros/macros.qmd >}}
+{{< include local-macros.qmd >}}

 ```{=html}