
Recap: Simple Linear Regression

  • We assume a linear relationship between \(X\) and \(Y\).

  • We assume independent, normally distributed residuals with constant variance.

\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]

  • But what if each subject contributes multiple observations?

  • Then residuals are no longer independent.

Matrix Form of the Linear Model

  • A linear model can be written in matrix form:

\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \]

Where:

  • \(\mathbf{y}\) is an \(n \times 1\) vector of outcomes
  • \(\mathbf{X}\) is an \(n \times p\) design matrix (predictors)
  • \(\boldsymbol{\beta}\) is a \(p \times 1\) vector of coefficients
  • \(\boldsymbol{\epsilon}\) is an \(n \times 1\) vector of residuals
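
To make the matrix form concrete, here is a minimal R sketch (with made-up data) that builds \(\mathbf{X}\) by hand and recovers the OLS estimate \(\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}\):

set.seed(1)
n <- 10
x <- rnorm(n)
X <- cbind(1, x)                # n x p design matrix (intercept + predictor)
beta <- c(2, 0.5)               # p x 1 vector of "true" coefficients (made up)
epsilon <- rnorm(n)             # n x 1 residual vector
y <- X %*% beta + epsilon       # n x 1 outcome vector

solve(t(X) %*% X, t(X) %*% y)   # OLS estimate of beta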

Residuals as Multivariate Normal

  • In simple regression we say:

\[ \epsilon_i \sim \mathscr{N}(0, \sigma^2) \]

  • In matrix form, this means:

\[ \boldsymbol{\epsilon} \sim \mathscr{N}(\mathbf{0}, \sigma^2 \mathbf{I}) \]

  • Example (for \(n = 3\) observations):

\[ \boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{bmatrix} \sim \mathscr{N}\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right) \]
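
Under independence, drawing from \(\mathscr{N}(\mathbf{0}, \sigma^2 \mathbf{I})\) is the same as drawing each component separately. A short R sketch (mvrnorm from the MASS package is assumed to be available; the value of \(\sigma\) is made up):

library(MASS)                   # for mvrnorm

sigma <- 2

# Element-wise draws...
eps_iid <- rnorm(3, mean = 0, sd = sigma)

# ...are equivalent in distribution to one multivariate draw with Sigma = sigma^2 * I
eps_mvn <- mvrnorm(1, mu = rep(0, 3), Sigma = sigma^2 * diag(3))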

Residuals as Multivariate Normal (continued)

  • But in repeated measures regression, residuals are not independent:

\[ \boldsymbol{\epsilon} \sim \mathscr{N}(\mathbf{0}, \boldsymbol{\Sigma}) \]

  • Example of correlated residuals for one subject assuming compound symmetry:

\[ \boldsymbol{\epsilon} \sim \mathscr{N}\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2 \begin{bmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{bmatrix} \right) \]

  • Where \(\boldsymbol{\Sigma}\) reflects the residual covariance structure.

What Does Sampling from This Distribution Mean?

  • Sampling from \(\mathscr{N}(\mathbf{0}, \boldsymbol{\Sigma})\) returns a vector of length \(n\)

  • Each sample is a plausible full set of residuals across all observations

  • When residuals are correlated, you can’t sample each \(\epsilon_i\) independently — you must sample the whole vector to preserve structure
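
A short R sketch of sampling the whole residual vector at once, using the compound-symmetry covariance from the previous slide (MASS::mvrnorm assumed available; the values of \(\sigma^2\) and \(\rho\) are made up):

library(MASS)

sigma2 <- 4
rho <- 0.6
Sigma <- sigma2 * rbind(c(1,   rho, rho),
                        c(rho, 1,   rho),
                        c(rho, rho, 1))

# Each row is one plausible residual vector for a subject
eps <- mvrnorm(10000, mu = rep(0, 3), Sigma = Sigma)
round(cor(eps), 2)              # empirical correlations close to rho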

Problem with Simple Regression + Repeated Measures

  • Say you measure each participant across several trials (e.g., learning curves)

  • Within-subject responses are likely correlated

  • Violates independence assumption:

\[ \text{Cov}(\epsilon_{ij}, \epsilon_{ik}) \neq 0 \quad \text{for same subject $i$} \]

  • Using lm() will give misleading standard errors and potentially incorrect \(p\)-values.
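
To see the violation directly, here is a hedged simulation sketch (all names and values are made up): each subject gets a shared deviation, we fit lm(), and residuals from the same subject turn out to be correlated across trials:

set.seed(42)
n_subj <- 200; n_trial <- 3
subject <- rep(1:n_subj, each = n_trial)
x <- rep(1:n_trial, n_subj)
u <- rep(rnorm(n_subj, 0, 2), each = n_trial)   # subject-level deviation
y <- 1 + 0.5 * x + u + rnorm(n_subj * n_trial)

res <- resid(lm(y ~ x))
round(cor(matrix(res, ncol = n_trial, byrow = TRUE)), 2)  # off-diagonals well above 0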

Solution: Model the Residual Structure

  • We need a model that accounts for dependency within subjects

  • Option 1: Generalized Least Squares (GLS)

  • Option 2: Linear Mixed Models (LMM)

  • We will focus on GLS

Generalized Least Squares (GLS)

  • Allows us to specify a residual covariance structure

  • Implemented in R via the nlme::gls() function

  • Common structure: compound symmetry

What is Compound Symmetry?

  • Assumes all observations from the same subject are equally correlated

  • Each subject has:

    • Constant variance on the diagonal: \(\text{Var}(\epsilon_{ij}) = \sigma^2\)

    • Constant covariance off-diagonal: \(\text{Cov}(\epsilon_{ij}, \epsilon_{ik}) = \rho \sigma^2\)

Compound Symmetry Example (3 time points):

\[ \text{Var}(\boldsymbol{\epsilon}) = \begin{bmatrix} \sigma^2 & \rho\sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \rho\sigma^2 & \sigma^2 \end{bmatrix} \]

  • This structure is what repeated measures ANOVA implicitly assumes

  • It’s appropriate when time or order doesn’t matter, only the shared “same person” correlation

  • Intuitively, it’s like saying: good participants tend to be good throughout — the model expects consistent performance across time, but not necessarily improvement or decline (see the R sketch below)
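
A small R helper (a sketch; the function name is made up) that builds this covariance for \(m\) time points and confirms the constant off-diagonal correlation:

# Compound-symmetry covariance for m time points (hypothetical helper)
cs_cov <- function(m, sigma2, rho) {
  sigma2 * (matrix(rho, m, m) + diag(1 - rho, m))
}

cs_cov(3, sigma2 = 4, rho = 0.5)            # instantiates the matrix above
cov2cor(cs_cov(3, sigma2 = 4, rho = 0.5))   # constant rho off the diagonal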

Example Model in R

library(nlme)

gls_model <- gls(
  y ~ x,
  correlation = corCompSymm(form = ~ 1 | subject),
  data = your_data_here # replace with your dataset
)
summary(gls_model)

  • corCompSymm: compound symmetry structure (like repeated measures ANOVA)

  • ~ 1 | subject: tells gls() that observations are grouped by subject
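
A self-contained usage sketch on simulated data (all names and values are made up); intervals() is nlme’s accessor for confidence intervals and includes one for the estimated correlation:

set.seed(1)
dat <- data.frame(
  subject = factor(rep(1:40, each = 3)),
  x       = rep(1:3, 40)
)
dat$y <- 1 + 0.5 * dat$x + rep(rnorm(40, 0, 2), each = 3) + rnorm(120)

fit <- gls(y ~ x, correlation = corCompSymm(form = ~ 1 | subject), data = dat)
summary(fit)     # coefficient table with GLS standard errors
intervals(fit)   # includes an interval for the estimated within-subject rho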

Comparing to Repeated Measures ANOVA

  • A one-way repeated measures ANOVA is a special case of GLS with (1) a categorical predictor and (2) compound-symmetric residuals
  • GLS generalizes this to continuous predictors (regression)
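
For instance (a sketch, assuming a long-format data frame dat_long with columns y, time as a factor, and subject), the two fits below should give closely matching \(F\)-tests for time:

# Classical one-way repeated measures ANOVA
aov_fit <- aov(y ~ time + Error(subject), data = dat_long)
summary(aov_fit)

# The same analysis as a GLS with compound symmetry
gls_fit <- gls(y ~ time,
               correlation = corCompSymm(form = ~ 1 | subject),
               data = dat_long)
anova(gls_fit)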

Visual Comparison

  • With lm(): assumes independent residuals (a diagonal covariance structure)

  • With gls(): allows within-subject correlation

\[ \text{Var}(\boldsymbol{\epsilon}) = \sigma^2 \mathbf{I} \quad \text{vs.} \quad \text{Var}(\boldsymbol{\epsilon}) = \sigma^2 \left[ (1 - \rho) \mathbf{I} + \rho \mathbf{J} \right] \]

Where \(\mathbf{J}\) is a matrix of ones and \(\rho\) is the within-subject correlation; the second form puts \(\sigma^2\) on the diagonal and \(\rho \sigma^2\) off the diagonal, matching the compound-symmetry matrix above

Implication for Estimation

  • When \(\text{Var}(\boldsymbol{\epsilon}) = \sigma^2 \mathbf{I}\), ordinary least squares (OLS) is optimal

  • When \(\text{Var}(\boldsymbol{\epsilon}) = \boldsymbol{\Sigma} \neq \sigma^2 \mathbf{I}\), GLS is needed

  • GLS uses a weight matrix \(\boldsymbol{\Sigma}^{-1}\) to account for correlated residuals

  • Like OLS, GLS finds \(\widehat{\boldsymbol{\beta}}\) by minimizing a sum of squared residuals

  • But it uses a weighted version that accounts for correlated errors
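
Concretely, the GLS criterion and its closed-form solution (a standard result, stated here for completeness):

\[ \widehat{\boldsymbol{\beta}}_{\text{GLS}} = \arg\min_{\boldsymbol{\beta}} \; (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = (\mathbf{X}^\top \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\Sigma}^{-1} \mathbf{y} \]

Setting \(\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}\) recovers the ordinary OLS estimator.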