
Recap: Simple Linear Regression

  • We assume a linear relationship between \(X\) and \(Y\).

  • We assume independent, normally distributed residuals with constant variance.

\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]

  • But what if each subject contributes multiple observations?

  • Then residuals are no longer independent.

Matrix Form of the Linear Model

  • A linear model can be written in matrix form:

\[ \mathbf{y} = \mathbf{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \]

Where:

  • \(\mathbf{y}\) is an \(n \times 1\) vector of outcomes
  • \(\mathbf{X}\) is an \(n \times p\) design matrix (predictors)
  • \(\boldsymbol{\beta}\) is a \(p \times 1\) vector of coefficients
  • \(\boldsymbol{\epsilon}\) is an \(n \times 1\) vector of residuals
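
To make the matrix form concrete, here is a minimal R sketch (with made-up data) that builds \(\mathbf{X}\) by hand and recovers the OLS estimate \(\widehat{\boldsymbol{\beta}} = (\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{X}^\top \mathbf{y}\):

set.seed(1)
n <- 10
x <- rnorm(n)
X <- cbind(1, x)                # n x p design matrix (intercept + predictor)
beta <- c(2, 0.5)               # p x 1 vector of "true" coefficients (made up)
epsilon <- rnorm(n)             # n x 1 residual vector
y <- X %*% beta + epsilon       # n x 1 outcome vector

solve(t(X) %*% X, t(X) %*% y)   # OLS estimate of beta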

Residuals as Multivariate Normal

  • In simple regression we say:

\[ \epsilon_i \sim \mathscr{N}(0, \sigma^2) \]

  • In matrix form, this means:

\[ \boldsymbol{\epsilon} \sim \mathscr{N}(\mathbf{0}, \sigma^2 \mathbf{I}) \]

  • Example (for \(n = 3\) observations):

\[ \boldsymbol{\epsilon} = \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \epsilon_3 \end{bmatrix} \sim \mathscr{N}\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right) \]
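
Under independence, drawing from \(\mathscr{N}(\mathbf{0}, \sigma^2 \mathbf{I})\) is the same as drawing each component separately. A short R sketch (mvrnorm from the MASS package is assumed to be available; the value of \(\sigma\) is made up):

library(MASS)                   # for mvrnorm

sigma <- 2

# Element-wise draws...
eps_iid <- rnorm(3, mean = 0, sd = sigma)

# ...are equivalent in distribution to one multivariate draw with Sigma = sigma^2 * I
eps_mvn <- mvrnorm(1, mu = rep(0, 3), Sigma = sigma^2 * diag(3))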

Residuals as Multivariate Normal (continued)

  • But in repeated measures regression, residuals are not independent:

\[ \boldsymbol{\epsilon} \sim \mathscr{N}(\mathbf{0}, \boldsymbol{\Sigma}) \]

  • Example of correlated residuals for one subject assuming compound symmetry:

\[ \boldsymbol{\epsilon} \sim \mathscr{N}\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \sigma^2 \begin{bmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{bmatrix} \right) \]

  • Where \(\boldsymbol{\Sigma}\) reflects the residual covariance structure.

What Does Sampling from This Distribution Mean?

  • Sampling from \(\mathscr{N}(\mathbf{0}, \boldsymbol{\Sigma})\) returns a vector of length \(n\)

  • Each sample is a plausible full set of residuals across all observations

  • When residuals are correlated, you can’t sample each \(\epsilon_i\) independently — you must sample the whole vector to preserve structure
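
A short R sketch of sampling the whole residual vector at once, using the compound-symmetry covariance from the previous slide (MASS::mvrnorm assumed available; the values of \(\sigma^2\) and \(\rho\) are made up):

library(MASS)

sigma2 <- 4
rho <- 0.6
Sigma <- sigma2 * rbind(c(1,   rho, rho),
                        c(rho, 1,   rho),
                        c(rho, rho, 1))

# Each row is one plausible residual vector for a subject
eps <- mvrnorm(10000, mu = rep(0, 3), Sigma = Sigma)
round(cor(eps), 2)              # empirical correlations close to rho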

Problem with Simple Regression + Repeated Measures

  • Say you measure each participant across several trials (e.g., learning curves)

  • Within-subject responses are likely correlated

  • Violates independence assumption:

\[ \text{Cov}(\epsilon_{ij}, \epsilon_{ik}) \neq 0 \quad \text{for same subject $i$} \]

  • Using lm() will give misleading standard errors and potentially incorrect \(p\)-values.
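
To see the violation directly, here is a hedged simulation sketch (all names and values are made up): each subject gets a shared deviation, we fit lm(), and residuals from the same subject turn out to be correlated across trials:

set.seed(42)
n_subj <- 200; n_trial <- 3
subject <- rep(1:n_subj, each = n_trial)
x <- rep(1:n_trial, n_subj)
u <- rep(rnorm(n_subj, 0, 2), each = n_trial)   # subject-level deviation
y <- 1 + 0.5 * x + u + rnorm(n_subj * n_trial)

res <- resid(lm(y ~ x))
round(cor(matrix(res, ncol = n_trial, byrow = TRUE)), 2)  # off-diagonals well above 0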

Solution: Model the Residual Structure

  • We need a model that accounts for dependency within subjects

  • Option 1: Generalized Least Squares (GLS)

  • Option 2: Linear Mixed Models (LMM)

  • We will focus on GLS

Generalized Least Squares (GLS)

  • Allows us to specify a residual covariance structure

  • Implemented in R via the nlme::gls() function

  • Common structure: compound symmetry

What is Compound Symmetry?

  • Assumes all observations from the same subject are equally correlated

  • Each subject has:

    • Constant variance on the diagonal: \(\text{Var}(\epsilon_{ij}) = \sigma^2\)

    • Constant covariance off-diagonal: \(\text{Cov}(\epsilon_{ij}, \epsilon_{ik}) = \rho \sigma^2\)

Compound Symmetry Example (3 time points):

\[ \text{Var}(\boldsymbol{\epsilon}) = \begin{bmatrix} \sigma^2 & \rho\sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \sigma^2 & \rho\sigma^2 \\ \rho\sigma^2 & \rho\sigma^2 & \sigma^2 \end{bmatrix} \]

  • This structure is what repeated measures ANOVA implicitly assumes

  • It’s appropriate when time or order doesn’t matter, only the shared “same person” correlation

  • Intuitively, it’s like saying: good participants tend to be good throughout — the model expects consistent performance across time, but not necessarily improvement or decline (see the R sketch below)
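
A small R helper (a sketch; the function name is made up) that builds this covariance for \(m\) time points and confirms the constant off-diagonal correlation:

# Compound-symmetry covariance for m time points (hypothetical helper)
cs_cov <- function(m, sigma2, rho) {
  sigma2 * (matrix(rho, m, m) + diag(1 - rho, m))
}

cs_cov(3, sigma2 = 4, rho = 0.5)            # instantiates the matrix above
cov2cor(cs_cov(3, sigma2 = 4, rho = 0.5))   # constant rho off the diagonal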

Example Model in R

library(nlme)

gls_model <- gls(
  y ~ x,
  correlation = corCompSymm(form = ~ 1 | subject),
  data = your_data_here # replace with your dataset
)
summary(gls_model)

  • corCompSymm: compound symmetry structure (like repeated measures ANOVA)

  • ~ 1 | subject: tells gls() that observations are grouped by subject
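
A self-contained usage sketch on simulated data (all names and values are made up); intervals() is nlme’s accessor for confidence intervals and includes one for the estimated correlation:

set.seed(1)
dat <- data.frame(
  subject = factor(rep(1:40, each = 3)),
  x       = rep(1:3, 40)
)
dat$y <- 1 + 0.5 * dat$x + rep(rnorm(40, 0, 2), each = 3) + rnorm(120)

fit <- gls(y ~ x, correlation = corCompSymm(form = ~ 1 | subject), data = dat)
summary(fit)     # coefficient table with GLS standard errors
intervals(fit)   # includes an interval for the estimated within-subject rho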

Comparing to Repeated Measures ANOVA

  • A one-way repeated measures ANOVA is a special case of GLS with (1) a categorical predictor and (2) compound-symmetric residuals
  • GLS generalizes this to continuous predictors (regression)
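
For instance (a sketch, assuming a long-format data frame dat_long with columns y, time as a factor, and subject), the two fits below should give closely matching \(F\)-tests for time:

# Classical one-way repeated measures ANOVA
aov_fit <- aov(y ~ time + Error(subject), data = dat_long)
summary(aov_fit)

# The same analysis as a GLS with compound symmetry
gls_fit <- gls(y ~ time,
               correlation = corCompSymm(form = ~ 1 | subject),
               data = dat_long)
anova(gls_fit)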

Visual Comparison

  • With lm(): assumes independent residuals (a diagonal covariance structure)

  • With gls(): allows within-subject correlation

\[ \text{Var}(\boldsymbol{\epsilon}) = \sigma^2 \mathbf{I} \quad \text{vs.} \quad \text{Var}(\boldsymbol{\epsilon}) = \sigma^2 \left[ (1 - \rho) \mathbf{I} + \rho \mathbf{J} \right] \]

Where \(\mathbf{J}\) is a matrix of ones and \(\rho\) is the within-subject correlation; the second form puts \(\sigma^2\) on the diagonal and \(\rho \sigma^2\) off the diagonal, matching the compound-symmetry matrix above

Implication for Estimation

  • When \(\text{Var}(\boldsymbol{\epsilon}) = \sigma^2 \mathbf{I}\), ordinary least squares (OLS) is optimal

  • When \(\text{Var}(\boldsymbol{\epsilon}) = \boldsymbol{\Sigma} \neq \sigma^2 \mathbf{I}\), GLS is needed

  • GLS uses a weight matrix \(\boldsymbol{\Sigma}^{-1}\) to account for correlated residuals

  • Like OLS, GLS finds \(\widehat{\boldsymbol{\beta}}\) by minimizing a sum of squared residuals

  • But it uses a weighted version that accounts for correlated errors
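
Concretely, the GLS criterion and its closed-form solution (a standard result, stated here for completeness):

\[ \widehat{\boldsymbol{\beta}}_{\text{GLS}} = \arg\min_{\boldsymbol{\beta}} \; (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) = (\mathbf{X}^\top \boldsymbol{\Sigma}^{-1} \mathbf{X})^{-1} \mathbf{X}^\top \boldsymbol{\Sigma}^{-1} \mathbf{y} \]

Setting \(\boldsymbol{\Sigma} = \sigma^2 \mathbf{I}\) recovers the ordinary OLS estimator.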