2025

The GLM

The Generalized Linear Model (GLM) is an extension of the traditional linear regression model that allows for:

  • Multiple predictor variables.

  • Predictor variables that can be either continuous or categorical.

  • The response variable \(y\) to follow different distributions from the exponential family (e.g., Normal, Binomial, Poisson). We won’t cover this here.

  • A link function that relates the mean of the response variable to the linear predictors. We won’t cover this here either.

Matrix form of the GLM

\[ \begin{align} \boldsymbol{y} &= \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \\ \end{align} \]

Simple Linear Regression as a GLM

\[ \begin{align} \boldsymbol{y} = \begin{bmatrix} 1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\ 1 & x_{n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]

Multiple Regression with two predictors as a GLM

\[ \begin{align} \boldsymbol{y} = \begin{bmatrix} 1 & x_{1_1} & x_{2_1} \\ 1 & x_{1_2} & x_{2_2} \\ \vdots & \vdots & \vdots \\ 1 & x_{1_n} & x_{2_n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \beta_{2} \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]

Multiple Regression with three predictors as a GLM

\[ \begin{align} \boldsymbol{y} = \begin{bmatrix} 1 & x_{1_1} & x_{2_1} & x_{3_1} \\ 1 & x_{1_2} & x_{2_2} & x_{3_2} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{1_n} & x_{2_n} & x_{3_n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]
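As a sketch of how this looks in R (with hypothetical data), model.matrix() exposes the design matrix \(\boldsymbol{X}\) that lm() builds internally, including the leading column of ones:

```r
# Hypothetical data with three continuous predictors
d <- data.frame(x1 = c(0.5, 1.2, 2.3),
                x2 = c(3, 1, 4),
                x3 = c(2, 2, 5))

# Design matrix for y ~ x1 + x2 + x3 (no response is needed to build X)
X <- model.matrix(~ x1 + x2 + x3, data = d)
dim(X) # 3 rows (observations) and 4 columns (intercept + 3 predictors)
```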

One-way ANOVA with three levels as a GLM

\[ Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \]

  • \(Y_{ij}\) is the \(j\)-th observation in the \(i\)-th level of Factor A.

  • \(\mu\) is the overall mean.

  • \(\alpha_i\) is the effect of the \(i\)-th level of Factor A (\(i = 1, 2, 3\)).

  • \(\epsilon_{ij}\) is the residual for the \(j\)-th observation in the \(i\)-th level.

Matrix form of one-way ANOVA with three levels

\[ \scriptstyle \begin{align} \boldsymbol{y} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \vdots \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \\ \vdots \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \\ \vdots \\ \end{bmatrix} \end{align} \]

Problem with the previous model formulation

  • This model includes an intercept \(\mu\), and group effects \(\alpha_1, \alpha_2, \alpha_3\)

  • But the columns of \(\boldsymbol{X}\) are linearly dependent:

\[ \text{Col}_1 = \text{Col}_2 + \text{Col}_3 + \text{Col}_4 \]

  • Multiple combinations of \(\mu\) and \(\alpha_i\) yield identical predictions, so unique values of these parameters cannot be estimated.

  • The model is not identifiable.
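The rank deficiency is easy to verify in R (a small sketch with made-up group sizes): the over-parameterized design matrix has four columns but only rank three.

```r
group <- factor(rep(c("A", "B", "C"), each = 3))

# Intercept column plus one 0/1 indicator per level (the model above)
X <- cbind(1, model.matrix(~ 0 + group))

ncol(X)    # 4 columns ...
qr(X)$rank # ... but rank 3: the first column equals the sum of the others
```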

Ways to fix the problem

  • Option 1: Remove one \(\alpha_i\) (reference coding)

  • Option 2: Impose the constraint \(\alpha_1 + \alpha_2 + \alpha_3 = 0\) (sum-to-zero coding)

Reference Coding for One-Way ANOVA

  • Set \(\alpha_1 = 0\) to serve as baseline

\[ \scriptstyle \begin{align} \boldsymbol{y} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ \end{bmatrix} \begin{bmatrix} \mu \\ \alpha_2 \\ \alpha_3 \\ \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \vdots \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \\ \vdots \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \\ \vdots \end{bmatrix} \end{align} \]

Reference coding (default in R)

x <- factor(c("A", "B", "C"))

# The contrasts() function shows how R is currently
# encoding the factor

contrasts(x)

# Output:
#     B   C
# A   0   0
# B   1   0
# C   0   1

# Even if `x` has multiple observations, `contrasts(x)` returns a matrix
# with one row per *level* of the factor, not per observation.
# The actual design matrix used in the regression will expand these contrast
# values to one row per observation automatically.
# This is handled internally by the model-fitting function (e.g., lm()).

Reference coding (default in R)

  • Reference coding by default sets the first factor level (here, group A) as the baseline

  • The model estimates:

    • Intercept = mean of group A

    • Coefficient for B = mean(B) − mean(A)

    • Coefficient for C = mean(C) − mean(A)
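These identities can be checked numerically; a minimal sketch with simulated data (any group means would do):

```r
set.seed(1) # hypothetical data, just to check the identities
g <- factor(rep(c("A", "B", "C"), each = 10))
y <- rnorm(30, mean = c(5, 6, 8)[g]) # per-group means 5, 6, 8

fit <- lm(y ~ g) # reference coding by default
m <- tapply(y, g, mean) # observed group means

all.equal(unname(coef(fit)["(Intercept)"]), unname(m["A"]))  # intercept = mean(A)
all.equal(unname(coef(fit)["gB"]), unname(m["B"] - m["A"]))  # mean(B) - mean(A)
all.equal(unname(coef(fit)["gC"]), unname(m["C"] - m["A"]))  # mean(C) - mean(A)
```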

Reference coding: Example

library(data.table)
library(ggplot2)

set.seed(42) # Set seed for reproducibility

# Simulate data suitable for one-way ANOVA
dt <- data.table(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  y = c(rnorm(20, mean = 5, sd = 1),  # Group A
        rnorm(20, mean = 6, sd = 1),  # Group B
        rnorm(20, mean = 8, sd = 1)   # Group C
  )
)

lm_ref <- lm(y ~ group, data = dt) # fit model
summary(lm_ref) # summary of the model
anova(lm_ref) # ANOVA table

Reference coding: Example

## 
## Call:
## lm(formula = y ~ group, data = dt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.99218 -0.48550  0.09491  0.72760  2.16619 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.1919     0.2583  20.104  < 2e-16 ***
## groupB        0.5371     0.3652   1.471    0.147    
## groupC        2.8072     0.3652   7.686 2.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.155 on 57 degrees of freedom
## Multiple R-squared:  0.5388, Adjusted R-squared:  0.5226 
## F-statistic: 33.29 on 2 and 57 DF,  p-value: 2.643e-10

Reference coding: Example

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## group      2 88.813  44.406  33.289 2.643e-10 ***
## Residuals 57 76.035   1.334                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sum-to-Zero Constraint for One-Way ANOVA

  • Model includes all three \(\alpha_i\), but impose a constraint:

\[ \alpha_1 + \alpha_2 + \alpha_3 = 0 \]

Sum-to-Zero Constraint for One-Way ANOVA

\[ \scriptstyle \boldsymbol{y} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots \end{bmatrix} \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \vdots \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \\ \vdots \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \\ \vdots \end{bmatrix} \]

Sum-to-Zero Constraint for One-Way ANOVA

  • \(\mu\) is the grand mean across all groups.

  • Each \(\alpha_i\) is the deviation of group \(i\) from the grand mean.

  • \(\boldsymbol{X}\) still has linearly dependent columns, but the constraint \(\alpha_1 + \alpha_2 + \alpha_3 = 0\) removes the redundancy in parameter estimation.

  • The model is identifiable.
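A sketch with simulated, balanced data: after imposing sum-to-zero contrasts, the intercept estimates the grand mean, and the constrained \(\alpha_3\) can be recovered from the other two coefficients.

```r
set.seed(1) # hypothetical balanced data
g <- factor(rep(c("A", "B", "C"), each = 10))
y <- rnorm(30, mean = c(5, 6, 8)[g])

contrasts(g) <- contr.sum(3) # impose alpha_1 + alpha_2 + alpha_3 = 0
fit <- lm(y ~ g)

m <- tapply(y, g, mean)
all.equal(unname(coef(fit)[1]), mean(m))    # intercept = grand mean
alpha3 <- -sum(coef(fit)[2:3])              # recover the constrained parameter
all.equal(alpha3, unname(m["C"]) - mean(m)) # alpha_3 = mean(C) - grand mean
```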

Sum-to-zero coding

# To use sum-to-zero coding, you must explicitly assign new contrasts
contrasts(x) <- contr.sum(3)

# Output of contr.sum(3):
#     [,1] [,2]
# 1     1    0
# 2     0    1
# 3    -1   -1

# The rows of the contrast matrix are ordered according to `levels(x)`:
# Row 1 = group A; Row 2 = group B; Row 3 = group C
#
# Column 1 encodes alpha_1, the deviation of group A from the grand mean;
# Column 2 encodes alpha_2, the deviation of group B from the grand mean.
#
# Only two columns are needed: the constraint alpha_1 + alpha_2 + alpha_3 = 0
# implies alpha_3 = -(alpha_1 + alpha_2), so group C's deviation is recovered
# from the other two (hence its row of -1s).

Sum-to-zero coding

\[ \scriptstyle \boldsymbol{y} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & -1 & -1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \\ \vdots & \vdots & \vdots \end{bmatrix} \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \vdots \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \\ \vdots \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \\ \vdots \end{bmatrix} \]
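The same design matrix can be produced in R by passing contr.sum to model.matrix() (a sketch with two observations per group):

```r
g <- factor(rep(c("A", "B", "C"), each = 2))

# Design matrix under sum-to-zero coding
X <- model.matrix(~ g, contrasts.arg = list(g = "contr.sum"))
X
# Rows for group A are (1, 1, 0), for group B (1, 0, 1),
# and for group C (1, -1, -1), matching the matrix above
```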

Sum-to-zero coding: Example

library(data.table)
library(ggplot2)

set.seed(42) # Set seed for reproducibility

# Simulate data suitable for one-way ANOVA
dt <- data.table(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  y = c(rnorm(20, mean = 5, sd = 1),  # Group A
        rnorm(20, mean = 6, sd = 1),  # Group B
        rnorm(20, mean = 8, sd = 1)   # Group C
  )
)

contrasts(dt$group) <- contr.sum(3) # apply sum-to-zero coding
lm_stz <- lm(y ~ group, data = dt) # fit model
summary(lm_stz) # summary of the model
anova(lm_stz) # ANOVA table

Sum-to-zero coding: Example

  • The overall fit is identical to the reference-coded model: same residuals, residual standard error (1.155 on 57 df), \(R^2\) (0.5388), and \(F\)-statistic (33.29). Only the parameterization of the coefficients changes.

  • Under sum-to-zero coding the coefficients are labeled group1 and group2: the intercept estimates the grand mean (about 6.31), group1 estimates \(\alpha_1\) (about \(-1.11\)), and group2 estimates \(\alpha_2\) (about \(-0.58\)).

Sum-to-zero coding: Example

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## group      2 88.813  44.406  33.289 2.643e-10 ***
## Residuals 57 76.035   1.334                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: What about types of sums of squares?

  • R uses Type I sums of squares by default, which are sequential and depend on the order of factors in the model.

  • Standard practice in psychology is to use Type III sums of squares, which test each factor after accounting for all other factors in the model.

  • Base R avoids Type III ANOVA by design, not by oversight.

  • The philosophy is: Build models that reflect your design and interpret coefficients directly, rather than forcing omnibus tests through orthogonal decompositions.

  • My only goal here is to show how one-way ANOVA can be expressed as a GLM, not to dive into statistics philosophy.

  • Later, we will see how everything we’ve learned so far can also be seen as a GLM.
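For a single factor the distinction is moot, since Type I and Type III coincide. For multi-factor designs, a base-R sketch (hypothetical unbalanced data) is drop1(), which gives a marginal F-test for any term that can be dropped without violating marginality, such as the highest-order interaction:

```r
set.seed(1) # hypothetical unbalanced two-factor data
d <- data.frame(
  a = factor(rep(c("a1", "a2"), times = c(12, 20))),
  b = factor(rep(c("b1", "b2"), length.out = 32))
)
d$y <- rnorm(32, mean = 1 + (d$a == "a2") + 0.5 * (d$b == "b2"))

fit <- lm(y ~ a * b, data = d)

anova(fit)             # Type I: sequential, order-dependent
drop1(fit, test = "F") # marginal test of the highest-order term (a:b)
```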