2025

The GLM

The Generalized Linear Model (GLM) is an extension of the traditional linear regression model that allows for:

  • Multiple predictor variables.

  • Predictor variables that can be either continuous or categorical.

  • The response variable \(y\) to follow different distributions from the exponential family (e.g., Normal, Binomial, Poisson). We won’t cover this here.

  • A link function that relates the mean of the response variable to the linear predictors. We won’t cover this here either.

Matrix form of the GLM

\[ \begin{align} \boldsymbol{y} &= \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \\ \end{align} \]

Simple Linear Regression as a GLM

\[ \begin{align} \boldsymbol{y} = \begin{bmatrix} 1 & x_{1} \\ 1 & x_{2} \\ \vdots & \vdots \\ 1 & x_{n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]

Multiple Regression with two predictors as a GLM

\[ \begin{align} \boldsymbol{y} = \begin{bmatrix} 1 & x_{1_1} & x_{2_1} \\ 1 & x_{1_2} & x_{2_2} \\ \vdots & \vdots & \vdots \\ 1 & x_{1_n} & x_{2_n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \beta_{2} \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]

Multiple Regression with three predictors as a GLM

\[ \begin{align} \boldsymbol{y} = \begin{bmatrix} 1 & x_{1_1} & x_{2_1} & x_{3_1} \\ 1 & x_{1_2} & x_{2_2} & x_{3_2} \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_{1_n} & x_{2_n} & x_{3_n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \beta_{2} \\ \beta_{3} \\ \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]
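As a sketch of how this looks in R (with hypothetical data), model.matrix() exposes the design matrix \(\boldsymbol{X}\) that lm() builds internally, including the leading column of ones:

```r
# Hypothetical data with three continuous predictors
d <- data.frame(x1 = c(0.5, 1.2, 2.3),
                x2 = c(3, 1, 4),
                x3 = c(2, 2, 5))

# Design matrix for y ~ x1 + x2 + x3 (no response is needed to build X)
X <- model.matrix(~ x1 + x2 + x3, data = d)
dim(X) # 3 rows (observations) and 4 columns (intercept + 3 predictors)
```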

One-way ANOVA with three levels as a GLM

\[ Y_{ij} = \mu + \alpha_i + \epsilon_{ij}, \]

  • \(Y_{ij}\) is the \(j\)-th observation in the \(i\)-th level of Factor A.

  • \(\mu\) is the overall mean.

  • \(\alpha_i\) is the effect of the \(i\)-th level of Factor A (\(i = 1, 2, 3\)).

  • \(\epsilon_{ij}\) is the residual for the \(j\)-th observation in the \(i\)-th level.

Matrix form of one-way ANOVA with three levels

\[ \scriptstyle \begin{align} \boldsymbol{y} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots \\ \end{bmatrix} \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \vdots \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \\ \vdots \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \\ \vdots \\ \end{bmatrix} \end{align} \]

Problem with the previous model formulation

  • This model includes an intercept \(\mu\), and group effects \(\alpha_1, \alpha_2, \alpha_3\)

  • But the columns of \(\boldsymbol{X}\) are linearly dependent:

\[ \text{Col}_1 = \text{Col}_2 + \text{Col}_3 + \text{Col}_4 \]

  • Multiple combinations of \(\mu\) and \(\alpha_i\) yield identical predictions, so unique values of these parameters cannot be estimated.

  • The model is not identifiable.
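The rank deficiency is easy to verify in R (a small sketch with made-up group sizes): the over-parameterized design matrix has four columns but only rank three.

```r
group <- factor(rep(c("A", "B", "C"), each = 3))

# Intercept column plus one 0/1 indicator per level (the model above)
X <- cbind(1, model.matrix(~ 0 + group))

ncol(X)    # 4 columns ...
qr(X)$rank # ... but rank 3: the first column equals the sum of the others
```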

Ways to fix the problem

  • Option 1: Remove one \(\alpha_i\) (reference coding)

  • Option 2: Impose the constraint \(\alpha_1 + \alpha_2 + \alpha_3 = 0\) (sum-to-zero coding)

Reference Coding for One-Way ANOVA

  • Set \(\alpha_1 = 0\) to serve as baseline

\[ \scriptstyle \begin{align} \boldsymbol{y} = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ \end{bmatrix} \begin{bmatrix} \mu \\ \alpha_2 \\ \alpha_3 \\ \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \vdots \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \\ \vdots \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \\ \vdots \end{bmatrix} \end{align} \]

Reference coding (default in R)

x <- factor(c("A", "B", "C"))

# The contrasts() function shows how R is currently
# encoding the factor

contrasts(x)

# Output:
#     B   C
# A   0   0
# B   1   0
# C   0   1

# Even if `x` has multiple observations, `contrasts(x)` returns a matrix
# with one row per *level* of the factor, not per observation.
# The actual design matrix used in the regression will expand these contrast
# values to one row per observation automatically.
# This is handled internally by the model-fitting function (e.g., lm()).

Reference coding (default in R)

  • Reference coding by default sets the first factor level (here, group A) as the baseline

  • The model estimates:

    • Intercept = mean of group A

    • Coefficient for B = mean(B) − mean(A)

    • Coefficient for C = mean(C) − mean(A)
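These identities can be checked numerically; a minimal sketch with simulated data (any group means would do):

```r
set.seed(1) # hypothetical data, just to check the identities
g <- factor(rep(c("A", "B", "C"), each = 10))
y <- rnorm(30, mean = c(5, 6, 8)[g]) # per-group means 5, 6, 8

fit <- lm(y ~ g) # reference coding by default
m <- tapply(y, g, mean) # observed group means

all.equal(unname(coef(fit)["(Intercept)"]), unname(m["A"]))  # intercept = mean(A)
all.equal(unname(coef(fit)["gB"]), unname(m["B"] - m["A"]))  # mean(B) - mean(A)
all.equal(unname(coef(fit)["gC"]), unname(m["C"] - m["A"]))  # mean(C) - mean(A)
```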

Reference coding: Example

library(data.table)
library(ggplot2)

set.seed(42) # Set seed for reproducibility

# Simulate data suitable for one-way ANOVA
dt <- data.table(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  y = c(rnorm(20, mean = 5, sd = 1),  # Group A
        rnorm(20, mean = 6, sd = 1),  # Group B
        rnorm(20, mean = 8, sd = 1)   # Group C
  )
)

lm_ref <- lm(y ~ group, data = dt) # fit model
summary(lm_ref) # summary of the model
anova(lm_ref) # ANOVA table

Reference coding: Example

## 
## Call:
## lm(formula = y ~ group, data = dt)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.99218 -0.48550  0.09491  0.72760  2.16619 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.1919     0.2583  20.104  < 2e-16 ***
## groupB        0.5371     0.3652   1.471    0.147    
## groupC        2.8072     0.3652   7.686 2.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.155 on 57 degrees of freedom
## Multiple R-squared:  0.5388, Adjusted R-squared:  0.5226 
## F-statistic: 33.29 on 2 and 57 DF,  p-value: 2.643e-10

Reference coding: Example

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## group      2 88.813  44.406  33.289 2.643e-10 ***
## Residuals 57 76.035   1.334                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Sum-to-Zero Constraint for One-Way ANOVA

  • Model includes all three \(\alpha_i\), but impose a constraint:

\[ \alpha_1 + \alpha_2 + \alpha_3 = 0 \]

Sum-to-Zero Constraint for One-Way ANOVA

\[ \scriptstyle \boldsymbol{y} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ \vdots & \vdots & \vdots & \vdots \end{bmatrix} \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \vdots \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \\ \vdots \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \\ \vdots \end{bmatrix} \]

Sum-to-Zero Constraint for One-Way ANOVA

  • \(\mu\) is the grand mean across all groups.

  • Each \(\alpha_i\) is the deviation of group \(i\) from the grand mean.

  • \(\boldsymbol{X}\) still has linearly dependent columns, but the constraint \(\alpha_1 + \alpha_2 + \alpha_3 = 0\) removes the redundancy in parameter estimation.

  • The model is identifiable.
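A sketch with simulated, balanced data: after imposing sum-to-zero contrasts, the intercept estimates the grand mean, and the constrained \(\alpha_3\) can be recovered from the other two coefficients.

```r
set.seed(1) # hypothetical balanced data
g <- factor(rep(c("A", "B", "C"), each = 10))
y <- rnorm(30, mean = c(5, 6, 8)[g])

contrasts(g) <- contr.sum(3) # impose alpha_1 + alpha_2 + alpha_3 = 0
fit <- lm(y ~ g)

m <- tapply(y, g, mean)
all.equal(unname(coef(fit)[1]), mean(m))    # intercept = grand mean
alpha3 <- -sum(coef(fit)[2:3])              # recover the constrained parameter
all.equal(alpha3, unname(m["C"]) - mean(m)) # alpha_3 = mean(C) - grand mean
```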

Sum-to-zero coding

# To use sum-to-zero coding, you must explicitly assign new contrasts
contrasts(x) <- contr.sum(3)

# Output of contr.sum(3):
#     [,1] [,2]
# 1     1    0
# 2     0    1
# 3    -1   -1

# The rows of the contrast matrix are ordered according to `levels(x)`:
# Row 1 = group A; Row 2 = group B; Row 3 = group C
#
# Column 1 encodes alpha_1, the deviation of group A from the grand mean;
# Column 2 encodes alpha_2, the deviation of group B from the grand mean.
#
# Only two columns are needed: the constraint alpha_1 + alpha_2 + alpha_3 = 0
# implies alpha_3 = -(alpha_1 + alpha_2), so group C's deviation is recovered
# from the other two (hence its row of -1s).

Sum-to-zero coding

\[ \scriptstyle \boldsymbol{y} = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & -1 & -1 \\ 1 & -1 & -1 \\ 1 & -1 & -1 \\ \vdots & \vdots & \vdots \end{bmatrix} \begin{bmatrix} \mu \\ \alpha_1 \\ \alpha_2 \end{bmatrix} + \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{13} \\ \vdots \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \\ \vdots \\ \epsilon_{31} \\ \epsilon_{32} \\ \epsilon_{33} \\ \vdots \end{bmatrix} \]
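The same design matrix can be produced in R by passing contr.sum to model.matrix() (a sketch with two observations per group):

```r
g <- factor(rep(c("A", "B", "C"), each = 2))

# Design matrix under sum-to-zero coding
X <- model.matrix(~ g, contrasts.arg = list(g = "contr.sum"))
X
# Rows for group A are (1, 1, 0), for group B (1, 0, 1),
# and for group C (1, -1, -1), matching the matrix above
```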

Sum-to-zero coding: Example

library(data.table)
library(ggplot2)

set.seed(42) # Set seed for reproducibility

# Simulate data suitable for one-way ANOVA
dt <- data.table(
  group = factor(rep(c("A", "B", "C"), each = 20)),
  y = c(rnorm(20, mean = 5, sd = 1),  # Group A
        rnorm(20, mean = 6, sd = 1),  # Group B
        rnorm(20, mean = 8, sd = 1)   # Group C
  )
)

contrasts(dt$group) <- contr.sum(3) # apply sum-to-zero coding
lm_stz <- lm(y ~ group, data = dt) # fit model
summary(lm_stz) # summary of the model
anova(lm_stz) # ANOVA table

Sum-to-zero coding: Example

  • The overall fit is identical to the reference-coded model: same residuals, residual standard error (1.155 on 57 df), \(R^2\) (0.5388), and \(F\)-statistic (33.29). Only the parameterization of the coefficients changes.

  • Under sum-to-zero coding the coefficients are labeled group1 and group2: the intercept estimates the grand mean (about 6.31), group1 estimates \(\alpha_1\) (about \(-1.11\)), and group2 estimates \(\alpha_2\) (about \(-0.58\)).

Sum-to-zero coding: Example

## Analysis of Variance Table
## 
## Response: y
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## group      2 88.813  44.406  33.289 2.643e-10 ***
## Residuals 57 76.035   1.334                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

WARNING: What about types of sums of squares?

  • R uses Type I sums of squares by default, which are sequential and depend on the order of factors in the model.

  • Standard practice in psychology is to use Type III sums of squares, which test each factor after accounting for all other factors in the model.

  • Base R avoids Type III ANOVA by design, not by oversight.

  • The philosophy is: Build models that reflect your design and interpret coefficients directly, rather than forcing omnibus tests through orthogonal decompositions.

  • My only goal here is to show how one-way ANOVA can be expressed as a GLM, not to dive into statistics philosophy.

  • Later, we will see how everything we’ve learned so far can also be seen as a GLM.
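For a single factor the distinction is moot, since Type I and Type III coincide. For multi-factor designs, a base-R sketch (hypothetical unbalanced data) is drop1(), which gives a marginal F-test for any term that can be dropped without violating marginality, such as the highest-order interaction:

```r
set.seed(1) # hypothetical unbalanced two-factor data
d <- data.frame(
  a = factor(rep(c("a1", "a2"), times = c(12, 20))),
  b = factor(rep(c("b1", "b2"), length.out = 32))
)
d$y <- rnorm(32, mean = 1 + (d$a == "a2") + 0.5 * (d$b == "b2"))

fit <- lm(y ~ a * b, data = d)

anova(fit)             # Type I: sequential, order-dependent
drop1(fit, test = "F") # marginal test of the highest-order term (a:b)
```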