26 Repeated measures ANOVA

  • A repeated measures design is one in which at least one of the factors consists of repeated measurements on the same experimental unit – this usually corresponds to multiple measurements from the same subjects.

  • It is fair to view this as an extension of the paired-samples t-test, just as it is fair to view factorial ANOVA as an extension of the independent samples t-test.

  • Advantage: individual differences are reduced as a possible source of between-condition differences, because each subject serves as their own control.

  • Advantage: the sample is not divided between conditions, so the design can require fewer subjects.

  • Disadvantage: fewer subjects means fewer degrees of freedom (we will see below that the relevant \(df\) term shrinks from \(n_{observations} - k\) to \((k - 1)(n_{subjects} - 1)\)). In general, the more degrees of freedom we have, the less extreme an observed outcome needs to be to reject the null (because of the effect of \(df\) on the shape of the sampling distribution of our test statistic).
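As a quick illustration of the \(df\) comparison above (using made-up design numbers, not data from these notes), here is the error \(df\) for both designs with \(k = 3\) levels and 15 total observations:

```r
## Hypothetical design: k = 3 factor levels, 15 total observations
k <- 3
n_obs <- 15               # total observations
n_subjects <- n_obs / k   # 5 subjects, each measured at every level

## Independent samples one-way ANOVA: df_error = n_observations - k
df_independent <- n_obs - k  # 12

## Repeated measures ANOVA: df_error = (k - 1)(n_subjects - 1)
df_repeated <- (k - 1) * (n_subjects - 1)  # 8
```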

26.1 Intuition

  • The intuition for a repeated measures ANOVA is the same as that for a factorial ANOVA.

  • E.g., if the population means in the levels of some factor (e.g., the mean effect of different doses of a medicine) are different, then between-level variability should be greater than within-level variability.

  • However, the repeated measures aspect introduces one important difference.

  • Between-level variability will inherently be smaller in a repeated measures design than in an independent samples design (e.g., because the same subjects give measurements for each level, and subjects tend to be similar to themselves).

  • This means that, to decide that there are true differences, we should require smaller between-level variability (relative to the error variability) in a repeated measures design than in an independent samples design.

  • Recall that for a factorial ANOVA, the \(F\)-test that we use is a ratio of between-level variability to within-level variability.

    \[F = \frac{MS_{between-levels}}{MS_{within-levels}}\]

  • In a repeated measures ANOVA, the \(F\)-test that we use instead removes between-subject variability from the error term. Note that the subtraction is applied to the sums of squares (before dividing by the error degrees of freedom), not to the mean squares:

    \[F = \frac{MS_{between-levels}}{MS_{error}}, \qquad \text{where} \quad MS_{error} = \frac{SS_{within-levels} - SS_{between-subjects}}{(k-1)(n-1)}\]

26.2 Formal treatment

  • \(k\) is the number of factor levels
  • \(n\) is the number of subjects
  • \(x_{ij}\) is observation from factor level \(i\) and subject \(j\)

\[\begin{align} SS_{between-levels} &= n \sum_{i=1}^k (\bar{x_{i \bullet}} - \bar{x_{\bullet \bullet}})^2 \\ SS_{within-levels} &= \sum_{i=1}^k \sum_{j=1}^n (x_{ij} - \bar{x_{i \bullet}})^2 \\ SS_{between-subject} &= k \sum_{j=1}^n (\bar{x_{\bullet j}} - \bar{x_{\bullet \bullet}})^2 \\ SS_{error} &= SS_{within-levels} - SS_{between-subject} \\ \end{align}\]
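These formulas can also be sketched compactly in base R, assuming a balanced design stored as a \(k \times n\) matrix (rows = levels, columns = subjects); the data here are simulated purely for illustration:

```r
## Simulated balanced data: k = 3 levels (rows), n = 5 subjects (columns)
set.seed(0)
k <- 3
n <- 5
x <- matrix(rnorm(k * n), nrow = k) + c(10, 20, 30)  # row i has mean 10*i

grand_mean <- mean(x)         # x-bar_{..}
level_means <- rowMeans(x)    # x-bar_{i.}
subject_means <- colMeans(x)  # x-bar_{.j}

ss_between_levels  <- n * sum((level_means - grand_mean)^2)
ss_within_levels   <- sum((x - level_means)^2)  # level_means recycles down columns
ss_between_subject <- k * sum((subject_means - grand_mean)^2)
ss_error           <- ss_within_levels - ss_between_subject

## Sanity check: for a balanced design the total SS decomposes exactly
ss_total <- sum((x - grand_mean)^2)
all.equal(ss_total, ss_between_levels + ss_between_subject + ss_error)
```

The sanity check at the end verifies that, for balanced designs, \(SS_{total} = SS_{between-levels} + SS_{between-subject} + SS_{error}\).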

  • The nomenclature \(SS_{error}\) will make more sense in the coming lectures.

  • This leads to the ANOVA table:

| Source | \(Df\) | \(SS\) | \(MS\) | \(F\) | \(P(>F)\) |
|---|---|---|---|---|---|
| between levels | \(k-1\) | see above | \(\frac{SS_{between-levels}}{k-1}\) | \(\frac{MS_{between-levels}}{MS_{error}}\) | |
| error | \((k-1)(n-1)\) | see above | \(\frac{SS_{error}}{(k-1)(n-1)}\) | | |

26.3 Repeated measures ANOVA in R

26.3.1 Toy example

##     level subject     score
##  1:     1       1 11.262954
##  2:     1       2  9.673767
##  3:     1       3 11.329799
##  4:     1       4 11.272429
##  5:     1       5 10.414641
##  6:     2       1 18.460050
##  7:     2       2 19.071433
##  8:     2       3 19.705280
##  9:     2       4 19.994233
## 10:     2       5 22.404653
## 11:     3       1 30.763593
## 12:     3       2 29.200991
## 13:     3       3 28.852343
## 14:     3       4 29.710538
## 15:     3       5 29.700785
  • Notice in the above data that each subject gives multiple measurements (one per factor level).
library(data.table) ## provides data.table() and fread() used below

level <- rep(1:3, each=5)
subject <- rep(1:5, 3)
score <- c(11.262954, 9.673767, 11.329799, 11.272429, 10.414641, 
           18.460050, 19.071433, 19.705280, 19.994233, 22.404653, 
           30.763593, 29.200991, 28.852343, 29.710538, 29.700785
)
d <- data.table(level, subject, score)
  
k <- d[, length(unique(level))] # n factor levels
n <- d[, length(unique(subject))] # n subs

## do it by hand
ss_between_levels <- 0
for(i in 1:k) {
  ss_between_levels <- ss_between_levels + 
    (d[level==i, mean(score)] - d[, mean(score)])^2
}
ss_between_levels <- n * ss_between_levels

ss_between_subject <- 0
for(j in 1:n) {
  ss_between_subject <- ss_between_subject + 
    (d[subject==j, mean(score)] - d[, mean(score)])^2
}
ss_between_subject <- k * ss_between_subject

ss_within_levels <- 0
for(i in 1:k) {
  for(j in 1:n) {
    ss_within_levels <- ss_within_levels + 
      (d[level==i & subject==j, score] - d[level==i, mean(score)])^2
  }
}

ss_error <- ss_within_levels - ss_between_subject

df_between_levels <- k - 1
df_error <- (k-1)*(n-1)

ms_between_levels <- ss_between_levels / df_between_levels
ms_error <- ss_error / df_error

fobs <- ms_between_levels / ms_error

p_val <- pf(fobs, df_between_levels, df_error, lower.tail=FALSE)
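As an optional cross-check on the by-hand computation (not part of the original notes), base R's `aov()` with an `Error()` stratum fits the same one-way repeated measures model:

```r
## Same toy data as above, with factors coded as factors
level <- factor(rep(1:3, each = 5))
subject <- factor(rep(1:5, 3))
score <- c(11.262954, 9.673767, 11.329799, 11.272429, 10.414641,
           18.460050, 19.071433, 19.705280, 19.994233, 22.404653,
           30.763593, 29.200991, 28.852343, 29.710538, 29.700785)

## Error(subject) tells aov() that subject is the repeated measures unit
fit <- aov(score ~ level + Error(subject))
summary(fit)
```

The `level` row of the within-subject stratum should show \(Df = 2\), error \(Df = 8\), and an \(F\) value matching the by-hand `fobs` computed above.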

## Use the function `ezANOVA()` from the `ez` package to
## perform a repeated measures ANOVA
library(ez)
d[, subject := factor(subject)]
d[, level := factor(level)]
ezANOVA(
  data=d, ## where the data is located
  dv=score, ## the dependent variable
  wid=subject, ## the repeated measure indicator column
  within = .(level), ## a list of repeated measures factors
  type = 3 ## type of sums of squares desired
)
## $ANOVA
##   Effect DFn DFd        F           p p<.05       ges
## 2  level   2   8 370.7887 1.29746e-08     * 0.9852661
## 
## $`Mauchly's Test for Sphericity`
##   Effect         W         p p<.05
## 2  level 0.5279246 0.3835817      
## 
## $`Sphericity Corrections`
##   Effect      GGe       p[GG] p[GG]<.05       HFe        p[HF] p[HF]<.05
## 2  level 0.679313 2.30526e-06         * 0.9073176 5.770776e-08         *

26.3.2 Real data

## Consider the MIS data
d <- fread('https://crossley.github.io/book_stats/data/mis/mis_data.csv')

## We will answer this question:

## Are there significant differences in the mean error per
## subject across phases? Note that this question ignores
## differences between conditions

## First, fix the annoying bug that different subjects in
## different groups have the same number.
d[group==1, subject := subject+10]

## compute mean error per subject
dd <- d[order(subject, phase), mean(error, na.rm=TRUE), .(subject, phase)]

## It's important to code factors as factors
dd[, subject := factor(subject)]
dd[, phase := factor(phase)]

## do it by hand
n <- d[, length(unique(subject))]
k <- d[, length(unique(phase))]

ss_between_phases <- 0
for(i in d[, unique(phase)]) {
  ss_between_phases <- ss_between_phases + 
    (dd[phase==i, mean(V1)] - dd[, mean(V1)])^2
}
ss_between_phases <- n * ss_between_phases

ss_between_subject <- 0
for(j in d[, unique(subject)]) {
  ss_between_subject <- ss_between_subject + 
    (dd[subject==j, mean(V1)] - dd[, mean(V1)])^2
}
ss_between_subject <- k * ss_between_subject

ss_within_phases <- 0
for(i in d[, unique(phase)]) {
  for(j in d[, unique(subject)]) {
    ss_within_phases <- ss_within_phases + 
      (dd[phase==i & subject==j, V1] - dd[phase==i, mean(V1)])^2
  }
}

ss_error <- ss_within_phases - ss_between_subject

df_between_phases <- k - 1
df_error <- (k-1)*(n-1)

ms_between_phases <- ss_between_phases / df_between_phases
ms_error <- ss_error / df_error

fobs <- ms_between_phases / ms_error

p_val <- pf(fobs, df_between_phases, df_error, lower.tail=FALSE)

## Do it with ezANOVA()
ezANOVA(
  data=dd,
  dv=V1,
  wid=subject,
  within=.(phase),
  type=3
  )
## $ANOVA
##   Effect DFn DFd        F            p p<.05       ges
## 2  phase   2  38 123.1449 2.479838e-17     * 0.7786623
## 
## $`Mauchly's Test for Sphericity`
##   Effect         W            p p<.05
## 2  phase 0.4135505 0.0003537993     *
## 
## $`Sphericity Corrections`
##   Effect       GGe        p[GG] p[GG]<.05      HFe        p[HF] p[HF]<.05
## 2  phase 0.6303384 9.950264e-12         * 0.654296 4.302482e-12         *

26.3.3 Making sense of ezANOVA output

What is Mauchly's Test for Sphericity and Sphericity Corrections? Both have to do with the underlying assumptions being made by a repeated measures ANOVA. Time permitting, we will return to this as we review the course material in preparation for the final exam.
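As a brief sketch of what a sphericity correction does: the corrected p-value is obtained by shrinking both degrees of freedom by the estimated epsilon before consulting the \(F\) distribution. Using the Greenhouse-Geisser values from the toy example's `ezANOVA()` output above (the variable names here are my own):

```r
## Values taken from the ezANOVA() output for the toy example
fobs <- 370.7887        # uncorrected F
dfn <- 2                # numerator df: k - 1
dfd <- 8                # denominator df: (k - 1)(n - 1)
gg_epsilon <- 0.679313  # Greenhouse-Geisser epsilon (GGe)

## Shrink both df by epsilon, then recompute the p-value
p_gg <- pf(fobs, gg_epsilon * dfn, gg_epsilon * dfd, lower.tail = FALSE)
```

The result should be close to the `p[GG]` value reported above; when epsilon is 1 (perfect sphericity), the correction leaves the p-value unchanged.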

26.3.4 Quick note on balanced versus unbalanced data

The formulas I wrote in previous sections for computing the various sums of squares all assumed that we had a perfectly balanced design. Just as with factorial ANOVA, everything gets a little wonky with an unbalanced design. The details of this aren’t really suitable for this class. The important thing to know is that ezANOVA() will handle it all for you.