25 Two-way ANOVA

Consider an experiment in which one of two different treatments is administered at one of two different doses. A two-way ANOVA attempts to answer the following three questions:

  1. Do different treatments lead to different outcomes?
  2. Do different doses lead to different outcomes?
  3. Do treatment and dose interact?

25.1 Two-way ANOVA Intuition

The logic of a two-way ANOVA is essentially the same as for a one-way ANOVA. That is, if between-group variation is significantly larger than within-group variation, then we have evidence that different groups have different means. However, with a two-way ANOVA, between-group and within-group variation aren't quite as easy to define, because there are many groups that observations can fall between or within. The key to dealing with this is to recognise that a two-way ANOVA is actually testing three null hypotheses simultaneously (i.e., the three questions above).

25.2 Visualising the three null hypotheses

Consider the simple example of two treatments \(A\) and \(B\), each administered at two different doses.
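The subsections below walk through four possible patterns of cell means. As a rough guide, here is a minimal sketch (not from the original text; the table name d_illus and the specific mean values are invented purely for illustration) that constructs one set of cell means for each pattern and plots them with ggplot2:

library(data.table)
library(ggplot2)

## hypothetical cell means for each of the four patterns below
d_illus <- data.table(
  example   = rep(c("1: no effects",
                    "2: interaction only",
                    "3: two main effects",
                    "4: main effects + interaction"), each=4),
  treatment = rep(c("A", "A", "B", "B"), times=4),
  dose      = factor(rep(c(1, 2), times=8)),
  mean_y    = c(10, 10, 10, 10,   # no main effects, no interaction
                10, 14, 14, 10,   # no main effects, but an interaction
                10, 14, 14, 18,   # two main effects, no interaction
                10, 12, 14, 22))  # two main effects and an interaction

ggplot(d_illus, aes(dose, mean_y, colour=treatment, group=treatment)) +
  geom_point() +
  geom_line() +
  facet_wrap(~example)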

25.2.1 Example 1 - No main effects and no interaction

25.2.2 Example 2 - No main effects but with an interaction

25.2.3 Example 3 - Two main effects but no interaction

25.2.4 Example 4 - Two main effects and an interaction

25.3 Two-way ANOVA more formally

To talk about a two-way ANOVA clearly, it is helpful to introduce the following new nomenclature:

  • Let \(y_{ijk}\) be the \(k^{th}\) observation obtained for treatment \(i\) and dose \(j\). In this example, \(i \in \{1, 2\}\) (indexing treatments \(A\) and \(B\)), \(j \in \{1, 2\}\) (indexing the two doses), and \(k \in \{1,2,\ldots,n_{ij}\}\), where \(n_{ij}\) is the number of observations in the corresponding condition.

  • Let \(\bar{y}_{ij \bullet}\) indicate the mean effect observed for treatment \(i\) and dose \(j\).

  • Let \(\bar{y}_{\bullet j \bullet}\) indicate the mean effect observed for dose \(j\), averaged over all possible treatments.

  • Let \(\bar{y}_{i \bullet \bullet}\) indicate the mean effect observed for treatment \(i\), averaged over all possible doses.

  • Let \(\bar{y}_{\bullet \bullet \bullet}\) indicate the mean averaged over all possible treatments and doses.
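If it helps to connect this notation to code, here is a minimal sketch of how each of these means can be computed with data.table, assuming a data.table d with columns y, treatment, and dose (as in the worked R example later in this chapter):

d[, mean(y), .(treatment, dose)]  # cell means, i.e., ybar_ij.
d[, mean(y), .(dose)]             # dose means, i.e., ybar_.j.
d[, mean(y), .(treatment)]        # treatment means, i.e., ybar_i..
d[, mean(y)]                      # grand mean, i.e., ybar_...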

For the simple case in which we have only two treatments \(A\) and \(B\), only two doses, and exactly \(n\) observations for each treatment at each dose, the data look as follows:

|        | Treatment A | Treatment B |   |
|--------|-------------|-------------|---|
| Dose 1 | \(\bar{y}_{11\bullet} = \frac{y_{111}+y_{112}+\ldots+y_{11n}}{n}\) | \(\bar{y}_{21\bullet} = \frac{y_{211}+y_{212}+\ldots+y_{21n}}{n}\) | \(\bar{y}_{\bullet 1 \bullet}\) |
| Dose 2 | \(\bar{y}_{12\bullet} = \frac{y_{121}+y_{122}+\ldots+y_{12n}}{n}\) | \(\bar{y}_{22\bullet} = \frac{y_{221}+y_{222}+\ldots+y_{22n}}{n}\) | \(\bar{y}_{\bullet 2 \bullet}\) |
|        | \(\bar{y}_{1 \bullet \bullet}\) | \(\bar{y}_{2 \bullet \bullet}\) | \(\bar{y}_{\bullet \bullet \bullet}\) |

Note that an experiment like this, in which observations are collected at every combination of treatment and dose, is called a full factorial experiment.

With this convention, we can write the null and alternative hypotheses as follows:

\(\begin{align} & H_{0_1}: \mu_{Treatment_A} = \mu_{Treatment_B} \\ & H_{0_2}: \mu_{Dose_1} = \mu_{Dose_2} \\ & H_{0_3}: \mu_{Treatment_A | Dose_1} - \mu_{Treatment_B | Dose_1} = \mu_{Treatment_A | Dose_2} - \mu_{Treatment_B | Dose_2} \\\\ & H_{1_i}: \lnot H_{0_i} \\ \end{align}\)

The third null hypothesis says that the difference between treatments is the same at every dose; that is, there is no interaction between treatment and dose.

25.3.1 Using variance to test the null hypotheses

As with a one-way ANOVA, we proceed by partitioning the total variability in the data into distinct sources:

  • \(SS_{total}\) is the total variability between all observations.
  • \(SS_{treatment}\) is the variability between different treatments.
  • \(SS_{dose}\) is the variability between different doses.
  • \(SS_{treatment \times dose}\) is the variability between different treatments at different dose levels.
  • \(SS_{error}\) is the variability not accounted for by \(SS_{treatment}\), \(SS_{dose}\), and \(SS_{treatment \times dose}\).

25.3.2 Variability terms akin to between-factor variability

  • \(n_{treatment} =\) number of unique treatment levels.
  • \(n_{dose} =\) number of unique dose levels.
  • \(n_{total} =\) number of observations at each combination of treatment level and dose level, which in the following we assume is the same for all combinations of treatment and dose (i.e., we assume a balanced design).

\[\begin{align} SS_{treatment} &= n_{dose} n_{total} \sum_i^{ n_{treatment} } (\bar{y}_{i \bullet \bullet} - \bar{y}_{\bullet \bullet \bullet})^2 \\ SS_{dose} &= n_{treatment} n_{total} \sum_j^{ n_{dose} } (\bar{y}_{\bullet j \bullet} - \bar{y}_{\bullet \bullet \bullet})^2 \\ SS_{treatment \times dose} &= n_{total} \sum_i^{ n_{treatment} } \sum_j^{ n_{dose} } \left( \bar{y}_{i j \bullet} - (\bar{y}_{i \bullet \bullet} + \bar{y}_{\bullet j \bullet} - \bar{y}_{\bullet \bullet \bullet} ) \right)^2 \\ \end{align}\]
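Note that the inner quantity \(\bar{y}_{i \bullet \bullet} + \bar{y}_{\bullet j \bullet} - \bar{y}_{\bullet \bullet \bullet}\) in the interaction term is the cell mean we would predict if treatment and dose combined purely additively, so \(SS_{treatment \times dose}\) measures how far the observed cell means deviate from that additive prediction. As a worked example, for the simple \(2 \times 2\) balanced case above (where \(n_{total} = n\)), the treatment term expands to:

\[ SS_{treatment} = 2 n \left[ (\bar{y}_{1 \bullet \bullet} - \bar{y}_{\bullet \bullet \bullet})^2 + (\bar{y}_{2 \bullet \bullet} - \bar{y}_{\bullet \bullet \bullet})^2 \right] \]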

25.3.3 Variability terms akin to within-factor variability

\[\begin{align} SS_{error} &= \sum_i^{ n_{treatment} } \sum_j^{ n_{dose} } \sum_k^{ n_{total} } \left( y_{i j k} - \bar{y}_{i j \bullet} \right)^2 \\ \end{align}\]

25.3.4 Total variability

\[\begin{align} SS_{total} &= \sum_i^{ n_{treatment} } \sum_j^{ n_{dose} } \sum_k^{ n_{total} } \left( y_{i j k} - \bar{y}_{\bullet \bullet \bullet} \right)^2 \\ \end{align}\]

  • Note that you can think of the \(n_{treatment}\), \(n_{dose}\), and \(n_{total}\) multipliers above as serving to cast each \(SS\) term into the same units so that they can be meaningfully compared. You can also think of them as ensuring that \(SS_{total}\) is the sum of all the other sources of variability:

\[SS_{total} = SS_{treatment} + SS_{dose} + SS_{treatment \times dose} + SS_{error}\]

25.3.5 Degrees of freedom and building F-ratios

In the following table, \(N\) denotes the total number of observations (for a balanced design, \(N = n_{treatment} \, n_{dose} \, n_{total}\)).

| Effect | \(df\) | \(SS\) | \(MS\) | \(F\) |
|---|---|---|---|---|
| \(Treatment\) | \(n_{treatment} - 1\) | see above | \(\frac{SS_{treatment}}{df_{treatment}}\) | \(\frac{MS_{treatment}}{MS_{error}}\) |
| \(Dose\) | \(n_{dose} - 1\) | see above | \(\frac{SS_{dose}}{df_{dose}}\) | \(\frac{MS_{dose}}{MS_{error}}\) |
| \(Treatment \times Dose\) | \((n_{treatment} - 1)(n_{dose} - 1)\) | see above | \(\frac{SS_{treatment \times dose}}{df_{treatment \times dose}}\) | \(\frac{MS_{treatment \times dose}}{MS_{error}}\) |
| \(Error\) | \(N - n_{treatment} n_{dose}\) | see above | \(\frac{SS_{error}}{df_{error}}\) | |

  • From here, we compute p-values from the observed \(F\) scores in the table above, and we are off to the hypothesis testing races.
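For example, for the toy data analysed in the next section (2 treatments, 2 doses, and 4 observations per cell, so \(N = 16\)), the degrees of freedom work out to:

\[ df_{treatment} = 2 - 1 = 1, \quad df_{dose} = 2 - 1 = 1, \quad df_{treatment \times dose} = 1 \times 1 = 1, \quad df_{error} = 16 - 2 \times 2 = 12 \]

These match the Df column of the anova() and ezANOVA() outputs shown below.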

25.3.6 More general experiment and data structure

More generally, a two-way ANOVA deals with any two factors of an experiment, each containing an arbitrary number of levels. Consider a full factorial experiment with two factors: Factor 1 containing \(n_1\) levels and Factor 2 containing \(n_2\) levels.

| | Factor 2 level 1 | Factor 2 level 2 | \(\ldots\) | Factor 2 level \(n_2\) | |
|---|---|---|---|---|---|
| Factor 1 level 1 | \(\bar{y}_{11\bullet}\) | \(\bar{y}_{12\bullet}\) | \(\ldots\) | \(\bar{y}_{1 n_2 \bullet}\) | \(\bar{y}_{1 \bullet \bullet}\) |
| Factor 1 level 2 | \(\bar{y}_{21\bullet}\) | \(\bar{y}_{22\bullet}\) | \(\ldots\) | \(\bar{y}_{2 n_2 \bullet}\) | \(\bar{y}_{2 \bullet \bullet}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) | \(\vdots\) |
| Factor 1 level \(n_1\) | \(\bar{y}_{n_1 1 \bullet}\) | \(\bar{y}_{n_1 2 \bullet}\) | \(\ldots\) | \(\bar{y}_{n_1 n_2 \bullet}\) | \(\bar{y}_{n_1 \bullet \bullet}\) |
| | \(\bar{y}_{\bullet 1 \bullet}\) | \(\bar{y}_{\bullet 2 \bullet}\) | \(\ldots\) | \(\bar{y}_{\bullet n_2 \bullet}\) | \(\bar{y}_{\bullet \bullet \bullet}\) |

  • This more general situation doesn’t change our method at all.
  • We still proceed by generating the ANOVA table:
| Effect | \(df\) | \(SS\) | \(MS\) | \(F\) |
|---|---|---|---|---|
| \(Factor 1\) | \(n_{Factor 1} - 1\) | see above | \(\frac{SS_{Factor 1}}{df_{Factor 1}}\) | \(\frac{MS_{Factor 1}}{MS_{error}}\) |
| \(Factor 2\) | \(n_{Factor 2} - 1\) | see above | \(\frac{SS_{Factor 2}}{df_{Factor 2}}\) | \(\frac{MS_{Factor 2}}{MS_{error}}\) |
| \(Factor 1 \times Factor 2\) | \((n_{Factor 1}-1) (n_{Factor 2}-1)\) | see above | \(\frac{SS_{Factor 1 \times Factor 2}}{df_{Factor 1 \times Factor 2}}\) | \(\frac{MS_{Factor 1 \times Factor 2}}{MS_{error}}\) |
| \(Error\) | \(N - n_{Factor 1}n_{Factor 2}\) | see above | \(\frac{SS_{error}}{df_{error}}\) | |

25.4 Two-way ANOVA in R

Consider the following data:

##             y treatment dose
##  1: 16.314771         A    1
##  2:  8.368833         A    1
##  3: 16.648996         A    1
##  4: 16.362147         A    1
##  5: 32.073207         A    2
##  6: 22.300250         A    2
##  7: 25.357165         A    2
##  8: 28.526398         A    2
##  9: 19.971164         B    1
## 10: 32.023267         B    1
## 11: 23.817967         B    1
## 12: 16.004954         B    1
## 13: 34.261715         B    2
## 14: 38.552692         B    2
## 15: 38.503924         B    2
## 16: 37.942446         B    2
y <- c(16.314771, 8.368833, 16.648996, 16.362147, 
       32.073207, 22.300250, 25.357165, 28.526398, 
       19.971164, 32.023267, 23.817967, 16.004954, 
       34.261715, 38.552692, 38.503924, 37.942446)

treatment <- c( "A", "A", "A", "A", "A", "A", "A", "A",
                "B", "B",  "B", "B", "B", "B", "B", "B")

dose <- c( 1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2)

d <- data.table(y=y, 
                treatment=as.factor(treatment), 
                dose=as.factor(dose))

# generate intuition about results before doing the analysis
# (looks like two main effects but no interaction)
dd <- d[, .(mean(y), sd(y)/sqrt(.N)), .(treatment, dose)]
ggplot(dd, aes(treatment, V1, colour=dose)) +
  geom_pointrange(aes(ymin=V1-V2, ymax=V1+V2))

## define number of levels in each factor
n_treatment <- d[, length(unique(treatment))] # number of treatment levels
n_dose <- d[, length(unique(dose))] # number of dose levels
n <- d[, .N, .(treatment, dose)][, unique(N)] # number of observations at each level

## define Df terms
df_treatment <- n_treatment - 1
df_dose <- n_dose - 1
df_interaction <- df_treatment * df_dose
df_error <- d[, .N] - n_treatment*n_dose

## Define SS terms
ss_treatment <- 0
for(i in d[, unique(treatment)]) {
  ss_treatment <- ss_treatment + (d[treatment==i, mean(y)] - d[, mean(y)])^2
}
ss_treatment <- n_dose * n * ss_treatment

ss_dose <- 0
for(i in d[, unique(dose)]) {
  ss_dose <- ss_dose + (d[dose==i, mean(y)] - d[, mean(y)])^2
}
ss_dose <- n_treatment * n * ss_dose

ss_interaction <- 0
for(i in d[, unique(treatment)]) {
  for(j in d[, unique(dose)]) {
    ss_interaction <- ss_interaction +
      (d[treatment==i & dose==j, mean(y)] -
       (d[treatment==i, mean(y)] + d[dose==j, mean(y)] - d[, mean(y)]))^2
  }
}
ss_interaction <- n * ss_interaction

ss_error <- 0
for(i in d[, unique(treatment)]) {
  for(j in d[, unique(dose)]) {
    for(k in 1:n) {
      ss_error <- ss_error + (d[treatment==i & dose==j][k, y] -
                              d[treatment==i & dose==j, mean(y)])^2
    }
  }
}
ss_error <- ss_error
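
## Sanity check (not in the original code): as noted earlier, the four SS
## terms above should sum to the total sum of squares.
ss_total <- d[, sum((y - mean(y))^2)]
all.equal(ss_total, ss_treatment + ss_dose + ss_interaction + ss_error)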

## Define MS terms
ms_treatment <- ss_treatment / df_treatment
ms_dose <- ss_dose / df_dose
ms_interaction <- ss_interaction / df_interaction
ms_error <- ss_error / df_error

## Define F terms
f_treatment <- ms_treatment / ms_error
f_dose <- ms_dose / ms_error
f_interaction <- ms_interaction / ms_error

## Define Pr(>F)
p_treatment <- pf(f_treatment, df_treatment, df_error, lower.tail=F)
p_dose <- pf(f_dose, df_dose, df_error, lower.tail=F)
p_interaction <- pf(f_interaction, df_interaction, df_error, lower.tail=F)

## NOTE: Lots of things to print out here, so I'll leave
## this to a live demo and your own personal tinkering
## around to see the actual values and confirm that they
## match the anova() method below.

## Verify results
fm <- lm(y ~ treatment*dose, data=d)
anova(fm)
## Analysis of Variance Table
## 
## Response: y
##                Df Sum Sq Mean Sq F value    Pr(>F)    
## treatment       1 352.75  352.75 16.6240  0.001534 ** 
## dose            1 729.08  729.08 34.3593 7.697e-05 ***
## treatment:dose  1   2.96    2.96  0.1395  0.715325    
## Residuals      12 254.63   21.22                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Using ezANOVA
d[, subject := 1:.N]
ezANOVA(data=d,
        dv=y,
        wid=subject,
        between=.(treatment, dose),
        type=3)
## Warning: Converting "subject" to factor for ANOVA.
## Coefficient covariances computed by hccm()
## $ANOVA
##           Effect DFn DFd          F           p p<.05        ges
## 2      treatment   1  12 16.6239881 1.53355e-03     * 0.58077121
## 3           dose   1  12 34.3592629 7.69696e-05     * 0.74115205
## 4 treatment:dose   1  12  0.1394674 7.15325e-01       0.01148876
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd      SSn     SSd        F         p p<.05
## 1   3  12 31.57735 113.392 1.113918 0.3818176
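
As an additional check (not part of the original demo), base R's aov() gives the same ANOVA table for this balanced design:

fm_aov <- aov(y ~ treatment * dose, data=d)
summary(fm_aov)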

25.5 Two-way ANOVA in R using real data

d <- fread('https://crossley.github.io/book_stats/data/mis/mis_data.csv')

d <- d[phase=='Base', .(subject, group, target, error)]
d[, group := factor(group)]
d[, target := factor(target)]

## First, get one observation per subject per group per target
dd <- d[, mean(error), .(subject, group, target)]

## Do different groups have different mean errors?
## Do different targets have different mean errors?
## Does target and group interact?

## begin by making diagnostic plots
ddd <- dd[, .(mean(V1), sd(V1)/sqrt(length(unique(target)))), .(group)]
ggplot(ddd, aes(group, V1)) +
  geom_pointrange(aes(ymin=V1-V2, ymax=V1+V2))

ddd <- dd[, .(mean(V1), sd(V1)/sqrt(length(unique(group)))), .(target)]
ggplot(ddd, aes(target, V1)) +
  geom_pointrange(aes(ymin=V1-V2, ymax=V1+V2))

ddd <- dd[, .(mean(V1), sd(V1)/sqrt(length(unique(target))*length(unique(group)))), .(group, target)]
ggplot(ddd, aes(target, V1, colour=group)) +
  geom_pointrange(aes(ymin=V1-V2, ymax=V1+V2), position=position_dodge(width=.1))

## To my eye, it looks like the effect of group ought to be
## significant, or pretty close to it, but that's probably
## about it.

## Verify results
options(contrasts = c("contr.sum","contr.poly")) # type 3 ss
fm <- lm(V1 ~ group * target, data=dd)
anova(fm)
## Analysis of Variance Table
## 
## Response: V1
##               Df Sum Sq Mean Sq  F value Pr(>F)    
## group          1 461.40  461.40 431.8110 <2e-16 ***
## target        10  13.45    1.35   1.2589 0.2561    
## group:target  10  10.80    1.08   1.0106 0.4358    
## Residuals    198 211.57    1.07                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ezANOVA(data=dd,
        dv=V1,
        wid=subject,
        between=.(group, target),
        type=3)
## Warning: Converting "subject" to factor for ANOVA.
## Warning: The column supplied as the wid variable contains non-unique values
## across levels of the supplied between-Ss variables. Automatically fixing this
## by generating unique wid labels.
## Coefficient covariances computed by hccm()
## $ANOVA
##         Effect DFn DFd          F            p p<.05        ges
## 2        group   1 198 431.811012 1.207998e-51     * 0.68561998
## 3       target  10 198   1.258945 2.560987e-01       0.05978197
## 4 group:target  10 198   1.010574 4.358029e-01       0.04856061
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd      SSn      SSd        F            p p<.05
## 1  21 198 37.66427 72.21667 4.917428 5.462064e-10     *

25.6 Unbalanced designs

A balanced design means that there are exactly the same number of observations (denoted by \(n\) in the earlier examples of this lecture) in every possible combination of factors explored by an experiment. It turns out that it is fairly common to have an unbalanced design, which just means that there are different numbers of observations per factor-level combination. Furthermore, it turns out that for balanced designs, the sums of squares calculations shown above are unambiguous and everybody agrees on how to compute them. For unbalanced designs, however, there is more than one way of calculating sums of squares. This is often the reason why various statistical software packages (e.g., R, SAS, SPSS, etc.) produce slightly different results. A good discussion can be found [here](https://www.r-bloggers.com/anova-%E2%80%93-type-iiiiii-ss-explained/), but we won't go into the details here. The important thing to remember is that in psychology and neuroscience, the standard is to apply Type III sums of squares, and to do that, you just need to execute the line options(contrasts = c("contr.sum","contr.poly")) anywhere before you fit your linear model and generate your ANOVA summary. Also note that options(contrasts = c("contr.sum","contr.poly")) doesn't change the results for balanced designs, so there is no harm in just setting it all the time if you prefer that approach. Finally, note that whether or not a design is balanced only matters for designs with at least two factors.
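The ezANOVA calls above already request Type III sums of squares via type=3. As an additional sketch (not from the original text, and assuming the car package is installed), you can also obtain Type III tests directly from an lm fit, here re-using the real-data model from the previous section:

options(contrasts = c("contr.sum","contr.poly")) # sum-to-zero contrasts, needed for meaningful Type III tests
fm <- lm(V1 ~ group * target, data=dd)           # same model as fit in the previous section
car::Anova(fm, type=3)                           # Type III sums of squares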