24 One-way ANOVA
24.1 Learning objectives
Understand the basic logic of how a 1-way ANOVA uses variances to test for equality of means.
Understand why mean squares are used in the test statistic of a 1-way ANOVA rather than sums of squares.
Be able to calculate all terms in an ANOVA table by hand.
Understand how to use the ez library to perform 1-way ANOVAs.
24.2 Introduction
Suppose you have \(k\) different treatment groups. A one-way ANOVA asks if there are any differences in the effects of treatment between any of the groups. This type of test is called an omnibus test.
In raw form, the H’s look like this:
\(\begin{align} & H_0: \mu_1 = \mu_2 = \mu_3 = \ldots = \mu_k \\ & H_1: \lnot H_0 \\ \end{align}\)
Usually, we spend time in steps 1 and 3 of our hypothesis testing procedure rewriting our hypotheses so that we can come up with a single statistic that is a good estimate of a single all-encompassing parameter. This case seems harder than most: we are interested in so many darn means! It turns out that the primary method used in this situation actually rests on variances. This can certainly seem a bit strange given that the test is concerned with means. Let's see how this all pans out.
24.3 Intuition
The logic of an ANOVA can be understood intuitively as follows:
between-group variation — how different are group means from each other?
within-group variation — how noisy is your data in general?
If the means are all the same (i.e., \(H_0\) is true), then between-group and within-group variation will be very similar.
If \(H_0\) is not true, then between-group variation will be larger than within-group variation.
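To make this concrete, here is a minimal simulation sketch. All of the numbers in it (group size, means, standard deviations) are made up purely for illustration. It draws three groups from identical distributions (\(H_0\) true), then shifts one group's mean (\(H_0\) false), and compares the two sources of variation in each case. The between-group variation is scaled by the group size so that it sits on the same footing as the within-group variation, a point we formalise below when we define mean squares.
set.seed(1)
n <- 30 ## observations per group (made up for illustration)
## H0 true: all three groups come from the same distribution
g1 <- rnorm(n, mean = 10, sd = 2)
g2 <- rnorm(n, mean = 10, sd = 2)
g3 <- rnorm(n, mean = 10, sd = 2)
n * var(c(mean(g1), mean(g2), mean(g3))) ## between-group variation
mean(c(var(g1), var(g2), var(g3)))       ## within-group variation (similar in size)
## H0 false: shift the third group's mean upwards
g3 <- rnorm(n, mean = 14, sd = 2)
n * var(c(mean(g1), mean(g2), mean(g3))) ## between-group variation (much larger)
mean(c(var(g1), var(g2), var(g3)))       ## within-group variation (essentially unchanged)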
Another good blurb for intuition can be read here:
https://stats.stackexchange.com/questions/40549/how-can-i-explain-the-intuition-behind-anova
ANOVA is a statistical technique used to determine whether a particular classification of the data is useful in understanding the variation of an outcome. Think about dividing people into buckets or classes based on some criteria, like suburban and urban residence. The total variation in the dependent variable (the outcome you care about, like responsiveness to an advertising campaign) can be decomposed into the variation between classes and the variation within classes. When the within-class variation is small relative to the between-class variation, your classification scheme is in some sense meaningful or useful for understanding the world. Members of each cluster behave similarly to one another, but people from different clusters behave distinctively. This decomposition is used to create a formal F test of this hypothesis.
Replacing the word variance in the above with the word similarity might be helpful / more intuitive.
Armed with this intuition, we come up with the following test statistic form.
\(\begin{align} & H_0: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} = 1 \\ & H_1: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} > 1 \\ \end{align}\)
First, notice that an ANOVA always uses a one-sided alternative hypothesis. This is because we only reject the null if the between-group variability is sufficiently greater than the within-group variability, which corresponds to the greater-than sign in \(H_1\).
Second, what the heck is \(\sigma^2_{between-group}\) and \(\sigma^2_{within-group}\)? We will unpack this more formally below.
24.4 Formal treatment
Suppose we have \(k=3\) treatment groups, with observations arranged as follows:

| Treatment 1 | Treatment 2 | Treatment 3 |
|---|---|---|
| \(x_1\) | \(y_1\) | \(z_1\) |
| \(x_2\) | \(y_2\) | \(z_2\) |
| \(\ldots\) | \(\ldots\) | \(\ldots\) |
| \(x_l\) | \(y_m\) | \(z_n\) |
If \(l=m=n\) then we have an equal number of observations for each group. When this is the case, we say that we have a balanced design.
Grand mean: \[ G = \frac{\bar{x} + \bar{y} + \bar{z}}{3} \] For a balanced design this is the same as the mean of all observations pooled across groups, which is how it is computed in the code examples below.
Between-group variation: \[ \text{ss}_{between} = l (\bar{x} - G)^2 + m (\bar{y} - G)^2 + n (\bar{z} - G)^2 \]
Within-group variation: \[ \text{ss}_{within} = \sum_{i=1}^l (x_i-\bar{x})^2 + \sum_{i=1}^m (y_i-\bar{y})^2 + \sum_{i=1}^n (z_i-\bar{z})^2 \]
Given this, a reasonable guess for how to estimate the population parameters in \(H_0\) might be:
\[ \widehat{\frac{\sigma^2_{between-group}}{\sigma^2_{within-group}}} = \frac{\text{ss}_{between}}{\text{ss}_{within}} \]
This is pretty much correct, except that in practice we use mean squared deviations, which are just sums of squared deviations divided by their corresponding degrees of freedom. We need to do this because we don't want \(\text{ss}_{within}\) to dominate simply because it usually has many more observations contributing to the sum. This leads to the following treatment:
\(\begin{align} & H_0: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} = 1 \\ & H_1: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} > 1 \\ \end{align}\)
\(\begin{align} & \widehat{\frac{\sigma^2_{between-group}}{\sigma^2_{within-group}}} = \frac{\text{ms}_{between}}{\text{ms}_{within}} \\\\ & \text{ms}_{between} = \frac{\text{ss}_{between}}{df_{between}} \\ & \text{ms}_{within} = \frac{\text{ss}_{within}}{df_{within}} \\\\ & df_{between} = n_{groups} - 1 \\ & df_{within} = n_{total} - n_{groups} \\\\ & n_{total} = l + m + n \\\\ & F_{obs} = \frac{\text{ms}_{between}}{\text{ms}_{within}} \sim F(df_{between}, df_{within}) \\ \end{align}\)
I have not shown how we know that the ratio of variances has an F distribution as doing so is beyond the scope of the unit. In general, we leave the process of finding good test statistics and characterising their distributions to the statisticians.
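In practice, all we need is to be able to work with the F distribution in R: pf() gives p-values and qf() gives critical values. Here is a quick sketch, with made-up degrees of freedom and a made-up observed F value purely for illustration:
## suppose we had 3 groups and 15 observations in total
df_between <- 3 - 1  ## n_groups - 1
df_within <- 15 - 3  ## n_total - n_groups
## critical value for alpha = 0.05: reject H0 if F_obs exceeds this
qf(0.95, df_between, df_within)
## p-value for a hypothetical observed F value of 3.5
pf(3.5, df_between, df_within, lower.tail = FALSE)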
24.5 Example 1: Made up data
Consider the following data:
## x y z
## 1: 6.867731 10.89766 27.558906
## 2: 10.918217 17.43715 21.949216
## 3: 5.821857 18.69162 16.893797
## 4: 17.976404 17.87891 8.926501
## 5: 11.647539 13.47306 25.624655
Test the hypothesis that the population means \(\mu_X\), \(\mu_Y\), and \(\mu_Z\) are not all equal.
## step 1
## H0: sig_between / sig_within = 1
## H1: sig_between / sig_within > 1
## step 2
alph <- 0.05
## step 3
## ms_ratio_hat = ms_between / ms_within
## step 4
x <- c(6.867731, 10.918217, 5.821857, 17.976404, 11.647539)
y <- c(10.89766, 17.43715, 18.69162, 17.87891, 13.47306)
z <- c(27.558906, 21.949216, 16.893797, 8.926501, 25.624655)
nx <- length(x)
ny <- length(y)
nz <- length(z)
n_total <- nx + ny + nz
n_groups <- 3
## mean of each group
mean_x <- mean(x)
mean_y <- mean(y)
mean_z <- mean(z)
## grand mean
grand_mean <- mean(c(x, y, z))
## ss-between
ss_between <- nx*(mean_x - grand_mean)^2 +
  ny*(mean_y - grand_mean)^2 +
  nz*(mean_z - grand_mean)^2
## ss-within
ss_within_x <- 0
for(i in 1:nx) {
  ss_within_x <- ss_within_x + (x[i] - mean_x)^2
}
ss_within_y <- 0
for(i in 1:ny) {
  ss_within_y <- ss_within_y + (y[i] - mean_y)^2
}
ss_within_z <- 0
for(i in 1:nz) {
  ss_within_z <- ss_within_z + (z[i] - mean_z)^2
}
ss_within <- ss_within_x + ss_within_y + ss_within_z
## ss-within --- a better way
ss_within <- sum((x - mean_x)^2) +
  sum((y - mean_y)^2) +
  sum((z - mean_z)^2)
## dfs
df_between <- n_groups-1
df_within <- n_total - n_groups
## mean squares
ms_between <- ss_between / df_between
ms_within <- ss_within / df_within
## observed F-value
fobs <- ms_between / ms_within
## compute pval
pval <- pf(fobs, df_between, df_within, lower.tail=FALSE)
## report results
print(c(ss_between, ss_within, df_between, df_within, fobs, pval))
## [1] 227.95300725 361.75603363 2.00000000 12.00000000 3.78077466
## [6] 0.05329273
Since the observed p-value (0.053) is greater than \(\alpha = 0.05\), we fail to reject \(H_0\).
## Check our work using a builtin R function
library(data.table)
d <- data.table(x, y, z)
d <- melt(d, measure.vars=c('x', 'y', 'z'))
d[, variable := factor(variable)]
fm <- lm(value ~ variable, data = d)
anova(fm)
## Analysis of Variance Table
##
## Response: value
## Df Sum Sq Mean Sq F value Pr(>F)
## variable 2 227.95 113.977 3.7808 0.05329 .
## Residuals 12 361.76 30.146
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
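As an aside, the same ANOVA statistics can be obtained from base R's aov() wrapper; this is just an alternative route to the lm()/anova() call above:
## equivalent ANOVA table using aov()
summary(aov(value ~ variable, data = d))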
# We will ultimately be using the ez library for ANOVAs, so
# might as well get started now.
library(ez)
d[, subject := factor(1:.N)]
ezANOVA(
  data=d,
  dv=value,
  wid=subject,
  between=.(variable),
  type=3
)
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 variable 2 12 3.780775 0.05329273 0.3865517
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 2 12 24.07389 156.2317 0.9245458 0.4232137
24.6 Example 2: Criterion learning data
fp <- 'https://crossley.github.io/book_stats/data/criterion_learning/crit_learn.csv'
d <- fread(fp)
## redefine d: keep three conditions and reduce to one t2c value per subject
d <- d[cnd %in% c('Delay', 'Long ITI', 'Short ITI')][, mean(unique(t2c)), .(cnd, sub)]
setnames(d, 'V1', 't2c')
library(ggplot2)
ggplot(d, aes(x=cnd, y=t2c)) +
  geom_boxplot() +
  theme(aspect.ratio = 1)
# Calculate the number of groups and observations per group
d[, n_cnds := length(unique(cnd))]
d[, n_obs := .N, .(cnd)]
## Calculate the mean within each cnd:
d[, t2c_mean_cnd := mean(t2c), .(cnd)]
## Calculate the overall mean:
d[, t2c_mean_grand := mean(t2c)]
## Calculate the between-cnd sum of squared differences:
d[, ss_between := sum((t2c_mean_cnd - t2c_mean_grand)^2)]
## Calculate the "within-cnd" sum of squared differences.
d[, ss_within := sum((t2c - t2c_mean_cnd)^2)]
## Compute degrees of freedom
d[, df_between := n_cnds-1]
d[, df_within := .N - n_cnds]
## Calculate MSE terms
d[, mse_between := ss_between / df_between]
d[, mse_within := ss_within / df_within]
## Calculate the F-ratio
d[, fobs := mse_between / mse_within]
## Calculate p-val
d[, pval := pf(fobs, df_between, df_within, lower.tail=FALSE)]
print(round(
  c(d[, unique(ss_between)],
    d[, unique(ss_within)],
    d[, unique(df_between)],
    d[, unique(df_within)],
    d[, unique(fobs)],
    d[, unique(pval)]
  ), 4))
## [1] 25758.2646 630055.8679 2.0000 55.0000 1.1243 0.3322
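As in Example 1, we can check our work by fitting a linear model and passing it to anova(), which reproduces the table below:
## Check our work using a builtin R function
fm <- lm(t2c ~ cnd, data = d)
anova(fm)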
## Analysis of Variance Table
##
## Response: t2c
## Df Sum Sq Mean Sq F value Pr(>F)
## cnd 2 25758 12879 1.1243 0.3322
## Residuals 55 630056 11456
# Again check our work using the ez library.
library(ez)
d[, sub := factor(sub)]
d[, cnd := factor(cnd)]
ezANOVA(
  data=d,
  dv=t2c,
  wid=sub,
  between=.(cnd),
  type=3
)
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
## $ANOVA
## Effect DFn DFd F p p<.05 ges
## 2 cnd 2 55 1.124269 0.3322408 0.03927678
##
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 2 55 27898.51 583609.9 1.314592 0.2768892
Notice the warning message about unequal n per group. The gist of this warning is as follows: There are different methods of computing \(ss\) terms (not shown in this lecture). If we have an unbalanced design (i.e., unequal \(n\) per group), then these different methods produce different results. In psychology and neuroscience, the standard approach is to use Type III sums of squares. The reason for this is beyond the scope of this unit.