24 One-way ANOVA
24.1 Learning objectives
Understand the basic logic of how a 1-way ANOVA uses variances to test for equality of means.
Understand why mean squares are used in the test statistic of a 1-way ANOVA rather than sum of squares.
Be able to calculate all terms in an ANOVA table by hand.
Understand how to use the ez library to perform 1-way ANOVAs.
24.2 Introduction
Suppose you have \(k\) different treatment groups. A one-way ANOVA asks if there are any differences in the effects of treatment between any of the groups. This type of test is called an omnibus test.
In raw form, the hypotheses look like this:
\(\begin{align} & H_0: \mu_1 = \mu_2 = \ldots = \mu_k \\ & H_1: \lnot H_0 \\ \end{align}\)
Usually, we spend steps 1 and 3 of our hypothesis testing procedure rewriting our hypotheses so that a single statistic gives a good estimate of a single all-encompassing parameter. This case seems harder than most: we are interested in so many darn means! It turns out that the primary method used in this situation actually rests on variances. This can certainly seem a bit strange given that the test is concerned with means. Let’s see how this all pans out.
24.3 Intuition
The logic of an ANOVA can be understood intuitively as follows:
between-group variation — how different are group means from each other?
within-group variation — how noisy is your data in general?
If the means are all the same (i.e., \(H_0\) is true), then between-group and within-group variation will be very similar.
If \(H_0\) is not true, then between-group variation will be larger than within-group variation.
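The two statements above can be checked with a quick simulation. What follows is a minimal sketch rather than part of the original analysis: the group size, means, and standard deviation are made-up values, `sim_ms` is a hypothetical helper name, and variation is measured with the mean squares defined in the formal treatment later in this chapter.

```r
## Illustrative simulation (all numbers here are assumptions):
## compare average between- and within-group mean squares when
## H0 is true versus when one group mean is shifted.
set.seed(42)

sim_ms <- function(mu, n = 10, sigma = 2) {
  k <- length(mu)
  ## one sample of size n per group
  groups <- lapply(mu, function(m) rnorm(n, mean = m, sd = sigma))
  group_means <- sapply(groups, mean)
  grand_mean <- mean(unlist(groups))
  ss_between <- sum(n * (group_means - grand_mean)^2)
  ss_within <- sum(sapply(groups, function(g) sum((g - mean(g))^2)))
  c(ms_between = ss_between / (k - 1),
    ms_within = ss_within / (k * n - k))
}

## average over many simulated experiments
ms_h0 <- rowMeans(replicate(2000, sim_ms(mu = c(0, 0, 0))))
ms_h1 <- rowMeans(replicate(2000, sim_ms(mu = c(0, 0, 3))))

print(round(ms_h0, 2))  # both entries should land near sigma^2 = 4
print(round(ms_h1, 2))  # ms_between should be much larger than ms_within
```

Averaging over many simulated experiments is a design choice: in any single experiment the two quantities can differ quite a bit by chance, which is exactly why a formal test is needed.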
Another good blurb for intuition can be read here:
ANOVA is statistical technique used to determine whether a particular classification of the data is useful in understanding the variation of an outcome. Think about dividing people into buckets or classes based on some criteria, like suburban and urban residence. The total variation in the dependent variable (the outcome you care about, like responsiveness to an advertising campaign) can be decomposed into the variation between classes and the variation within classes. When the within-class variation is small relative to the between-class variation, your classification scheme is in some sense meaningful or useful for understanding the world. Members of each cluster behave similarly to one another, but people from different clusters behave distinctively. This decomposition is used to create a formal F test of this hypothesis.
Replacing the word variance in the above with the word similarity might be helpful / more intuitive.
Armed with this intuition, we can restate our hypotheses in terms of a ratio of variances:
\(\begin{align} & H_0: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} = 1 \\ & H_1: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} > 1 \\ \end{align}\)
First, notice that an ANOVA always uses an alternative hypothesis that is one-sided. This is because we only reject the null if the between-group variability is significantly greater than the within-group variability, and this always corresponds to a greater than in \(H_1\).
Second, what the heck are \(\sigma^2_{between-group}\) and \(\sigma^2_{within-group}\)? We will unpack this more formally below.
24.4 Formal treatment
Treatment 1 | Treatment 2 | Treatment 3 |
\(x_1\) | \(y_1\) | \(z_1\) |
\(x_2\) | \(y_2\) | \(z_2\) |
\(\ldots\) | \(\ldots\) | \(\ldots\) |
\(x_l\) | \(y_m\) | \(z_n\) |
If \(l=m=n\) then we have an equal number of observations for each group. When this is the case, we say that we have a balanced design.
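A quick way to check whether a design is balanced is to compare the group sizes directly. This is a small sketch with made-up vectors, not data from this chapter.

```r
## Hypothetical groups, for illustration only
x <- c(5.1, 4.8, 6.0)
y <- c(7.2, 6.9, 7.5)
z <- c(3.3, 4.1)

## number of observations in each group
group_sizes <- lengths(list(x = x, y = y, z = z))

## balanced means every group has the same number of observations
is_balanced <- length(unique(group_sizes)) == 1

print(group_sizes)
print(is_balanced)  # FALSE, because z has only 2 observations
```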
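Grand mean (the mean of all observations, which reduces to the simple average of the group means when the design is balanced): \[ G = \frac{l\bar{x} + m\bar{y} + n\bar{z}}{l + m + n} \]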
Between-group variation: \[ \text{ss}_{between} = l (\bar{x} - G)^2 + m (\bar{y} - G)^2 + n (\bar{z} - G)^2 \]
Within-group variation: \[ \text{ss}_{within} = \sum_{i=1}^l (x_i-\bar{x})^2 + \sum_{i=1}^m (y_i-\bar{y})^2 + \sum_{i=1}^n (z_i-\bar{z})^2 \]
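To make these sums of squares concrete, here is a small hand-checkable sketch. The three groups are made-up numbers chosen so that the arithmetic is easy to follow, and `ss_total` is an extra quantity, not defined above, included only to show that the total variation splits into the between and within parts.

```r
## Made-up balanced data: three groups of three observations
x <- c(1, 2, 3)
y <- c(2, 3, 4)
z <- c(6, 7, 8)

all_obs <- c(x, y, z)
grand_mean <- mean(all_obs)  # (6 + 9 + 21) / 9 = 4

## between-group: group size times squared distance of each
## group mean (2, 3, 7) from the grand mean
ss_between <- length(x) * (mean(x) - grand_mean)^2 +
  length(y) * (mean(y) - grand_mean)^2 +
  length(z) * (mean(z) - grand_mean)^2  # 3*4 + 3*1 + 3*9 = 42

## within-group: squared distance of each observation from
## its own group mean
ss_within <- sum((x - mean(x))^2) +
  sum((y - mean(y))^2) +
  sum((z - mean(z))^2)  # 2 + 2 + 2 = 6

## total: squared distance of each observation from the grand mean
ss_total <- sum((all_obs - grand_mean)^2)  # 48 = 42 + 6

print(c(ss_between, ss_within, ss_total))
```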
Given this, a reasonable guess for how to estimate the population parameters in \(H_0\) might be:
\[ \widehat{\frac{\sigma^2_{between-group}}{\sigma^2_{within-group}}} = \frac{\text{ss}_{between}}{\text{ss}_{within}} \]
This is pretty much correct, except that in practice we use mean squared deviations, which are just sums of squared deviations divided by their corresponding degrees of freedom. We need to do this because we don’t want the within-group term to dominate simply because it usually has many more observations contributing to its sum. This leads to the following treatment:
\(\begin{align} & H_0: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} = 1 \\ & H_1: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} > 1 \\ \end{align}\)
\(\begin{align} & \widehat{\frac{\sigma^2_{between-group}}{\sigma^2_{within-group}}} = \frac{\text{ms}_{between}}{\text{ms}_{within}} \\\\ & \text{ms}_{between} = \frac{\text{ss}_{between}}{df_{between}} \\ & \text{ms}_{within} = \frac{\text{ss}_{within}}{df_{within}} \\\\ & df_{between} = n_{groups} - 1 \\ & df_{within} = n_{total} - n_{groups} \\\\ & n_{total} = l + m + n \\\\ & \widehat{\frac{\sigma^2_{between-group}}{\sigma^2_{within-group}}} = \frac{\text{ms}_{between}}{\text{ms}_{within}} \sim F(df_{between}, df_{within})\\ & F_{obs} = \frac{\text{ms}_{between}}{\text{ms}_{within}} \sim F(df_{between}, df_{within})\\ \end{align}\)
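The chain of definitions above can be traced with a few lines of arithmetic. This is a sketch under assumed inputs: the two sums of squares are made-up values for a hypothetical design with three groups of three observations each, not results from this chapter.

```r
## Made-up sums of squares for 3 groups of 3 observations
ss_between <- 42
ss_within <- 6
n_groups <- 3
n_total <- 9

## degrees of freedom
df_between <- n_groups - 1       # 2
df_within <- n_total - n_groups  # 6

## mean squares: sums of squares scaled by their degrees of freedom
ms_between <- ss_between / df_between  # 21
ms_within <- ss_within / df_within     # 1

## the raw ratio of sums of squares ignores how many terms went
## into each sum; the ratio of mean squares does not
ss_ratio <- ss_between / ss_within  # 7
fobs <- ms_between / ms_within      # 21

## upper-tail probability, matching the one-sided H1
pval <- pf(fobs, df_between, df_within, lower.tail = FALSE)

print(c(ss_ratio, fobs, pval))  # pval is about 0.00195
```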
I have not shown how we know that, when \(H_0\) is true, the ratio of mean squares follows an F distribution, as doing so is beyond the scope of the unit. In general, we leave the process of finding good test statistics and characterising their distributions to the statisticians.
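Although the derivation is out of scope, the claim can be checked empirically. The sketch below is an illustration under assumed settings (three groups of eight standard-normal observations, 2000 simulated experiments, none of which come from this chapter): if the F ratio really follows \(F(df_{between}, df_{within})\) when \(H_0\) is true, then it should exceed the 95th percentile of that distribution in roughly 5% of experiments.

```r
## Illustrative check (all settings here are assumptions)
set.seed(123)
n_sims <- 2000
n_per_group <- 8
n_groups <- 3
group <- factor(rep(c("a", "b", "c"), each = n_per_group))

## simulate experiments in which H0 is true (all means equal)
## and record the observed F ratio from each one
f_sim <- replicate(n_sims, {
  value <- rnorm(n_groups * n_per_group)
  anova(lm(value ~ group))[1, "F value"]
})

df_between <- n_groups - 1
df_within <- n_groups * n_per_group - n_groups

## 95th percentile of the theoretical F distribution
f_crit <- qf(0.95, df_between, df_within)

## proportion of simulated F ratios beyond that percentile
reject_rate <- mean(f_sim > f_crit)
print(reject_rate)  # should be close to 0.05
```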
24.5 Example 1: Made up data
Consider the following data:
## x y z
## 1: 6.867731 10.89766 27.558906
## 2: 10.918217 17.43715 21.949216
## 3: 5.821857 18.69162 16.893797
## 4: 17.976404 17.87891 8.926501
## 5: 11.647539 13.47306 25.624655
Test the hypothesis that the population means \(\mu_X\), \(\mu_Y\), \(\mu_Z\) are not all equal.
## step 1
## H0: sig_between / sig_within = 1
## H1: sig_between / sig_within > 1
## step 2
alph <- 0.05
## step 3
## ms_ratio_hat = ms_between / ms_within
## step 4
x <- c(6.867731, 10.918217, 5.821857, 17.976404, 11.647539)
y <- c(10.89766, 17.43715, 18.69162, 17.87891, 13.47306)
z <- c(27.558906, 21.949216, 16.893797, 8.926501, 25.624655)
nx <- length(x)
ny <- length(y)
nz <- length(z)
n_total <- nx + ny + nz
n_groups <- 3
## mean of each group
mean_x <- mean(x)
mean_y <- mean(y)
mean_z <- mean(z)
## grand mean
grand_mean <- mean(c(x, y, z))
## ss-between
ss_between <- nx*(mean_x - grand_mean)^2 +
ny*(mean_y - grand_mean)^2 +
nz*(mean_z - grand_mean)^2
## ss-within
ss_within_x <- 0
for(i in 1:nx) {
  ss_within_x <- ss_within_x + (x[i] - mean_x)^2
}
ss_within_y <- 0
for(i in 1:ny) {
  ss_within_y <- ss_within_y + (y[i] - mean_y)^2
}
ss_within_z <- 0
for(i in 1:nz) {
  ss_within_z <- ss_within_z + (z[i] - mean_z)^2
}
ss_within <- ss_within_x + ss_within_y + ss_within_z
## ss-within --- a better way
ss_within <- sum((x - mean_x)^2) +
sum((y - mean_y)^2) +
sum((z - mean_z)^2)
## dfs
df_between <- n_groups-1
df_within <- n_total - n_groups
## mean squares
ms_between <- ss_between / df_between
ms_within <- ss_within / df_within
## observed F-value
fobs <- ms_between / ms_within
## compute pval
pval <- pf(fobs, df_between, df_within, lower.tail=FALSE)
## report results
print(c(ss_between, ss_within, df_between, df_within, fobs, pval))
## [1] 227.95300725 361.75603363 2.00000000 12.00000000 3.78077466
## [6] 0.05329273
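The same decision can be reached by comparing the observed F-value to a critical value instead of comparing the p-value to alpha. This is a minimal sketch that copies `fobs` and the degrees of freedom from the output above rather than recomputing them.

```r
## Values copied from the printed results above
alph <- 0.05
fobs <- 3.78077466
df_between <- 2
df_within <- 12

## critical value: the F-value that cuts off the upper alph
## tail of the F(df_between, df_within) distribution
f_crit <- qf(1 - alph, df_between, df_within)

## reject H0 only if the observed F exceeds the critical value
reject_h0 <- fobs > f_crit

print(round(f_crit, 3))  # about 3.885
print(reject_h0)         # FALSE
```

Since the observed F-value falls just short of the critical value (equivalently, the p-value of about 0.053 is just above 0.05), we fail to reject \(H_0\) at this alpha level.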
## Check our work using a builtin R function
library(data.table)
d <- data.table(x, y, z)
d <- melt(d, measure.vars = c('x', 'y', 'z'))
d[, variable := factor(variable)]
fm <- lm(value ~ variable, data = d)
anova(fm)
## Analysis of Variance Table
## Response: value
## Df Sum Sq Mean Sq F value Pr(>F)
## variable 2 227.95 113.977 3.7808 0.05329 .
## Residuals 12 361.76 30.146
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
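Base R offers other routes to the same F test. The sketch below uses `oneway.test()` and `aov()` from the stats package on the same made-up data. Setting `var.equal = TRUE` is needed to get the classical ANOVA, because by default `oneway.test()` applies a Welch correction for unequal variances. The variable names `value` and `group` are chosen here for illustration.

```r
## Same made-up data as above, in long format
x <- c(6.867731, 10.918217, 5.821857, 17.976404, 11.647539)
y <- c(10.89766, 17.43715, 18.69162, 17.87891, 13.47306)
z <- c(27.558906, 21.949216, 16.893797, 8.926501, 25.624655)
value <- c(x, y, z)
group <- factor(rep(c("x", "y", "z"), each = 5))

## classical one-way ANOVA (equal variances assumed)
res <- oneway.test(value ~ group, var.equal = TRUE)
print(res$statistic)  # observed F
print(res$parameter)  # numerator and denominator df
print(res$p.value)

## aov() fits the same model and prints a full ANOVA table
print(summary(aov(value ~ group)))
```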
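# We will ultimately be using the ez library for ANOVAs, so
# might as well get started now.
library(ez)
d[, subject := factor(1:.N)]
# The exact ezANOVA() arguments are an assumption inferred from
# the output below (in particular type = 3).
ezANOVA(data = d, dv = value, wid = subject, between = variable, type = 3)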
## Effect DFn DFd F p p<.05 ges
## 2 variable 2 12 3.780775 0.05329273 0.3865517
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 2 12 24.07389 156.2317 0.9245458 0.4232137
24.6 Example 2: Criterion learning data
library(data.table)
library(ggplot2)
fp <- 'https://crossley.github.io/book_stats/data/criterion_learning/crit_learn.csv'
d <- fread(fp)
## redefine d for simplification
d <- d[cnd %in% c('Delay', 'Long ITI', 'Short ITI')][, mean(unique(t2c)), .(cnd, sub)]
setnames(d, 'V1', 't2c')
ggplot(d, aes(x=cnd, y=t2c)) +
geom_boxplot() +
theme(aspect.ratio = 1)
# Calculate the number of groups and observations per group
d[, n_cnds := length(unique(cnd))]
d[, n_obs := .N, .(cnd)]
## Calculate the mean within each cnd:
d[, t2c_mean_cnd := mean(t2c), .(cnd)]
## Calculate the overall mean:
d[, t2c_mean_grand := mean(t2c)]
## Calculate the between-cnd sum of squared differences:
d[, ss_between := sum((t2c_mean_cnd - t2c_mean_grand)^2)]
## Calculate the "within-cnd" sum of squared differences.
d[, ss_within := sum((t2c - t2c_mean_cnd)^2)]
## Compute degrees of freedom
d[, df_between := n_cnds-1]
d[, df_within := .N - n_cnds]
## Calculate MSE terms
d[, mse_between := ss_between / df_between]
d[, mse_within := ss_within / df_within]
## Calculate the F-ratio
d[, fobs := mse_between / mse_within]
## Calculate p-val
d[, pval := pf(fobs, df_between, df_within, lower.tail=FALSE)]
print(round(
c(d[, unique(ss_between)],
d[, unique(ss_within)],
d[, unique(df_between)],
d[, unique(df_within)],
d[, unique(fobs)],
d[, unique(pval)]
), 4))
## [1] 25758.2646 630055.8679 2.0000 55.0000 1.1243 0.3322
## Check our work using a builtin R function
fm <- lm(t2c ~ cnd, data = d)
anova(fm)
## Analysis of Variance Table
## Response: t2c
## Df Sum Sq Mean Sq F value Pr(>F)
## cnd 2 25758 12879 1.1243 0.3322
## Residuals 55 630056 11456
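# We will ultimately be using the ez library for ANOVAs, so
# might as well get started now.
library(ez)
d[, sub := factor(sub)]
d[, cnd := factor(cnd)]
# The exact ezANOVA() arguments are an assumption inferred from
# the output below (in particular type = 3).
ezANOVA(data = d, dv = t2c, wid = sub, between = cnd, type = 3)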
## Warning: Data is unbalanced (unequal N per group). Make sure you specified a
## well-considered value for the type argument to ezANOVA().
## Coefficient covariances computed by hccm()
## Effect DFn DFd F p p<.05 ges
## 2 cnd 2 55 1.124269 0.3322408 0.03927678
## $`Levene's Test for Homogeneity of Variance`
## DFn DFd SSn SSd F p p<.05
## 1 2 55 27898.51 583609.9 1.314592 0.2768892
Notice the warning message about unequal n per group. The gist of this warning is as follows: There are different methods of computing \(ss\) terms (not shown in this lecture). If we have an unbalanced design (i.e., unequal \(n\) per group), then these different methods produce different results. In psychology and neuroscience, the standard approach is to use Type III sums of squares. The reason for this is beyond the scope of this unit.
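To see why the choice of method can matter, here is a minimal sketch using base R only. It is not from the original chapter: the data are made up, and the design has two hypothetical factors `a` and `b`, because the order of terms only comes into play once a model has more than one term. `anova()` on an `lm` fit reports sequential (Type I) sums of squares, so with unequal cell sizes the sum of squares credited to `a` depends on whether it enters the model before or after `b`.

```r
## Made-up unbalanced two-factor data (cell sizes 3, 1, 1, 3)
dat <- data.frame(
  a = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2")),
  b = factor(c("b1", "b1", "b1", "b2", "b1", "b2", "b2", "b2")),
  y = c(1, 2, 3, 6, 4, 8, 9, 10)
)

## sequential sums of squares: each term is credited with what it
## explains beyond the terms listed before it
ss_a_first <- anova(lm(y ~ a + b, data = dat))["a", "Sum Sq"]
ss_a_last <- anova(lm(y ~ b + a, data = dat))["a", "Sum Sq"]

print(ss_a_first)  # 45.125: a entered first
print(ss_a_last)   # smaller: a entered after b
```

In a balanced design the two orderings would agree, which is why the warning only appears when group sizes are unequal.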