
Introduction

  • One-way ANOVA (Analysis of Variance) is used to determine if there are ANY differences between the means of at least two (but usually three or more) independent groups based on one categorical independent variable.

  • It’s a test for equality of means but it uses variances under the hood to make this determination.

  • ANOVA is an omnibus test. If the test is significant, we know that at least one group is different from the others, but NOT which groups are different.

Naming conventions

  • A one-way ANOVA is used when there is one between-subjects factor.

  • A one-way repeated measures ANOVA is used when there is one within-subjects factor. This will be covered in a separate slide deck.

What is a Factor?

  • In the context of experiment design and ANOVAs, a factor is a categorical variable that defines the groups or conditions in your experiment.

  • A factor is equivalent to an independent variable (see next slide).

  • In R, a factor is a special type of categorical variable with a fixed set of possible values, called levels.
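
A minimal sketch of creating a factor in R (the variable here is made up to match the sleep example on the next slide):

# hypothetical grouping variable for a sleep-deprivation study
sleep <- c("8 hours", "4 hours", "none", "8 hours", "none")
sleep <- factor(sleep, levels = c("none", "4 hours", "8 hours"))
levels(sleep)  # the fixed set of possible values (the levels)
table(sleep)   # number of observations at each level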

Example

  • A study investigating the effect of sleep deprivation on cognitive performance.

  • Independent Variable: Amount of sleep (categorical with three levels, such as “Full 8 hours,” “4 hours,” and “No sleep”). This is what you control in your experiment design.

  • Dependent Variable: Performance on a cognitive task measured by scores on a memory test or reaction times. This is a random variable that generates the data for your experiment.

Assumptions of a One-way ANOVA

  • Independence: Observations are independent and randomly sampled from the population within each factor level.

  • Normality: The dependent variable is normally distributed within each population defined by the factor levels.

  • Homogeneity of variance: The population variances are equal across factor levels. This is also called homoscedasticity.

Visualising the assumptions

Example data

##               x group subject
##           <num> <int>   <int>
##  1: -0.56047565     1       1
##  2: -0.23017749     1       2
##  3:  1.55870831     1       3
##  4:  0.07050839     1       4
##  5:  0.12928774     1       5
##  6:  4.71506499     2       6
##  7:  3.46091621     2       7
##  8:  1.73493877     2       8
##  9:  2.31314715     2       9
## 10:  2.55433803     2      10
## 11:  7.22408180     3      11
## 12:  6.35981383     3      12
## 13:  6.40077145     3      13
## 14:  6.11068272     3      14
## 15:  5.44415887     3      15
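
The code that generated this data set is not shown on the slide, but the printed values are consistent with simulating three groups of five observations from normal distributions with means 0, 3, and 6 (standard deviation 1) after set.seed(123). A sketch along those lines:

library(data.table)

set.seed(123)                   # this seed reproduces the values printed above
d <- data.table(
  x       = rnorm(15, mean = rep(c(0, 3, 6), each = 5)),  # dependent variable
  group   = rep(1:3, each = 5),                           # between-subjects factor
  subject = 1:15                                          # one subject per row
)
d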

One-way ANOVA: Hypotheses

\[ \begin{align} & H_0: \mu_1 = \mu_2 = \mu_3 \ldots \\ & H_1: \lnot H_0 \\ \end{align} \]

Intuition

  • Between-group variation — This captures the magnitude of the differences between the group means.

  • Within-group variation — This captures the magnitude of the differences within each group.

  • If the means are all the same (i.e., \(H_0\) is true), then the between-group and within-group variation should be about the same size, since both simply reflect random variability in the data (see the sketch below).
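
One way to see the two sources of variation in the example data is to plot the individual scores by group along with the group means and the grand mean. A minimal base-R sketch, assuming the data.table d shown earlier:

# spread of points around their own group mean = within-group variation
# spread of group means around the grand mean  = between-group variation
stripchart(x ~ group, data = d, vertical = TRUE, pch = 16,
           xlab = "group", ylab = "x")
group_means <- tapply(d$x, d$group, mean)
points(1:3, group_means, pch = 4, cex = 2)  # group means
abline(h = mean(d$x), lty = 2)              # grand mean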

Reframe the H’s

  • We reframe our hypotheses from this:

\[ \begin{align} & H_0: \mu_1 = \mu_2 = \mu_3 \ldots \\ & H_1: \lnot H_0 \\ \end{align} \]

  • To this:

\[ \begin{align} & H_0: \frac{\text{between-group variation}}{\text{within-group variation}} = 1 \\ & H_1: \frac{\text{between-group variation}}{\text{within-group variation}} > 1 \\ \end{align} \]

ANOVA \(H_1\) is always one-sided

  • ANOVA always uses an alternative hypothesis that is one-sided.

  • We reject the null only if the between-group variability is significantly greater than the within-group variability.

  • This corresponds to the “greater than” (\(>\)) in \(H_1\).

Reframe the H’s: formality

  • Then from this:

\[ \begin{align} & H_0: \frac{\text{between-group variation}}{\text{within-group variation}} = 1 \\ & H_1: \frac{\text{between-group variation}}{\text{within-group variation}} > 1 \\ \end{align} \]

  • To this:

\[ \begin{align} & H_0: \frac{\sigma^2_{between}}{\sigma^2_{within}} = 1 \\ & H_1: \frac{\sigma^2_{between}}{\sigma^2_{within}} > 1 \\ \end{align} \]

What are \(\sigma^2_{\text{between}}\) and \(\sigma^2_{\text{within}}\)?

  • First consider data from a simple experiment with three groups:

    Group 1      Group 2      Group 3
    \(x_1\)      \(y_1\)      \(z_1\)
    \(x_2\)      \(y_2\)      \(z_2\)
    \(\ldots\)   \(\ldots\)   \(\ldots\)
    \(x_l\)      \(y_m\)      \(z_n\)

  • If \(l=m=n\) then we have an equal number of observations for each group. When this is the case, we say that we have a balanced design.

What are \(\sigma^2_{\text{between}}\) and \(\sigma^2_{\text{within}}\)?

  • Grand mean:

\[ G = \frac{l \bar{x} + m \bar{y} + n \bar{z}}{l + m + n} = \frac{\bar{x} + \bar{y} + \bar{z}}{3} \quad \text{(if the design is balanced)} \]

  • Between-group variation: Reflects variation of group means around the grand mean.

\[ \text{SS}_{\text{between}} = l (\bar{x} - G)^2 + m (\bar{y} - G)^2 + n (\bar{z} - G)^2 \]

  • Within-group variation: Reflects variation of individual scores around their group mean.

\[ \text{SS}_{\text{within}} = \sum_{i=1}^l (x_i-\bar{x})^2 + \sum_{i=1}^m (y_i-\bar{y})^2 + \sum_{i=1}^n (z_i-\bar{z})^2 \]
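
These quantities are easy to compute directly in R. A sketch for the example data, assuming the data.table d shown earlier:

group_means <- tapply(d$x, d$group, mean)    # group means (x-bar, y-bar, z-bar)
n_per_group <- tapply(d$x, d$group, length)  # group sizes (l, m, n)
G <- mean(d$x)                               # grand mean

SS_between <- sum(n_per_group * (group_means - G)^2)
SS_within  <- sum((d$x - group_means[d$group])^2)   # deviations from own group mean
c(SS_between = SS_between, SS_within = SS_within)   # roughly 93.8 and 9.7 here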

\(SS\) is not a good estimate of \(\sigma^2_{\text{between}}\) and \(\sigma^2_{\text{within}}\)

  • SS generally gets bigger as you have more values, even if the variability of the data stays the same.

  • We need a way to standardize the SS so that it isn’t affected by the number of observations.

  • Dividing SS by degrees of freedom produces a good estimate of the true variability in the population.

\(MS\) is a good estimate of \(\sigma^2_{\text{between}}\) and \(\sigma^2_{\text{within}}\)

  • The mean square (MS) is the sum of squares divided by its degrees of freedom:

\[ \text{MS}_{\text{between}} = \frac{\text{SS}_{\text{between}}}{\text{df}_{\text{between}}} \]

\[ \text{MS}_{\text{within}} = \frac{\text{SS}_{\text{within}}}{\text{df}_{\text{within}}} \]

Degrees of freedom

  • Degrees of freedom are the number of independent pieces of information available to estimate a parameter.

  • The group means are tied together through the grand mean, costing 1 degree of freedom.

  • Between-group degrees of freedom:
    \[ df_{\text{between}} = n_{\text{groups}} - 1 \]

  • Within each group, observations are tied together through their group mean, costing 1 degree of freedom per group.

  • Within-group degrees of freedom:
    \[ df_{\text{within}} = n_{\text{total}} - n_{\text{groups}} \]
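
Continuing the sketch from the sums of squares above, the degrees of freedom and mean squares for the example data are:

n_groups <- length(unique(d$group))   # 3 groups
n_total  <- nrow(d)                   # 15 observations in total

df_between <- n_groups - 1            # 3 - 1  = 2
df_within  <- n_total - n_groups      # 15 - 3 = 12

MS_between <- SS_between / df_between
MS_within  <- SS_within  / df_within
c(MS_between = MS_between, MS_within = MS_within)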

What we have so far

\[ \begin{align} & H_0: \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}} = 1 \\ & H_1: \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}} > 1 \\ & \widehat{\frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}}} = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} \\\\ & \text{MS}_{\text{between}} = \frac{\text{SS}_{\text{between}}}{df_{\text{between}}} \\ & \text{MS}_{\text{within}} = \frac{\text{SS}_{\text{within}}}{df_{\text{within}}} \\\\ & df_{\text{between}} = n_{groups} - 1 \\ & df_{\text{within}} = n_{total} - n_{groups} \\\\ & n_{total} = l + m + n \\\\ \end{align} \]

  • We still need to specify how our test statistic is distributed under the null hypothesis.

Sampling distribution of the test statistic

\[ \begin{align} & \widehat{\frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}}} = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} \sim F(df_{\text{between}}, df_{\text{within}})\\ & F_{obs} = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} \sim F(df_{\text{between}}, df_{\text{within}})\\ \end{align} \]
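
For the example data, the observed F statistic follows directly from the mean squares computed in the sketch above:

F_obs <- MS_between / MS_within
F_obs   # roughly 58.1 here, to be compared against an F(2, 12) distribution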

One-way ANOVA recipe summary

\[ \begin{align} & H_0: \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}} = 1 \\ & H_1: \frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}} > 1 \\ & \widehat{\frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}}} = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} \\\\ & \text{MS}_{\text{between}} = \frac{\text{SS}_{\text{between}}}{df_{\text{between}}} \\ & \text{MS}_{\text{within}} = \frac{\text{SS}_{\text{within}}}{df_{\text{within}}} \\\\ & df_{\text{between}} = n_{groups} - 1 \\ & df_{\text{within}} = n_{total} - n_{groups} \\\\ & n_{total} = l + m + n \\\\ & \widehat{\frac{\sigma^2_{\text{between}}}{\sigma^2_{\text{within}}}} = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} \sim F(df_{\text{between}}, df_{\text{within}})\\ & F_{obs} = \frac{\text{MS}_{\text{between}}}{\text{MS}_{\text{within}}} \sim F(df_{\text{between}}, df_{\text{within}})\\ \end{align} \]

What is an F distribution?
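
A minimal base-R sketch of what an F density looks like, using the degrees of freedom from the example (2 and 12):

# density of the F distribution with df1 = 2 and df2 = 12
curve(df(x, df1 = 2, df2 = 12), from = 0, to = 10,
      xlab = "F", ylab = "density",
      main = "F(2, 12) distribution under the null hypothesis")
abline(v = 1, lty = 2)  # under H0 the observed ratio should fall near 1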

P-values and decisions
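
The p-value is the area under the F(df_between, df_within) density to the right of the observed statistic. Continuing the sketch above:

p_value <- pf(F_obs, df1 = df_between, df2 = df_within, lower.tail = FALSE)
p_value         # about 7e-07 for the example data
p_value < 0.05  # TRUE, so reject H0 at the alpha = 0.05 level

# equivalently, compare F_obs against the critical value of the F distribution
qf(0.95, df1 = df_between, df2 = df_within)  # about 3.89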

One-Way ANOVA Table (for independent groups)

Source           Df         SS          MS                                       F                                      P(>F)
Between groups   \(k-1\)    see above   \(\frac{SS_{between}}{Df_{between}}\)    \(\frac{MS_{between}}{MS_{within}}\)
Within groups    \(N-k\)    see above   \(\frac{SS_{within}}{Df_{within}}\)
Total            \(N-1\)    see above
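
The same table can be produced with base R's aov() function; because a one-way design has only a single factor, the type of sums of squares makes no difference here and the result matches the ez output shown below. A sketch, again assuming the data.table d:

d[, group := factor(group)]   # aov() needs the grouping variable to be a factor
fit <- aov(x ~ group, data = d)
summary(fit)                  # Df, Sum Sq, Mean Sq, F value, Pr(>F)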

ANOVA with ez

##               x group subject
##           <num> <int>   <int>
##  1: -0.56047565     1       1
##  2: -0.23017749     1       2
##  3:  1.55870831     1       3
##  4:  0.07050839     1       4
##  5:  0.12928774     1       5
##  6:  4.71506499     2       6
##  7:  3.46091621     2       7
##  8:  1.73493877     2       8
##  9:  2.31314715     2       9
## 10:  2.55433803     2      10
## 11:  7.22408180     3      11
## 12:  6.35981383     3      12
## 13:  6.40077145     3      13
## 14:  6.11068272     3      14
## 15:  5.44415887     3      15

ANOVA with ez

library(ez)
# ezANOVA expects the grouping (between) and subject-identifier (wid) variables to be factors
d[, group := factor(group)]
d[, subject := factor(subject)]
ezANOVA(data=d, dv=x, wid=.(subject), within=NULL, between=.(group), type=3)

ANOVA with ez

  • data: the data.table containing the data.
  • dv: the dependent variable.
  • wid: the subject identifier.
  • within: the within-subjects factor.
  • between: the between-subjects factor.
  • type: the type of sum of squares to use.

What are types of sum of squares?

  • Type I: sequential sum of squares.
  • Type II: partial sum of squares.
  • Type III: marginal sum of squares.
  • We will always use Type III sum of squares.

ez ANOVA output

library(ez)
d[, group := factor(group)]
d[, subject := factor(subject)]
ezANOVA(data=d, dv=x, wid=.(subject), within=NULL, between=.(group), type=3)
## $ANOVA
##   Effect DFn DFd        F            p p<.05       ges
## 2  group   2  12 58.10218 6.724665e-07     * 0.9063994
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd       SSn      SSd         F         p p<.05
## 1   2  12 0.4747922 5.082453 0.5605075 0.5851711

ez ANOVA output

  • Effect: The factor(s) being tested (e.g., group).

  • DFn: Degrees of freedom for the effect (numerator, between-groups).

  • DFd: Degrees of freedom for the error term (denominator, within-groups).

  • F: The F-statistic comparing group variance to error variance.

  • p: The p-value for the F-test.

  • p<.05: Whether the result is significant at the \(\alpha = 0.05\) level.

  • ges: Generalized eta-squared, a measure of effect size.

Levene’s Test for Homogeneity of Variance

  • We won’t cover Levene’s test in detail.

  • It’s a test to see if the variances of the groups are equal.

  • I recommend checking assumptions visually with plots; a small sketch is given after this list.

  • More on assumption checking in later weeks.
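
As a starting point, here is a minimal base-R sketch of visual checks for the example data: boxplots to compare the spread across groups (homogeneity of variance) and a normal QQ plot of the residuals (normality).

op <- par(mfrow = c(1, 2))

# similar box heights across groups suggest roughly equal variances
boxplot(x ~ group, data = d, xlab = "group", ylab = "x")

# residuals = each score minus its own group mean; points near the line suggest normality
res <- d$x - ave(d$x, d$group)
qqnorm(res); qqline(res)

par(op)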

Reporting the Results of a One-Way ANOVA

  • To report a one-way ANOVA, you should describe the goal, the test performed, and the main result.

  • Example:

To assess whether the group means differed, we performed a one-way ANOVA. The analysis revealed no significant differences between groups, \(F(2, 5) = 3.24, p = 0.25\).

Question

Consider the ANOVA write-up from the last slide:

To assess whether the group means differed, we performed a one-way ANOVA. The analysis revealed no significant differences between groups, \(F(2, 5) = 3.24, p = 0.25\).

  • How many groups were compared?
  • How many subjects were in each group?
  • Was this a balanced design?

Answer

  • How many groups? The between-groups degrees of freedom is 2. Since \(df_{\text{between}} = n_{\text{groups}} - 1\), there were \(2 + 1 = 3\) groups compared.

  • How many subjects in total? The within-groups degrees of freedom is 5. Since \(df_{\text{within}} = n_{\text{total}} - n_{\text{groups}}\), \(n_{\text{total}} = 5 + 3 = 8\) subjects.

  • How many subjects per group? If the groups were equally sized, each would contain \(8 \div 3 \approx 2.67\) subjects, which is not a whole number, so the groups cannot all have been the same size and this was not a balanced design.