2024

Introduction

  • One-way ANOVA (Analysis of Variance) is used to determine if there are any differences between the means of at least two (but usually three or more) independent groups based on one categorical independent variable.

  • It’s a test for equality of means but it uses variances under the hood to make this determination.

  • ANOVA is an omnibus test. If the test is significant, we know that at least one group is different from the others, but NOT which groups are different.

Naming conventions

  • A one-way ANOVA is used with one between-subjects factor.

  • A one-way repeated measures ANOVA is used with one within-subjects factor. This will be covered in a separate slide deck.

Example

  • A study investigating the effect of sleep deprivation on cognitive performance.

  • Independent Variable: Amount of sleep (categorical with three levels, such as “Full 8 hours,” “4 hours,” and “No sleep”).

  • Dependent Variable: Performance on a cognitive task measured by scores on a memory test or reaction times (just needs to be continuous).

Assumptions of a One-way ANOVA

  • Independence: The observations are obtained independently and randomly from the population defined by the factor levels.

  • Normality: Raw data of each factor level are normally distributed.

  • Homogeneity of variance: The populations at each factor level have a common variance. This is also called homoscedasticity.

Visualising the assumptions

Example data

##               x group subject
##           <num> <int>   <int>
##  1: -0.56047565     1       1
##  2: -0.23017749     1       2
##  3:  1.55870831     1       3
##  4:  0.07050839     1       4
##  5:  0.12928774     1       5
##  6:  4.71506499     2       6
##  7:  3.46091621     2       7
##  8:  1.73493877     2       8
##  9:  2.31314715     2       9
## 10:  2.55433803     2      10
## 11:  7.22408180     3      11
## 12:  6.35981383     3      12
## 13:  6.40077145     3      13
## 14:  6.11068272     3      14
## 15:  5.44415887     3      15
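
For readers who want to follow along, the printed values can be typed into R directly. This is a minimal sketch: the numbers are copied from the table above, but how the original data were generated is not shown, so the construction below is an assumption. Base R is used to keep it self-contained.

```r
# Values copied from the printed example data (rounded to 8 decimals).
x <- c(-0.56047565, -0.23017749, 1.55870831, 0.07050839, 0.12928774,
        4.71506499,  3.46091621, 1.73493877, 2.31314715, 2.55433803,
        7.22408180,  6.35981383, 6.40077145, 6.11068272, 5.44415887)

d <- data.frame(
  x       = x,
  group   = rep(1:3, each = 5),  # three groups of five observations
  subject = 1:15                 # one row per subject
)

# Sample mean of x within each group
group_means <- tapply(d$x, d$group, mean)
print(round(group_means, 3))
```

Wrapping `d` in `data.table::as.data.table()` would give the data.table form shown in the printout.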

One-way ANOVA: Hypotheses

\[ \begin{align} & H_0: \mu_1 = \mu_2 = \mu_3 \ldots \\ & H_1: \lnot H_0 \\ \end{align} \]

Intuition

  • Between-group variation — how different are group means from each other?

  • Within-group variation — how noisy is your data in general?

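  • If the means are all the same (i.e., \(H_0\) is true), then between-group and within-group variation (each scaled by its degrees of freedom) estimate the same underlying variance, so they will be very similar and their ratio will be close to 1.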
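
The intuition can be checked with a small simulation. This is a sketch under assumed settings (three groups of five, standard normal noise, an arbitrary seed), not part of the original analysis. It draws many data sets in which \(H_0\) is true and records the ratio of between-group to within-group mean squares for each.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Ratio of between-group to within-group mean squares for one simulated
# data set in which every group has the same population mean (H0 true).
simulate_f_under_h0 <- function(n_per_group = 5, n_groups = 3) {
  g <- factor(rep(seq_len(n_groups), each = n_per_group))
  y <- rnorm(n_per_group * n_groups, mean = 0, sd = 1)
  anova(lm(y ~ g))[["F value"]][1]
}

f_null <- replicate(2000, simulate_f_under_h0())

mean(f_null)      # average ratio when H0 is true
mean(f_null > 4)  # proportion of simulated ratios larger than 4
```

Under these assumptions the ratios scatter around a value close to 1 and large ratios are rare, which is why a large observed ratio counts as evidence against \(H_0\).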

Reframed hypotheses

\[ \begin{align} & H_0: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} = 1 \\ & H_1: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} > 1 \\ \end{align} \]

\(H_1\) is always one-sided

  • ANOVA always uses an alternative hypothesis that is one-sided.

  • We reject the null only if the between-group variability is significantly greater than the within-group variability.

  • This corresponds to the greater-than sign (\(>\)) in \(H_1\).

What is between-group and within-group variation?

| Group 1 | Group 2 | Group 3 |
|---|---|---|
| \(x_1\) | \(y_1\) | \(z_1\) |
| \(x_2\) | \(y_2\) | \(z_2\) |
| \(\ldots\) | \(\ldots\) | \(\ldots\) |
| \(x_l\) | \(y_m\) | \(z_n\) |

  • If \(l=m=n\) then we have an equal number of observations for each group. When this is the case, we say that we have a balanced design.

  • Grand mean:

\[ G = \frac{l\bar{x} + m\bar{y} + n\bar{z}}{l + m + n} \]

  • Between-group variation:

\[ \text{ss}_{between} = l (\bar{x} - G)^2 + m (\bar{y} - G)^2 + n (\bar{z} - G)^2 \]

  • Within-group variation:

\[ \text{ss}_{within} = \sum_{i=1}^l (x_i-\bar{x})^2 + \sum_{i=1}^m (y_i-\bar{y})^2 + \sum_{i=1}^n (z_i-\bar{z})^2 \]
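
As a worked example, the two sums of squares can be computed by hand for the example data. This is a sketch in base R; the variable names (`n_j`, `mean_j`) are illustrative and not from the original slides.

```r
# Values copied from the example data; three groups of five observations.
x <- c(-0.56047565, -0.23017749, 1.55870831, 0.07050839, 0.12928774,
        4.71506499,  3.46091621, 1.73493877, 2.31314715, 2.55433803,
        7.22408180,  6.35981383, 6.40077145, 6.11068272, 5.44415887)
group <- rep(1:3, each = 5)

n_j    <- tapply(x, group, length)  # group sizes: l, m, n
mean_j <- tapply(x, group, mean)    # group means: x-bar, y-bar, z-bar

# Grand mean over all observations. Because the design is balanced,
# this equals the plain average of the three group means.
G <- mean(x)

# Between-group variation: squared distance of each group mean from G,
# weighted by the group size.
ss_between <- sum(n_j * (mean_j - G)^2)

# Within-group variation: squared distance of each observation from its
# own group mean (ave() returns the group mean for every observation).
ss_within <- sum((x - ave(x, group))^2)

c(G = G, ss_between = ss_between, ss_within = ss_within)
```

For these data the between-group sum of squares (about 93.75) is far larger than the within-group sum of squares (about 9.68).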

Test statistic sampling distribution

\[ \begin{align} & H_0: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} = 1 \\ & H_1: \frac{\sigma^2_{between-group}}{\sigma^2_{within-group}} > 1 \\ & \widehat{\frac{\sigma^2_{between-group}}{\sigma^2_{within-group}}} = \frac{\text{ms}_{between}}{\text{ms}_{within}} \\\\ & \text{ms}_{between} = \frac{\text{ss}_{between}}{df_{between}} \\ & \text{ms}_{within} = \frac{\text{ss}_{within}}{df_{within}} \\\\ & df_{between} = n_{groups} - 1 \\ & df_{within} = n_{total} - n_{groups} \\\\ & n_{total} = l + m + n \\\\ & \widehat{\frac{\sigma^2_{between-group}}{\sigma^2_{within-group}}} = \frac{\text{ms}_{between}}{\text{ms}_{within}} \sim F(df_{between}, df_{within})\\ & F_{obs} = \frac{\text{ms}_{between}}{\text{ms}_{within}} \sim F(df_{between}, df_{within})\\ \end{align} \]
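
The chain of definitions above can be followed step by step for the example data. This is a sketch in base R with illustrative variable names. If the arithmetic is right, the final value should agree with the F reported for these data by `ezANOVA` (58.10218).

```r
# Values copied from the example data; three groups of five observations.
x <- c(-0.56047565, -0.23017749, 1.55870831, 0.07050839, 0.12928774,
        4.71506499,  3.46091621, 1.73493877, 2.31314715, 2.55433803,
        7.22408180,  6.35981383, 6.40077145, 6.11068272, 5.44415887)
group <- rep(1:3, each = 5)

# Sums of squares
group_mean_per_obs <- ave(x, group)                    # own group mean, per row
ss_between <- sum((group_mean_per_obs - mean(x))^2)    # equals sum of n_j * (mean_j - G)^2
ss_within  <- sum((x - group_mean_per_obs)^2)

# Degrees of freedom
n_groups   <- length(unique(group))
n_total    <- length(x)
df_between <- n_groups - 1
df_within  <- n_total - n_groups

# Mean squares and observed F ratio
ms_between <- ss_between / df_between
ms_within  <- ss_within / df_within
f_obs      <- ms_between / ms_within

c(df_between = df_between, df_within = df_within, F = f_obs)
```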

What is an F distribution?
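
One way to build intuition, offered as a sketch rather than as the original slide content: an F-distributed variable is the ratio of two independent chi-squared variables, each divided by its degrees of freedom. The simulation below (arbitrary seed and sample size) builds such ratios and compares their 95th percentile with the one R reports through `qf()`.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

df_between <- 2       # numerator degrees of freedom (example data)
df_within  <- 12      # denominator degrees of freedom (example data)
n_sim      <- 100000  # number of simulated ratios (arbitrary choice)

# Ratio of two independent chi-squared variables, each scaled by its df
f_sim <- (rchisq(n_sim, df_between) / df_between) /
         (rchisq(n_sim, df_within)  / df_within)

q_sim    <- quantile(f_sim, 0.95, names = FALSE)  # simulated 95th percentile
q_theory <- qf(0.95, df_between, df_within)       # theoretical 95th percentile

c(simulated = q_sim, theoretical = q_theory)
```

The distribution lives on non-negative values and is right-skewed, so only large values of F fall in the rejection region.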

P-values and decisions
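
A minimal sketch of how the decision could be computed in R, assuming a significance level of 0.05 (the threshold used in the `p<.05` column of the `ezANOVA` output). The observed F and degrees of freedom are taken from that output. Because \(H_1\) is one-sided, the p-value is the upper-tail area of the F distribution.

```r
f_obs      <- 58.10218  # observed F for the example data
df_between <- 2
df_within  <- 12
alpha      <- 0.05      # assumed significance level

# Upper-tail probability: P(F >= f_obs) when H0 is true
p_value <- pf(f_obs, df_between, df_within, lower.tail = FALSE)

# Equivalent decision rule based on the critical value
f_crit <- qf(1 - alpha, df_between, df_within)

reject_h0 <- p_value < alpha
c(p_value = p_value, f_crit = f_crit, reject_h0 = reject_h0)
```

Comparing `p_value` with `alpha` and comparing `f_obs` with `f_crit` always lead to the same decision.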

One-Way ANOVA Table (for independent groups)

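| Source | Df | SS | MS | F | P(>F) |
|---|---|---|---|---|---|
| Between groups | \(k-1\) | see above | \(\frac{SS_{between}}{Df_{between}}\) | \(\frac{MS_{between}}{MS_{within}}\) | |
| Within groups | \(N-k\) | see above | \(\frac{SS_{within}}{Df_{within}}\) | | |
| Total | \(N-1\) | \(SS_{between} + SS_{within}\) | | | |

  • Here \(k\) is the number of groups (\(n_{groups}\)) and \(N\) is the total number of observations (\(n_{total}\)).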
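
The same table can be produced for the example data with base R's `aov()`. This is a sketch for comparison only; the slides themselves use the `ez` package. The "Residuals" row in R's output corresponds to the "Within groups" row.

```r
# Values copied from the example data; group must be a factor for aov().
d <- data.frame(
  x = c(-0.56047565, -0.23017749, 1.55870831, 0.07050839, 0.12928774,
         4.71506499,  3.46091621, 1.73493877, 2.31314715, 2.55433803,
         7.22408180,  6.35981383, 6.40077145, 6.11068272, 5.44415887),
  group = factor(rep(1:3, each = 5))
)

fit <- aov(x ~ group, data = d)
tab <- summary(fit)[[1]]  # columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)
print(tab)

# The "Total" row is not printed by R; it is the column sum of Df and Sum Sq.
total <- c(Df = sum(tab[["Df"]]), SS = sum(tab[["Sum Sq"]]))
print(total)
```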

ANOVA with ez

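library(data.table)  # provides the := syntax used below
library(ez)
# Treat group and subject as categorical factors rather than integers
d[, group := factor(group)]
d[, subject := factor(subject)]
# One between-subjects factor (group) and no within-subjects factor
ezANOVA(data=d, dv=x, wid=.(subject), within=NULL, between=.(group), type=3)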
## Coefficient covariances computed by hccm()
## $ANOVA
##   Effect DFn DFd        F            p p<.05       ges
## 2  group   2  12 58.10218 6.724665e-07     * 0.9063994
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd       SSn      SSd         F         p p<.05
## 1   2  12 0.4747922 5.082453 0.5605075 0.5851711

  • data: the data.table containing the data.
  • dv: the dependent variable.
  • wid: the subject identifier.
  • within: the within-subjects factor (NULL here, because this design has none).
  • between: the between-subjects factor.
  • type: the type of sum of squares to use.

What are types of sum of squares?

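  • Type I: sequential sum of squares. Each term is adjusted only for the terms entered before it, so the order of the terms matters.
  • Type II: each term is adjusted for all other terms except the higher-order terms (interactions) that contain it.
  • Type III: each term is adjusted for all other terms in the model, including interactions.
  • With a single between-subjects factor, as in a one-way ANOVA, all three types give the same result.
  • We will always use Type III sum of squares.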

Levene’s Test for Homogeneity of Variance

  • We won’t cover Levene’s test in detail.

  • It’s a test to see if the variances of the groups are equal.

  • In keeping with my advice throughout the rest of this unit, I recommend checking assumptions visually with plots.
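
For the curious, Levene's test can be sketched as a one-way ANOVA on how far each observation sits from the centre of its own group. The version below assumes the median-centred variant; with the example data it should reproduce the F of about 0.56 shown in the `ezANOVA` output. It is an illustration in base R, not the code `ez` runs internally.

```r
# Values copied from the example data; three groups of five observations.
x <- c(-0.56047565, -0.23017749, 1.55870831, 0.07050839, 0.12928774,
        4.71506499,  3.46091621, 1.73493877, 2.31314715, 2.55433803,
        7.22408180,  6.35981383, 6.40077145, 6.11068272, 5.44415887)
group <- factor(rep(1:3, each = 5))

# Absolute deviation of each observation from its own group median
abs_dev <- abs(x - ave(x, group, FUN = median))

# One-way ANOVA on the absolute deviations
levene <- anova(lm(abs_dev ~ group))
print(levene)
```

A large p-value here means the data give no evidence against equal variances. It does not prove that the variances are equal.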

Reporting the results of a one-way ANOVA

  • To report a one-way ANOVA, write something like the following:

  • To assess whether the means of the groups are different, we performed a one-way ANOVA. This analysis revealed no significant differences between the groups (\(F(5, 2) = 3.24, p = 0.25\)).
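
The statistics in parentheses can be assembled with a small helper. `report_anova()` is a hypothetical function written for this sketch, not part of `ez` or base R, and the rule of printing very small p-values as "p < 0.001" is an assumed convention.

```r
# Hypothetical helper: format F, its degrees of freedom, and the p-value
# in the style used in the example sentence above.
report_anova <- function(f, df1, df2, p) {
  p_text <- if (p < 0.001) "p < 0.001" else sprintf("p = %.2f", p)
  sprintf("F(%d, %d) = %.2f, %s", as.integer(df1), as.integer(df2), f, p_text)
}

report_anova(3.24, 5, 2, 0.25)               # the illustrative values above
report_anova(58.10218, 2, 12, 6.724665e-07)  # the example data from ezANOVA
```

The first argument pair in `F(df1, df2)` is always the between-groups degrees of freedom (DFn), followed by the within-groups degrees of freedom (DFd).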