# load libraries

library(data.table)
library(ggplot2)

# clean work space

rm(list = ls())

# init colorscheme

COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
  theme_minimal(base_size = 13) +
    theme(
      panel.grid.minor = element_blank(),
      strip.text = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))

Overview

So far, you have used t-tests to compare two means. In this tutorial, you will learn what to do when you have three or more independent groups.

If you ran every possible pairwise t-test, your chance of making at least one Type I error would grow with every test you add. One-way ANOVA sidesteps this problem by testing all group means together in a single omnibus F-test.
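To make the inflation concrete, here is a back-of-the-envelope calculation: with three *independent* tests each run at alpha = .05, the chance of at least one false positive is roughly 14%. (The pairwise tests here share groups and so are not fully independent, so the exact figure differs slightly, but the direction is the same.)

```r
# approximate family-wise error rate for k independent tests at level alpha
alpha <- 0.05
k <- 3
fwer <- 1 - (1 - alpha)^k
round(fwer, 3)  # ~0.143, nearly three times the nominal 0.05
```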

In this tutorial, you will work with a simulated attention-training dataset in which different participants completed one of three training schedules:

  • brief
  • moderate
  • extended

Your dependent variable will be mean response time (mean_rt). Each row in the dataset is one participant, and each participant appears in only one training condition. That makes this a true one-factor, between-subjects design.


Part 1 - Introducing the data

d <- fread("data/attention_training_summary.csv")
str(d)
## Classes 'data.table' and 'data.frame':   36 obs. of  4 variables:
##  $ participant_id    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ training_condition: chr  "brief" "brief" "brief" "brief" ...
##  $ mean_rt           : num  712 690 702 725 694 ...
##  $ prop_correct      : num  0.79 0.82 0.8 0.78 0.81 0.77 0.8 0.82 0.78 0.79 ...
##  - attr(*, ".internal.selfref")=<externalptr>
head(d)
##    participant_id training_condition mean_rt prop_correct
##             <int>             <char>   <num>        <num>
## 1:              1              brief   712.4         0.79
## 2:              2              brief   689.7         0.82
## 3:              3              brief   701.9         0.80
## 4:              4              brief   725.3         0.78
## 5:              5              brief   694.5         0.81
## 6:              6              brief   718.2         0.77

This dataset contains one row per participant and four variables:

  • participant_id - participant identifier
  • training_condition - the training schedule assigned to that participant
  • mean_rt - the participant’s mean response time in milliseconds
  • prop_correct - the participant’s mean proportion correct

Before doing any inferential test, always check the unit of analysis. Here, the unit of analysis is the participant. That is the correct level for a between-subjects one-way ANOVA because each participant contributes only one summary score.

For the rest of this tutorial, focus on response time. First, convert training_condition to a factor so its levels run from brief to extended:

d[, training_condition := factor(
  training_condition,
  levels = c("brief", "moderate", "extended")
)]

Part 2 - Visualise the group differences

Start with a boxplot.

ggplot(d, aes(x = training_condition, y = mean_rt, fill = training_condition)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.12, alpha = 0.5) +
  scale_fill_manual(values = COL) +
  labs(
    x = "Training condition",
    y = "Mean response time (ms)",
    title = "Response time by attention-training condition",
    fill = "Condition"
  ) +
  theme(legend.position = "none")
Mean response time by training condition.

Now compute the group means and standard errors.

d_sum <- d[, .(
  mean_rt = mean(mean_rt),
  se_rt   = sd(mean_rt) / sqrt(.N)
), by = training_condition]

d_sum
##    training_condition  mean_rt    se_rt
##                <fctr>    <num>    <num>
## 1:              brief 707.2667 3.423235
## 2:           moderate 652.9833 2.484980
## 3:           extended 596.3333 2.177374
ggplot(d_sum, aes(x = training_condition, y = mean_rt, fill = training_condition)) +
  geom_col(width = 0.6) +
  geom_errorbar(
    aes(ymin = mean_rt - se_rt, ymax = mean_rt + se_rt),
    width = 0.15
  ) +
  scale_fill_manual(values = COL) +
  labs(
    x = "Training condition",
    y = "Mean response time (ms)",
    title = "Mean response time by condition",
    fill = "Condition"
  ) +
  theme(legend.position = "none")
Group means with standard error bars.

From the plots, it looks as though response time decreases as training becomes more extensive. The next question is whether those differences are large enough to be statistically reliable.


Part 3 - Why not just run three t-tests?

With three groups, there are three pairwise comparisons:

  • brief vs moderate
  • brief vs extended
  • moderate vs extended

You could run three independent-samples t-tests:

t.test(mean_rt ~ training_condition,
       data = d[training_condition %in% c("brief", "moderate")])
## 
##  Welch Two Sample t-test
## 
## data:  mean_rt by training_condition
## t = 12.833, df = 20.073, p-value = 3.915e-11
## alternative hypothesis: true difference in means between group brief and group moderate is not equal to 0
## 95 percent confidence interval:
##  45.46159 63.10508
## sample estimates:
##    mean in group brief mean in group moderate 
##               707.2667               652.9833

t.test(mean_rt ~ training_condition,
       data = d[training_condition %in% c("brief", "extended")])
## 
##  Welch Two Sample t-test
## 
## data:  mean_rt by training_condition
## t = 27.343, df = 18.649, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group brief and group extended is not equal to 0
## 95 percent confidence interval:
##  102.4310 119.4356
## sample estimates:
##    mean in group brief mean in group extended 
##               707.2667               596.3333

t.test(mean_rt ~ training_condition,
       data = d[training_condition %in% c("moderate", "extended")])
## 
##  Welch Two Sample t-test
## 
## data:  mean_rt by training_condition
## t = 17.146, df = 21.627, p-value = 4.518e-14
## alternative hypothesis: true difference in means between group moderate and group extended is not equal to 0
## 95 percent confidence interval:
##  49.79116 63.50884
## sample estimates:
## mean in group moderate mean in group extended 
##               652.9833               596.3333

The problem is that the false-positive rate accumulates across the set of comparisons. One-way ANOVA gives you a single omnibus test of:

\[H_0: \mu_{\text{brief}} = \mu_{\text{moderate}} = \mu_{\text{extended}}\]

If the omnibus test is significant, you then follow it up with corrected post-hoc comparisons.


Part 4 - Run the one-way ANOVA

In base R, a one-way ANOVA is fitted with aov().

fit <- aov(mean_rt ~ training_condition, data = d)
summary(fit)
##                    Df Sum Sq Mean Sq F value Pr(>F)    
## training_condition  2  73848   36924   407.8 <2e-16 ***
## Residuals          33   2988      91                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

How to read the output:

  • Df gives the degrees of freedom
  • Sum Sq gives the sums of squares
  • Mean Sq gives the mean squares
  • F value is the ANOVA test statistic
  • Pr(>F) is the p-value for the omnibus test

If the p-value is below your alpha level, you can conclude that at least one group mean differs from the others. The omnibus test does not, however, tell you which groups differ.
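A significant F-test says nothing about how large the effect is. One common effect size, eta-squared, can be computed directly from the sums of squares in the ANOVA table (values below are copied from the printed summary):

```r
# eta-squared = SS_between / SS_total, using the sums of squares printed above
ss_between  <- 73848
ss_residual <- 2988
eta_sq <- ss_between / (ss_between + ss_residual)
round(eta_sq, 3)  # ~0.961: training condition accounts for most of the variance
```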


Part 5 - Post-hoc comparisons

If the omnibus ANOVA is significant, you still need to ask which groups differ. Tukey’s Honestly Significant Difference test is a standard choice for all pairwise comparisons in a one-way ANOVA.

TukeyHSD(fit)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = mean_rt ~ training_condition, data = d)
## 
## $training_condition
##                         diff        lwr        upr p adj
## moderate-brief     -54.28333  -63.81523  -44.75144     0
## extended-brief    -110.93333 -120.46523 -101.40144     0
## extended-moderate  -56.65000  -66.18189  -47.11811     0

This output gives the mean difference for each pair, a confidence interval, and an adjusted p-value that controls the family-wise error rate.

Interpretation guide:

  • if the adjusted p-value is below your alpha level, that pair differs
  • if the confidence interval does not include zero, that is consistent with a reliable difference
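Tukey's HSD is not the only option. Base R's pairwise.t.test() runs all pairwise comparisons with a chosen p-value adjustment (Holm by default; Bonferroni shown here). A minimal sketch with simulated data whose structure mirrors this tutorial's dataset:

```r
# sketch with simulated data: Bonferroni-adjusted pairwise t-tests,
# an alternative to Tukey's HSD
set.seed(1)
sim <- data.frame(
  rt = c(rnorm(12, 700, 10), rnorm(12, 650, 10), rnorm(12, 600, 10)),
  condition = factor(rep(c("brief", "moderate", "extended"), each = 12),
                     levels = c("brief", "moderate", "extended"))
)
pairwise.t.test(sim$rt, sim$condition, p.adjust.method = "bonferroni")
```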

Part 6 - Assumptions

One-way ANOVA depends on three main assumptions:

  1. Independence - each participant contributes one score and participants are independent of one another
  2. Approximately normal residuals
  3. Similar variance across groups

Normality of residuals

d_resid <- data.table(resid = residuals(fit))

ggplot(d_resid, aes(sample = resid)) +
  stat_qq(colour = COL[1]) +
  stat_qq_line(colour = COL[2]) +
  labs(
    x = "Theoretical quantiles",
    y = "Sample quantiles",
    title = "Q-Q plot of residuals"
  )
Q-Q plot of ANOVA residuals.

Homogeneity of variance

bartlett.test(mean_rt ~ training_condition, data = d)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  mean_rt by training_condition
## Bartlett's K-squared = 2.3753, df = 2, p-value = 0.3049

Use these checks to support your interpretation, not to replace it. With moderate group sizes, ANOVA is reasonably robust to mild deviations, but you should still inspect the assumptions rather than ignore them.
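Here Bartlett's test gives no cause for concern, but if it had, Welch's ANOVA (oneway.test() with var.equal = FALSE) is a standard alternative that drops the equal-variance assumption. A sketch with simulated groups whose variances deliberately differ:

```r
# sketch with simulated data: Welch's ANOVA for unequal group variances
set.seed(2)
sim <- data.frame(
  rt = c(rnorm(12, 700, 5), rnorm(12, 650, 20), rnorm(12, 600, 40)),
  condition = factor(rep(c("brief", "moderate", "extended"), each = 12))
)
oneway.test(rt ~ condition, data = sim, var.equal = FALSE)
```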


Apply this to your assigned dataset

Your assigned dataset is determined by your student ID number: take the last digit of your ID and compute last_digit %% 3 in R.
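For example, if your ID happened to end in 7 (a made-up digit for illustration):

```r
# %% is R's modulo (remainder) operator
last_digit <- 7  # replace with the last digit of your own student ID
last_digit %% 3  # 7 %% 3 is 1, so this example student is in group 1
```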

Load your assigned dataset and run a one-way ANOVA. Specifically:

  1. Identify one independent variable with three or more levels.
  2. Make sure your rows are independent at the level you analyse.
  3. Aggregate your data to one score per participant or unit if needed.
  4. Run a one-way ANOVA with aov().
  5. Report the F statistic, degrees of freedom, and p-value.
  6. If the result is significant, run post-hoc comparisons.
  7. Create at least one plot showing the group differences.
  8. Write a short paragraph interpreting the result in plain language.