# load libraries
library(data.table)
library(ggplot2)
# clean workspace
rm(list = ls())
# init colorscheme
COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
  theme_minimal(base_size = 13) +
    theme(
      panel.grid.minor = element_blank(),
      strip.text = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))
So far, you have used t-tests to compare two means. In this tutorial, you will learn what to do when you have three or more independent groups.
If you ran every possible pairwise t-test, your chance of making at least one Type I error would increase across the set of tests. One-way ANOVA solves this problem by testing all group means together in a single omnibus F-test.
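You can quantify this inflation. If each of k tests has a Type I error rate of α, and the tests were independent, the chance of at least one false positive is 1 − (1 − α)^k. A quick sketch:

```r
# family-wise Type I error rate across k independent tests at alpha = .05
alpha <- 0.05
k <- 3  # the three pairwise comparisons among three groups
fwer <- 1 - (1 - alpha)^k
round(fwer, 3)  # 0.143: about a 14% chance of at least one false positive
```

The three pairwise tests on real data are not independent, so this is only an approximation, but the direction of the problem is the same: the more tests you run, the more likely a false positive becomes.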
In this tutorial, you will work with a simulated attention-training dataset in which different participants completed one of three training schedules:
- brief
- moderate
- extended

Your dependent variable will be mean response time
(mean_rt). Each row in the dataset is one participant, and
each participant appears in only one training condition. That makes this
a true one-factor, between-subjects design.
d <- fread("data/attention_training_summary.csv")
str(d)
## Classes 'data.table' and 'data.frame': 36 obs. of 4 variables:
## $ participant_id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ training_condition: chr "brief" "brief" "brief" "brief" ...
## $ mean_rt : num 712 690 702 725 694 ...
## $ prop_correct : num 0.79 0.82 0.8 0.78 0.81 0.77 0.8 0.82 0.78 0.79 ...
## - attr(*, ".internal.selfref")=<externalptr>
head(d)
## participant_id training_condition mean_rt prop_correct
## <int> <char> <num> <num>
## 1: 1 brief 712.4 0.79
## 2: 2 brief 689.7 0.82
## 3: 3 brief 701.9 0.80
## 4: 4 brief 725.3 0.78
## 5: 5 brief 694.5 0.81
## 6: 6 brief 718.2 0.77
This dataset contains one row per participant and four variables:
- participant_id - the participant identifier
- training_condition - the training schedule assigned to that participant
- mean_rt - the participant’s mean response time in milliseconds
- prop_correct - the participant’s mean proportion correct

Before doing any inferential test, always check the unit of analysis. Here, the unit of analysis is the participant. That is the correct level for a between-subjects one-way ANOVA because each participant contributes only one summary score.
For the rest of this tutorial, focus on response time. First, convert training_condition to a factor with its levels in design order (brief, moderate, extended), so that plots and model output are ordered sensibly:
d[, training_condition := factor(
  training_condition,
  levels = c("brief", "moderate", "extended")
)]
Start with a boxplot.
ggplot(d, aes(x = training_condition, y = mean_rt, fill = training_condition)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.12, alpha = 0.5) +
  scale_fill_manual(values = COL) +
  labs(
    x = "Training condition",
    y = "Mean response time (ms)",
    title = "Response time by attention-training condition",
    fill = "Condition"
  ) +
  theme(legend.position = "none")
Mean response time by training condition.
Now compute the group means and standard errors.
d_sum <- d[, .(
  mean_rt = mean(mean_rt),
  # inside .(), mean_rt still refers to the original column, not the new name
  se_rt = sd(mean_rt) / sqrt(.N)
), by = training_condition]
d_sum
## training_condition mean_rt se_rt
## <fctr> <num> <num>
## 1: brief 707.2667 3.423235
## 2: moderate 652.9833 2.484980
## 3: extended 596.3333 2.177374
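The bar plot below shows ±1 SE. If you would rather plot 95% confidence intervals, multiply each standard error by the appropriate t quantile. A sketch, assuming the design is balanced with n = 12 per group (36 participants across 3 conditions):

```r
# t multiplier for a 95% CI with n = 12 per group (df = 11)
n_per_group <- 12
ci_mult <- qt(0.975, df = n_per_group - 1)
round(ci_mult, 2)  # about 2.2, so a 95% CI is roughly mean_rt +/- 2.2 * se_rt
```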
ggplot(d_sum, aes(x = training_condition, y = mean_rt, fill = training_condition)) +
  geom_col(width = 0.6) +
  geom_errorbar(
    aes(ymin = mean_rt - se_rt, ymax = mean_rt + se_rt),
    width = 0.15
  ) +
  scale_fill_manual(values = COL) +
  labs(
    x = "Training condition",
    y = "Mean response time (ms)",
    title = "Mean response time by condition",
    fill = "Condition"
  ) +
  theme(legend.position = "none")
Group means with standard error bars.
From the plots, it looks as though response time decreases as training becomes more extensive. The next question is whether those differences are large enough to be statistically reliable.
With three groups, there are three pairwise comparisons: brief vs. moderate, brief vs. extended, and moderate vs. extended.
You could run three independent-samples t-tests:
t.test(mean_rt ~ training_condition,
data = d[training_condition %in% c("brief", "moderate")])
##
## Welch Two Sample t-test
##
## data: mean_rt by training_condition
## t = 12.833, df = 20.073, p-value = 3.915e-11
## alternative hypothesis: true difference in means between group brief and group moderate is not equal to 0
## 95 percent confidence interval:
## 45.46159 63.10508
## sample estimates:
## mean in group brief mean in group moderate
## 707.2667 652.9833
t.test(mean_rt ~ training_condition,
data = d[training_condition %in% c("brief", "extended")])
##
## Welch Two Sample t-test
##
## data: mean_rt by training_condition
## t = 27.343, df = 18.649, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group brief and group extended is not equal to 0
## 95 percent confidence interval:
## 102.4310 119.4356
## sample estimates:
## mean in group brief mean in group extended
## 707.2667 596.3333
t.test(mean_rt ~ training_condition,
data = d[training_condition %in% c("moderate", "extended")])
##
## Welch Two Sample t-test
##
## data: mean_rt by training_condition
## t = 17.146, df = 21.627, p-value = 4.518e-14
## alternative hypothesis: true difference in means between group moderate and group extended is not equal to 0
## 95 percent confidence interval:
## 49.79116 63.50884
## sample estimates:
## mean in group moderate mean in group extended
## 652.9833 596.3333
The problem is that the false-positive rate accumulates across the set of comparisons. One-way ANOVA gives you a single omnibus test of:
\[H_0: \mu_{brief} = \mu_{moderate} = \mu_{extended}\]
If the omnibus test is significant, you then follow it up with corrected post-hoc comparisons.
In base R, a one-way ANOVA is fitted with aov().
fit <- aov(mean_rt ~ training_condition, data = d)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## training_condition 2 73848 36924 407.8 <2e-16 ***
## Residuals 33 2988 91
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
How to read the output:
- Df gives the degrees of freedom
- Sum Sq gives the sums of squares
- Mean Sq gives the mean squares
- F value is the ANOVA test statistic
- Pr(>F) is the p-value for the omnibus test

If the p-value is small, you can conclude that not all three group means are equal.
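As a check on your reading of the table, you can reconstruct the F value and an effect size from the printed sums of squares (values copied from the summary output above):

```r
# reconstruct F from the ANOVA table: F = MS_between / MS_within
ss_between <- 73848; df_between <- 2   # training_condition row
ss_within  <- 2988;  df_within  <- 33  # Residuals row

f_value <- (ss_between / df_between) / (ss_within / df_within)
round(f_value, 1)  # 407.8, matching the F value printed by summary()

# eta-squared: proportion of total variance explained by condition
eta_sq <- ss_between / (ss_between + ss_within)
round(eta_sq, 2)  # 0.96
```

Note that the printed Mean Sq of 91 is rounded; dividing the sums of squares by their degrees of freedom yourself recovers the exact F.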
If the omnibus ANOVA is significant, you still need to ask which groups differ. Tukey’s Honestly Significant Difference test is a standard choice for all pairwise comparisons in a one-way ANOVA.
TukeyHSD(fit)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = mean_rt ~ training_condition, data = d)
##
## $training_condition
## diff lwr upr p adj
## moderate-brief -54.28333 -63.81523 -44.75144 0
## extended-brief -110.93333 -120.46523 -101.40144 0
## extended-moderate -56.65000 -66.18189 -47.11811 0
This output gives the mean difference for each pair, a confidence interval, and an adjusted p-value that controls the family-wise error rate.
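An alternative to Tukey’s HSD is base R’s pairwise.t.test(), which runs all pairwise t-tests with a pooled standard deviation and adjusts the p-values (Holm’s method by default). A sketch, assuming d is the data.table loaded above:

```r
# all pairwise t-tests with Holm-adjusted p-values
pairwise.t.test(d$mean_rt, d$training_condition, p.adjust.method = "holm")
```

Either approach controls the family-wise error rate; Tukey’s HSD additionally gives you simultaneous confidence intervals for the differences.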
Interpretation guide: a confidence interval that excludes zero (equivalently, an adjusted p-value below .05) means that pair of conditions differs reliably. Here all three intervals exclude zero, so every condition differs from every other, and the negative diff values show that more extensive training produced faster responses.
One-way ANOVA depends on three main assumptions: independence of observations, normally distributed residuals, and equal variances across groups. Check normality of the residuals with a Q-Q plot:
d_resid <- data.table(resid = residuals(fit))
ggplot(d_resid, aes(sample = resid)) +
  stat_qq(colour = COL[1]) +
  stat_qq_line(colour = COL[2]) +
  labs(
    x = "Theoretical quantiles",
    y = "Sample quantiles",
    title = "Q-Q plot of residuals"
  )
Q-Q plot of ANOVA residuals.
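If you want a formal complement to the Q-Q plot, one option is a Shapiro-Wilk test on the residuals (a sketch using the fit object from above; a non-significant result means no strong evidence against normality, not proof of it):

```r
# Shapiro-Wilk normality test on the ANOVA residuals
shapiro.test(residuals(fit))
```

To check equal variances, the tutorial uses Bartlett’s test next.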
bartlett.test(mean_rt ~ training_condition, data = d)
##
## Bartlett test of homogeneity of variances
##
## data: mean_rt by training_condition
## Bartlett's K-squared = 2.3753, df = 2, p-value = 0.3049
Use these checks to support your interpretation, not to replace it. With moderate group sizes, ANOVA is reasonably robust to mild deviations, but you should still inspect the assumptions rather than ignore them.
Your assigned dataset is determined by your student ID number. Take
the last digit and compute last_digit %% 3 in R.
- 0: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_switch
- 1: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_auto
- 2: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_unlearn

Load your assigned dataset and run a one-way ANOVA. Specifically:
- fit the model with aov().