# load libraries
library(data.table)
library(ggplot2)
# clear the workspace
rm(list = ls())
# define colour scheme
COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
  theme_minimal(base_size = 13) +
    theme(
      panel.grid.minor = element_blank(),
      strip.text = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))
The t-test is one of the most widely used inferential tools in statistics. There are three main variants, and choosing the right one depends on the structure of your data and your research question:

- One-sample t-test: compare the mean of a single group against a fixed reference value.
- Independent samples t-test: compare the means of two separate groups of subjects.
- Paired t-test: compare two measurements taken on the same subjects.
In this tutorial we work through each variant using rat T-maze data. By the end you should be able to choose the appropriate t-test for a given design, run it in R, and interpret its output.
Research question: Do rats in Experiment 1 perform above chance (50% reward rate)?
In a T-maze with two arms, a rat choosing randomly would earn a
reward on roughly 50% of trials. If rats have learned the task, their
mean reward rate should be greater than 0.5. We can test this
with a one-sample t-test against mu = 0.5.
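Before loading the real data, a quick simulation can make the chance baseline concrete. This sketch uses made-up trials, not the T-maze data: a rat choosing randomly between two arms is a Bernoulli(0.5) process, so its reward rate over many trials should hover near 0.5.

```r
# Simulated sanity check (not the real data): 200 random left/right
# choices, each rewarded with probability 0.5.
set.seed(1)
reward <- rbinom(200, size = 1, prob = 0.5)
mean(reward)  # close to 0.5, but not exactly
```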
d <- fread("data/experiment_1_summary.csv")
head(d)
## experiment rat_id trial cage_context scent choice reward reaction_time maze_run_time
## <char> <int> <int> <char> <char> <char> <int> <num> <num>
## 1: exp1 1 1 mesh none L 1 0.5716877 3.789606
## 2: exp1 1 2 wood none R 0 0.9360044 3.209561
## 3: exp1 1 3 mesh none L 1 0.4574906 2.137541
## 4: exp1 1 4 mesh peppermint L 1 0.4461169 2.648328
## 5: exp1 1 5 mesh lemon L 1 0.2000000 2.727809
## 6: exp1 1 6 mesh peppermint L 1 0.6402233 2.699873
d_rat <- d[, .(mean_reward = mean(reward)), .(rat_id)]
d_rat
## rat_id mean_reward
## <int> <num>
## 1: 1 0.585
## 2: 2 0.520
## 3: 3 0.520
## 4: 4 0.505
## 5: 5 0.535
## 6: 6 0.640
## 7: 7 0.455
## 8: 8 0.395
## 9: 9 0.545
## 10: 10 0.545
## 11: 11 0.515
## 12: 12 0.485
## 13: 13 0.580
## 14: 14 0.530
## 15: 15 0.565
## 16: 16 0.555
## 17: 17 0.490
## 18: 18 0.525
## 19: 19 0.540
## 20: 20 0.455
## 21: 21 0.445
## 22: 22 0.445
## 23: 23 0.490
## 24: 24 0.560
## rat_id mean_reward
t.test(d_rat$mean_reward, mu = 0.5)
##
## One Sample t-test
##
## data: d_rat$mean_reward
## t = 1.6008, df = 23, p-value = 0.1231
## alternative hypothesis: true mean is not equal to 0.5
## 95 percent confidence interval:
## 0.4948245 0.5405921
## sample estimates:
## mean of x
## 0.5177083
Here t = 1.60 with 23 degrees of freedom and p = 0.123, and the 95% CI (0.495, 0.541) includes 0.5, so these data do not provide convincing evidence that rats perform above chance. In general, if the p-value is below 0.05 and the confidence interval lies entirely above 0.5, we have evidence of above-chance performance. The direction of the effect (mean > 0.5) and the width of the CI both matter: a very narrow CI far above 0.5 would indicate a reliable, sizeable effect.
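Because the research question is directional (above chance, not merely different from chance), you can also ask `t.test()` for a one-sided alternative. The sketch below re-enters the per-rat means printed above so it runs on its own; on the live data you would pass `d_rat$mean_reward` directly.

```r
# Per-rat means from the table above:
x <- c(0.585, 0.520, 0.520, 0.505, 0.535, 0.640, 0.455, 0.395,
       0.545, 0.545, 0.515, 0.485, 0.580, 0.530, 0.565, 0.555,
       0.490, 0.525, 0.540, 0.455, 0.445, 0.445, 0.490, 0.560)

# One-sided test of "greater than chance"; with a positive t-statistic
# the p-value is half the two-sided value.
t.test(x, mu = 0.5, alternative = "greater")
```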
ggplot(d_rat, aes(x = mean_reward)) +
  geom_histogram(binwidth = 0.02, fill = COL[1], colour = "white") +
  geom_vline(xintercept = 0.5, linetype = "dashed", colour = "firebrick", linewidth = 1) +
  labs(
    x = "Mean reward rate",
    y = "Count",
    title = "Per-rat mean reward rate — Experiment 1"
  )
Distribution of per-rat mean reward rates in Experiment 1. The dashed line marks chance performance (0.5).
Research question: Do rats in Experiment 1 earn a higher mean reward rate than rats in Experiment 2?
Here the two groups are entirely separate sets of rats — there is no pairing between them. This calls for an independent samples t-test.
d1 <- fread("data/experiment_1_summary.csv")
d2 <- fread("data/experiment_2_summary.csv")
d_both <- rbind(d1, d2)
d_both_rat <- d_both[, .(mean_reward = mean(reward)), .(rat_id, experiment)]
head(d_both_rat)
## rat_id experiment mean_reward
## <int> <char> <num>
## 1: 1 exp1 0.585
## 2: 2 exp1 0.520
## 3: 3 exp1 0.520
## 4: 4 exp1 0.505
## 5: 5 exp1 0.535
## 6: 6 exp1 0.640
ggplot(d_both_rat, aes(x = factor(experiment), y = mean_reward, fill = factor(experiment))) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, size = 2, alpha = 0.4) +
  scale_fill_manual(values = COL) +
  labs(
    x = "Experiment",
    y = "Mean reward rate",
    title = "Per-rat mean reward rate by experiment",
    fill = "Experiment"
  ) +
  theme(legend.position = "none")
Mean reward rate by experiment. Points show individual rats; boxes show group medians and IQR.
t.test(mean_reward ~ experiment, data = d_both_rat)
##
## Welch Two Sample t-test
##
## data: mean_reward by experiment
## t = -0.54266, df = 45.993, p-value = 0.59
## alternative hypothesis: true difference in means between group exp1 and group exp2 is not equal to 0
## 95 percent confidence interval:
## -0.04022532 0.02314199
## sample estimates:
## mean in group exp1 mean in group exp2
## 0.5177083 0.5262500
t.test(mean_reward ~ experiment, data = d_both_rat, var.equal = TRUE)
##
## Two Sample t-test
##
## data: mean_reward by experiment
## t = -0.54266, df = 46, p-value = 0.59
## alternative hypothesis: true difference in means between group exp1 and group exp2 is not equal to 0
## 95 percent confidence interval:
## -0.04022520 0.02314186
## sample estimates:
## mean in group exp1 mean in group exp2
## 0.5177083 0.5262500
Welch's t-test (var.equal = FALSE, the R default) does not assume the two groups have equal variances; it adjusts the degrees of freedom accordingly. This is almost always the safer choice, especially when sample sizes or spreads differ between groups.

Student's t-test (var.equal = TRUE) assumes equal variances. It can be slightly more powerful when that assumption holds, but the cost of using Welch's when variances are in fact equal is negligible. For this reason, Welch's is generally recommended.

Compare the two experiment means, the t-statistic, degrees of freedom, p-value, and the 95% CI for the difference. If the CI excludes zero, the two groups differ reliably. Note the direction: which experiment has the higher reward rate, and by how much?
Research question: Within Experiment 1, do rats earn
more reward in the mesh cage context than in the
wood cage context?
Each rat in Experiment 1 experienced both cage contexts, so
each rat contributes one mean for mesh and one mean for
wood. Because the two values come from the same
rat, they are not independent — pairing on rat accounts for the fact
that some rats are simply better learners than others.
d_paired <- d1[, .(mean_reward = mean(reward)), .(rat_id, cage_context)]
head(d_paired)
## rat_id cage_context mean_reward
## <int> <char> <num>
## 1: 1 mesh 0.6428571
## 2: 1 wood 0.5294118
## 3: 2 wood 0.5222222
## 4: 2 mesh 0.5181818
## 5: 3 mesh 0.4411765
## 6: 3 wood 0.6020408
d_wide <- dcast(d_paired, rat_id ~ cage_context, value.var = "mean_reward")
head(d_wide)
## Key: <rat_id>
## rat_id mesh wood
## <int> <num> <num>
## 1: 1 0.6428571 0.5294118
## 2: 2 0.5181818 0.5222222
## 3: 3 0.4411765 0.6020408
## 4: 4 0.4901961 0.5204082
## 5: 5 0.5894737 0.4857143
## 6: 6 0.6938776 0.5882353
t.test(d_wide$mesh, d_wide$wood, paired = TRUE)
##
## Paired t-test
##
## data: d_wide$mesh and d_wide$wood
## t = 1.8233, df = 23, p-value = 0.08129
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.00359283 0.05698528
## sample estimates:
## mean difference
## 0.02669622
The paired t-test works on the differences within each rat rather than on the raw scores. By subtracting each rat’s wood score from its mesh score, between-rat variability is removed entirely. If rats differ a lot from each other (some are fast learners, some are slow), that variability inflates the error term in an unpaired test, making it harder to detect a real context effect. The paired test sidesteps this by focusing only on the change within each rat.
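This equivalence is easy to verify: a paired t-test is exactly a one-sample t-test on the within-pair differences. The sketch below uses simulated paired data (made up for illustration, not the T-maze measurements):

```r
# Simulated paired data: each "rat" has two correlated scores.
set.seed(42)
mesh <- runif(24, 0.4, 0.7)
wood <- mesh + rnorm(24, mean = 0, sd = 0.05)

paired   <- t.test(mesh, wood, paired = TRUE)
on_diffs <- t.test(mesh - wood, mu = 0)
all.equal(paired$p.value, on_diffs$p.value)  # TRUE: the two tests are identical
```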
ggplot(d_paired, aes(x = cage_context, y = mean_reward, group = rat_id)) +
  geom_line(alpha = 0.5, colour = "grey60") +
  geom_point(size = 2, aes(colour = cage_context)) +
  scale_colour_manual(values = COL) +
  stat_summary(aes(group = 1), fun = mean, geom = "line",
               linewidth = 1.5, colour = "black", linetype = "dashed") +
  labs(
    x = "Cage context",
    y = "Mean reward rate",
    title = "Per-rat mean reward rate by cage context — Experiment 1",
    colour = "Context"
  )
Each line connects a single rat’s mean reward rate in the mesh and wood contexts. Lines sloping upward indicate higher reward in mesh than wood.
Inspect the spaghetti plot: if most lines slope in the same direction, that is a visual signal of a consistent context effect. Confirm with the t-test output — does the CI for the mean difference exclude zero? Is the p-value below your chosen threshold?
All three t-test variants share a core set of assumptions. Violating them can inflate the false positive rate or reduce power.
The t-test assumes the sampling distribution of the mean is approximately normal. In practice this means either:

- the raw data are themselves approximately normal, or
- the sample size is large enough for the central limit theorem to make the sampling distribution of the mean approximately normal anyway.
Check normality with a histogram and a Q-Q plot:
ggplot(d_rat, aes(x = mean_reward)) +
  geom_histogram(bins = 20, fill = COL[1], colour = "white") +
  labs(x = "Mean reward rate", title = "Distribution of per-rat mean reward")

ggplot(d_rat, aes(sample = mean_reward)) +
  stat_qq(colour = COL[1]) +
  stat_qq_line(colour = COL[2]) +
  labs(x = "Theoretical quantiles", y = "Sample quantiles", title = "Q-Q plot — per-rat mean reward")
Q-Q plot for per-rat mean reward rates in Experiment 1. Points falling close to the line suggest approximate normality.
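If you want a formal complement to the visual checks, base R's `shapiro.test()` runs a Shapiro-Wilk test of normality. The sketch below uses simulated normal data as a stand-in; on the real data you would pass `d_rat$mean_reward`. With small samples the test has little power, so the plots above remain the primary check.

```r
set.seed(7)
x <- rnorm(24)   # stand-in; substitute d_rat$mean_reward for the real check
shapiro.test(x)  # a small p-value would suggest non-normality
```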
Each observation (here, each rat) must be independent of the others. Rats that share cages and influence each other’s behaviour would violate this. If independence is in doubt, the t-test p-values cannot be trusted.
The independent samples t-test (Student’s version) assumes both groups have equal population variances. Welch’s t-test relaxes this assumption by adjusting the degrees of freedom — this is why Welch’s is generally preferred. For one-sample and paired t-tests, homogeneity of variance is not relevant because there is effectively only one set of values being tested.
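A quick way to eyeball this assumption is to compare the per-group spreads directly. The toy data below are made up for illustration; on the real data the analogous call would be `d_both_rat[, .(n = .N, sd = sd(mean_reward)), by = experiment]`.

```r
library(data.table)

# Toy data: group "b" is twice as spread out as group "a".
dt <- data.table(group = rep(c("a", "b"), each = 5),
                 y = c(1, 2, 3, 4, 5, 2, 4, 6, 8, 10))
dt[, .(n = .N, sd = sd(y)), by = group]
```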
When you apply t-tests to your assigned dataset, briefly state which assumptions you checked and what you found. You do not need to run formal tests for every assumption, but you should inspect a histogram or Q-Q plot and comment on whether the data look approximately normal and whether independence is reasonable given the study design.
Your assigned dataset is determined by your student ID number. Take
the last digit and compute last_digit %% 3 in R; the result (0, 1, or 2) selects your dataset.
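For example (using a hypothetical student ID, not a real one):

```r
# Hypothetical student ID, for illustration only:
student_id <- 46123457
last_digit <- student_id %% 10   # 7
last_digit %% 3                  # 1 -> dataset 1
```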
- 0: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_switch
- 1: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_auto
- 2: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_unlearn

Load your assigned dataset and run at least two of the following:

- a one-sample t-test,
- an independent samples t-test,
- a paired t-test.

For each test you run, state the research question it addresses, report the t-statistic, degrees of freedom, p-value, and 95% confidence interval, and interpret the result in plain language, commenting briefly on the assumption checks described above.