# load libraries
library(data.table)
library(ggplot2)
# clear the workspace
rm(list = ls())
# define the colour scheme
COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
  theme_minimal(base_size = 13) +
    theme(
      panel.grid.minor = element_blank(),
      strip.text = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))
The t-test is one of the most widely used inferential tools in statistics. There are three main variants, and choosing the right one depends entirely on the structure of your data and your research question:
One-sample t-test: Tests whether a single group’s mean differs from a known or hypothesised reference value (e.g., chance performance).
Independent samples t-test: Tests whether two separate, unrelated groups have different means.
Paired t-test: Tests whether two measurements taken on the same subjects (or matched pairs) differ on average.
In this tutorial we work through each variant using rat T-maze data. By the end you should be able to choose the appropriate t-test for a given design, run it in R, and interpret its output.
Research question: Do rats in Experiment 1 perform above chance (50% reward rate)?
In a T-maze with two arms, a rat choosing randomly would earn a reward on roughly 50% of trials.
If rats have learned the task, their mean reward rate should be
greater than 0.5. We can test this with a one-sample t-test
against mu = 0.5.
Because the hypothesis is directional, this is a one-sided test. A two-sided test would ask whether performance differs from 0.5 in either direction, but here the theory predicts performance specifically above chance.
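The relationship between the two forms is worth seeing once. A minimal sketch on a small made-up vector (not the rat data): when the t-statistic is positive, the one-sided "greater" p-value is exactly half the two-sided one.

```r
# Illustrative values only -- not the rat data
x <- c(0.55, 0.48, 0.61, 0.52, 0.57, 0.45)

p_two <- t.test(x, mu = 0.5)$p.value                           # two-sided
p_one <- t.test(x, mu = 0.5, alternative = "greater")$p.value  # one-sided

# With a positive t-statistic, the one-sided p is half the two-sided p
all.equal(p_one, p_two / 2)
```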
d <- fread("data/experiment_1_summary.csv")
head(d)
## experiment rat_id trial cage_context scent choice reward reaction_time maze_run_time
## <char> <int> <int> <char> <char> <char> <int> <num> <num>
## 1: exp1 1 1 mesh none L 1 0.5716877 3.789606
## 2: exp1 1 2 wood none R 0 0.9360044 3.209561
## 3: exp1 1 3 mesh none L 1 0.4574906 2.137541
## 4: exp1 1 4 mesh peppermint L 1 0.4461169 2.648328
## 5: exp1 1 5 mesh lemon L 1 0.2000000 2.727809
## 6: exp1 1 6 mesh peppermint L 1 0.6402233 2.699873
d_rat <- d[, .(mean_reward = mean(reward)), .(rat_id)]
d_rat
## rat_id mean_reward
## <int> <num>
## 1: 1 0.585
## 2: 2 0.520
## 3: 3 0.520
## 4: 4 0.505
## 5: 5 0.535
## 6: 6 0.640
## 7: 7 0.455
## 8: 8 0.395
## 9: 9 0.545
## 10: 10 0.545
## 11: 11 0.515
## 12: 12 0.485
## 13: 13 0.580
## 14: 14 0.530
## 15: 15 0.565
## 16: 16 0.555
## 17: 17 0.490
## 18: 18 0.525
## 19: 19 0.540
## 20: 20 0.455
## 21: 21 0.445
## 22: 22 0.445
## 23: 23 0.490
## 24: 24 0.560
## rat_id mean_reward
t.test(d_rat[["mean_reward"]], mu = 0.5, alternative = "greater")
##
## One Sample t-test
##
## data: d_rat[["mean_reward"]]
## t = 1.6008, df = 23, p-value = 0.06153
## alternative hypothesis: true mean is greater than 0.5
## 95 percent confidence interval:
## 0.4987492 Inf
## sample estimates:
## mean of x
## 0.5177083
t: The test statistic — how many standard errors
the sample mean is from mu = 0.5.
df: Degrees of freedom (here, n − 1 = 23).
p-value: The probability of observing a t-statistic this large (or larger) if the true mean were exactly 0.5. A small p-value is evidence against the null in the predicted direction.
95% confidence interval: A range of plausible values for the true population mean. Because the test is one-sided ("greater"), R reports a one-sided interval: a lower bound paired with Inf. If the lower bound sits above 0.5, the data support the directional hypothesis; here it falls just below 0.5.
sample mean: The observed average reward rate across the 24 rats.
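Under the hood, the t-statistic is just the sample mean's distance from mu measured in standard-error units. A sketch of the hand calculation, using an illustrative vector rather than the rat data:

```r
# Hand-compute t and the one-sided p, then check against t.test()
x  <- c(0.55, 0.48, 0.61, 0.52, 0.57, 0.45)  # illustrative values
mu <- 0.5
n  <- length(x)

t_manual <- (mean(x) - mu) / (sd(x) / sqrt(n))
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)  # alternative = "greater"

fit <- t.test(x, mu = mu, alternative = "greater")
all.equal(t_manual, unname(fit$statistic))
all.equal(p_manual, fit$p.value)
```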
If the p-value is below 0.05, we have evidence that rats perform above chance. The p-value quantifies the evidence against the null hypothesis, the confidence interval shows the precision of the estimate, and the distance between the sample mean and 0.5 describes the size of the effect.
In this dataset, the sample mean is only slightly above 0.5, the p-value is not below 0.05, and the confidence interval includes 0.5. So the sensible conclusion is not that rats clearly performed above chance, but that this sample does not provide strong evidence of above-chance performance.
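The t.test output does not report a standardised effect size. One common choice for a one-sample design is Cohen's d: the difference between the sample mean and mu, in standard-deviation units. A sketch on an illustrative vector:

```r
# Cohen's d for a one-sample design: (mean(x) - mu) / sd(x)
x  <- c(0.55, 0.48, 0.61, 0.52, 0.57, 0.45)  # illustrative values
mu <- 0.5
d_cohen <- (mean(x) - mu) / sd(x)
round(d_cohen, 2)  # ~0.51, a medium-sized effect by conventional benchmarks
```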
ggplot(d_rat, aes(x = mean_reward)) +
geom_histogram(binwidth = 0.02, fill = COL[1], colour = "white") +
geom_vline(xintercept = 0.5, linetype = "dashed", colour = "firebrick", linewidth = 1) +
labs(
x = "Mean reward rate",
y = "Count",
title = "Per-rat mean reward rate - Experiment 1"
)
Distribution of per-rat mean reward rates in Experiment 1. The dashed line marks chance performance (0.5).
This figure helps explain the non-significant t-test result. Many rats have mean reward rates close to the chance value of 0.5, and the full distribution is centred only slightly above that line. Visually, that makes it plausible that the sample mean could differ from 0.5 just because of sampling noise.
Research question: Do rats in Experiment 1 earn a higher mean reward rate than rats in Experiment 2?
Here the two groups are entirely separate sets of rats — there is no pairing between them. This calls for an independent samples t-test.
d1 <- fread("data/experiment_1_summary.csv")
d2 <- fread("data/experiment_2_summary.csv")
d_both <- rbind(d1, d2)
d_both_rat <- d_both[, .(mean_reward = mean(reward)), .(rat_id, experiment)]
head(d_both_rat)
## rat_id experiment mean_reward
## <int> <char> <num>
## 1: 1 exp1 0.585
## 2: 2 exp1 0.520
## 3: 3 exp1 0.520
## 4: 4 exp1 0.505
## 5: 5 exp1 0.535
## 6: 6 exp1 0.640
ggplot(d_both_rat, aes(x = factor(experiment), y = mean_reward, fill = factor(experiment))) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.15, size = 2, alpha = 0.4) +
scale_fill_manual(values = COL) +
labs(
x = "Experiment",
y = "Mean reward rate",
title = "Per-rat mean reward rate by experiment",
fill = "Experiment"
) +
theme(legend.position = "none")
Mean reward rate by experiment. Points show individual rats; boxes show group medians and IQR.
Here we use the formula form of t.test, which is a
compact way to compare two groups stored in a data frame. It is
equivalent to extracting the two group vectors manually.
t.test(mean_reward ~ experiment, data = d_both_rat)
##
## Welch Two Sample t-test
##
## data: mean_reward by experiment
## t = -0.54266, df = 45.993, p-value = 0.59
## alternative hypothesis: true difference in means between group exp1 and group exp2 is not equal to 0
## 95 percent confidence interval:
## -0.04022532 0.02314199
## sample estimates:
## mean in group exp1 mean in group exp2
## 0.5177083 0.5262500
The same test can be written in the more familiar vector form by extracting the two groups directly:
t.test(
d_both_rat[experiment == "exp1", mean_reward],
d_both_rat[experiment == "exp2", mean_reward]
)
##
## Welch Two Sample t-test
##
## data: d_both_rat[experiment == "exp1", mean_reward] and d_both_rat[experiment == "exp2", mean_reward]
## t = -0.54266, df = 45.993, p-value = 0.59
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.04022532 0.02314199
## sample estimates:
## mean of x mean of y
## 0.5177083 0.5262500
These two versions are doing the same thing. In the formula version,
mean_reward ~ experiment means “compare the variable
mean_reward across the groups defined by
experiment”. In the vector version, we manually provide the
two sets of values to be compared.
t.test(mean_reward ~ experiment, data = d_both_rat, var.equal = TRUE)
##
## Two Sample t-test
##
## data: mean_reward by experiment
## t = -0.54266, df = 46, p-value = 0.59
## alternative hypothesis: true difference in means between group exp1 and group exp2 is not equal to 0
## 95 percent confidence interval:
## -0.04022520 0.02314186
## sample estimates:
## mean in group exp1 mean in group exp2
## 0.5177083 0.5262500
Welch’s t-test (var.equal = FALSE,
the R default) does not assume the two groups have equal
variances. It adjusts the degrees of freedom accordingly. This is almost
always the safer choice, especially when sample sizes or spreads differ
between groups.
Student’s t-test (var.equal = TRUE)
assumes equal variances. It can be slightly more powerful when that
assumption holds, but the performance cost of using Welch’s when
variances are equal is negligible. For this reason, Welch’s is
generally recommended.
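The fractional df in the Welch output comes from the Welch–Satterthwaite approximation. A sketch of the formula, computed by hand on two illustrative simulated groups and checked against t.test:

```r
# Welch-Satterthwaite degrees of freedom, computed by hand (illustrative data)
set.seed(1)
x <- rnorm(20, mean = 0.52, sd = 0.04)
y <- rnorm(30, mean = 0.53, sd = 0.08)

v1 <- var(x) / length(x)   # squared standard error, group 1
v2 <- var(y) / length(y)   # squared standard error, group 2
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))

fit <- t.test(x, y)        # Welch is the R default
all.equal(df_welch, unname(fit$parameter))
```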
Compare the two experiment means, the t-statistic, degrees of freedom, p-value, and the 95% CI for the difference. If the CI excludes zero, the data provide evidence that the group means differ. Note the direction: which experiment has the higher reward rate, and by how much?
In this dataset, the group means are very similar, the p-value is large, and the confidence interval for the mean difference includes zero. The boxplot tells the same story: the two distributions overlap heavily, so there is little visual reason to expect a clear between-experiment effect.
Research question: Within Experiment 1, do rats earn
more reward in the mesh cage context than in the
wood cage context?
Each rat in Experiment 1 experienced both cage contexts, so
each rat contributes one mean for mesh and one mean for
wood. Because the two values come from the same
rat, they are not independent — pairing on rat accounts for the fact
that some rats are simply better learners than others.
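The value of pairing can be seen in a quick simulation (made-up numbers, not the experiment): give each simulated rat its own baseline ability, add a small context effect, then run paired and unpaired tests on the same data.

```r
# Simulated data: shared per-rat baseline + small "mesh" advantage
set.seed(42)
baseline <- rnorm(24, mean = 0.50, sd = 0.08)   # between-rat variability
mesh <- baseline + 0.03 + rnorm(24, sd = 0.02)  # context effect of 0.03
wood <- baseline +        rnorm(24, sd = 0.02)

p_paired   <- t.test(mesh, wood, paired = TRUE)$p.value
p_unpaired <- t.test(mesh, wood)$p.value

# Pairing subtracts out the shared baseline, so its p-value is smaller here
c(paired = p_paired, unpaired = p_unpaired)
```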
d_paired <- d1[, .(mean_reward = mean(reward)), .(rat_id, cage_context)]
head(d_paired)
## rat_id cage_context mean_reward
## <int> <char> <num>
## 1: 1 mesh 0.6428571
## 2: 1 wood 0.5294118
## 3: 2 wood 0.5222222
## 4: 2 mesh 0.5181818
## 5: 3 mesh 0.4411765
## 6: 3 wood 0.6020408
For a paired t-test, each rat’s two condition means must be lined up
side by side so R can compare the matched observations within the same
rat. Converting to wide format gives one row per rat, with separate
columns for mesh and wood.
d_wide <- dcast(d_paired, rat_id ~ cage_context, value.var = "mean_reward")
head(d_wide)
## Key: <rat_id>
## rat_id mesh wood
## <int> <num> <num>
## 1: 1 0.6428571 0.5294118
## 2: 2 0.5181818 0.5222222
## 3: 3 0.4411765 0.6020408
## 4: 4 0.4901961 0.5204082
## 5: 5 0.5894737 0.4857143
## 6: 6 0.6938776 0.5882353
t.test(d_wide[["mesh"]], d_wide[["wood"]], paired = TRUE)
##
## Paired t-test
##
## data: d_wide[["mesh"]] and d_wide[["wood"]]
## t = 1.8233, df = 23, p-value = 0.08129
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.00359283 0.05698528
## sample estimates:
## mean difference
## 0.02669622
The paired t-test works on the differences within each rat rather than on the raw scores. By subtracting each rat’s wood score from its mesh score, between-rat variability is removed entirely. If rats differ a lot from each other (some are fast learners, some are slow), that variability inflates the error term in an unpaired test, making it harder to detect a real context effect. The paired test sidesteps this by focusing only on the change within each rat.
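Equivalently, a paired t-test is just a one-sample t-test on the within-rat differences, which is worth verifying once (illustrative vectors, not the rat data):

```r
# Paired t-test == one-sample t-test on the differences
mesh <- c(0.64, 0.52, 0.44, 0.49, 0.59, 0.69)  # illustrative values
wood <- c(0.53, 0.52, 0.60, 0.52, 0.49, 0.59)

fit_paired <- t.test(mesh, wood, paired = TRUE)
fit_diff   <- t.test(mesh - wood, mu = 0)

all.equal(unname(fit_paired$statistic), unname(fit_diff$statistic))
all.equal(fit_paired$p.value, fit_diff$p.value)
```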
ggplot(d_paired, aes(x = cage_context, y = mean_reward, group = rat_id)) +
geom_line(alpha = 0.5, colour = "grey60") +
geom_point(size = 2, aes(colour = cage_context)) +
scale_colour_manual(values = COL) +
stat_summary(aes(group = 1), fun = mean, geom = "line",
linewidth = 1.5, colour = "black", linetype = "dashed") +
labs(
x = "Cage context",
y = "Mean reward rate",
title = "Per-rat mean reward rate by cage context — Experiment 1",
colour = "Context"
)
Each line connects a single rat’s mean reward rate in the mesh and wood contexts. Lines sloping upward indicate higher reward in mesh than wood.
Inspect the spaghetti plot: if most lines slope in the same direction, that is a visual signal of a consistent context effect. Confirm with the t-test output — does the CI for the mean difference exclude zero? Is the p-value below your chosen threshold?
Here the plot shows a mixed pattern: some rats perform better in
mesh, some in wood, and many differences are
small. That matches the paired t-test output, where the p-value is above
0.05 and the confidence interval for the mean difference includes zero.
The visual and inferential results therefore point to the same cautious
conclusion.
All three t-test variants share a core set of assumptions. Violating them can inflate the false positive rate or reduce power.
The t-test assumes the sampling distribution of the mean is approximately normal. In practice this means either:
The data themselves are roughly normally distributed (especially important for small samples), or
The sample size is large enough for the Central Limit Theorem to apply (often n ≥ 30 is cited as a rough guide, but this depends on how skewed the data are).
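The Central Limit Theorem part of this can be checked by simulation: draw many samples from a clearly skewed distribution and look at the distribution of their means. A minimal sketch:

```r
# Means of skewed samples are approximately normal (CLT)
set.seed(1)
sample_means <- replicate(2000, mean(rexp(30)))  # exp(1) is right-skewed

mean(sample_means)  # close to the true mean of 1
# hist(sample_means) would look roughly bell-shaped despite the skewed parent
```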
Check normality with a histogram and a Q-Q plot.
A histogram gives a rough visual sense of the distribution: is it fairly symmetric and mound-shaped, or is it strongly skewed?
A Q-Q plot gives a more direct comparison between your data and a normal distribution. “Q-Q” stands for quantile-quantile. The basic idea is:
R sorts your observed values from smallest to largest.
It then compares them to the values you would expect if the data came from a normal distribution.
Each point on the plot pairs one observed value with one expected normal value.
If the data are approximately normal, the points should fall roughly along a straight line. If they bend away from the line in a systematic way, that suggests the data may not be well described by a normal distribution.
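Those steps can be sketched directly in base R on a small illustrative sample; this is roughly what stat_qq does under the hood (ppoints supplies the probability points at which the normal quantiles are evaluated):

```r
# Build the Q-Q pairs by hand (roughly what stat_qq does under the hood)
set.seed(1)
x <- rnorm(24, mean = 0.52, sd = 0.05)   # illustrative sample

observed    <- sort(x)                    # step 1: sort the data
theoretical <- qnorm(ppoints(length(x)))  # step 2: expected normal quantiles
qq_pairs    <- data.frame(theoretical, observed)  # step 3: pair them up

head(qq_pairs)
# plot(theoretical, observed) would recreate the basic Q-Q plot
```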
You do not need to read a Q-Q plot mathematically. A simple rule of thumb is:
Points close to the line: the normality assumption looks reasonably plausible.
Strong curved patterns or points far from the line: the normality assumption may be questionable.
Some common patterns to look for are:
A pronounced S-shape, which can suggest skew or heavier tails than a normal distribution.
Points that follow the line in the middle but pull away at the ends, which can suggest unusually extreme values in the tails.
One or two points far from the rest, which can suggest possible outliers.
ggplot(d_rat, aes(x = mean_reward)) +
geom_histogram(bins = 20, fill = COL[1], colour = "white") +
labs(x = "Mean reward rate", title = "Distribution of per-rat mean reward")
ggplot(d_rat, aes(sample = mean_reward)) +
stat_qq(colour = COL[1]) +
stat_qq_line(colour = COL[2]) +
labs(x = "Theoretical quantiles", y = "Sample quantiles", title = "Q-Q plot — per-rat mean reward")
Q-Q plot for per-rat mean reward rates in Experiment 1. Points falling close to the line suggest approximate normality.
These plots are not testing the main hypothesis. They are helping you judge whether the one-sample t-test is a reasonable tool for these data. If the histogram is very skewed or the Q-Q plot bends strongly away from the line, then the normality assumption may be questionable.
In a Q-Q plot, points close to the reference line suggest that the data are approximately normal. Large systematic departures from the line suggest that the normality assumption may be less plausible.
In this example, the points lie fairly close to the line and the histogram does not show severe skew. That does not prove the data are perfectly normal, but it does suggest that the one-sample t-test is a reasonable approximation here.
Each observation (here, each rat) must be independent of the others. Rats that share cages and influence each other’s behaviour would violate this. If independence is in doubt, the t-test p-values cannot be trusted.
The independent samples t-test (Student’s version) assumes both groups have equal population variances. Welch’s t-test relaxes this assumption by adjusting the degrees of freedom — this is why Welch’s is generally preferred. For one-sample and paired t-tests, homogeneity of variance is not relevant because there is effectively only one set of values being tested.
When you apply t-tests to your assigned dataset, briefly state which assumptions you checked and what you found. You do not need to run formal tests for every assumption, but you should inspect a histogram or Q-Q plot and comment on whether the data look approximately normal and whether independence is reasonable given the study design.
Your assigned dataset is determined by your student ID number. Take
the last digit and compute last_digit %% 3 in R.
If the result is 0: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_switch
If the result is 1: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_auto
If the result is 2: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_unlearn
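For example, with a made-up ID (yours will differ):

```r
# Hypothetical student ID, for illustration only
student_id <- 45678913
last_digit <- student_id %% 10  # 3
last_digit %% 3                 # 0 -> the cat_learn_switch dataset
```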
Load your assigned dataset and run at least two of the following:
A one-sample t-test asking whether a key variable differs from a meaningful reference value (e.g., chance performance).
An independent samples t-test comparing two groups or conditions.
A paired t-test comparing two within-subject conditions (if your design has them).
For each test you run:
State the research question clearly.
Show the code and output.
Write 2–3 sentences interpreting the result, including the direction of the effect, the p-value, and the 95% confidence interval.
Briefly note which assumptions you checked and whether they appeared to be met.