# load libraries
library(data.table)
library(ggplot2)
# clear the workspace
rm(list = ls())
# define the colour scheme
COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
  theme_minimal(base_size = 13) +
    theme(
      panel.grid.minor = element_blank(),
      strip.text = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))
The t-test is one of the most widely used inferential tools in statistics. There are three main variants, and choosing the right one depends entirely on the structure of your data and your research question:
One-sample t-test: Tests whether a single group’s mean differs from a known or hypothesised reference value (e.g., chance performance).
Independent samples t-test: Tests whether two separate, unrelated groups have different means.
Paired t-test: Tests whether two measurements taken on the same subjects (or matched pairs) differ on average.
In this tutorial we work through each variant using rat T-maze data. By the end you should be able to choose the appropriate t-test for a given design, run it in R, and interpret its output.
Research question: Do rats in Experiment 1 perform above chance (50% reward rate)?
In a T-maze with two arms, a rat choosing randomly would earn a reward on roughly 50% of trials.
If rats have learned the task, their mean reward rate should be
greater than 0.5. We can test this with a one-sample t-test
against mu = 0.5.
Because the hypothesis is directional, this is a one-sided test. A two-sided test would ask whether performance differs from 0.5 in either direction, but here the theory predicts performance specifically above chance.
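The relationship between the two forms is worth seeing once. A minimal sketch on a small made-up vector (not the rat data): when the t-statistic is positive, the one-sided "greater" p-value is exactly half the two-sided one.

```r
# Illustrative values only -- not the rat data
x <- c(0.55, 0.48, 0.61, 0.52, 0.57, 0.45)

p_two <- t.test(x, mu = 0.5)$p.value                           # two-sided
p_one <- t.test(x, mu = 0.5, alternative = "greater")$p.value  # one-sided

# With a positive t-statistic, the one-sided p is half the two-sided p
all.equal(p_one, p_two / 2)
```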
d <- fread("data/experiment_1_summary.csv")
head(d)
## experiment rat_id trial cage_context scent choice reward reaction_time maze_run_time
## <char> <int> <int> <char> <char> <char> <int> <num> <num>
## 1: exp1 1 1 mesh none L 1 0.5716877 3.789606
## 2: exp1 1 2 wood none R 0 0.9360044 3.209561
## 3: exp1 1 3 mesh none L 1 0.4574906 2.137541
## 4: exp1 1 4 mesh peppermint L 1 0.4461169 2.648328
## 5: exp1 1 5 mesh lemon L 1 0.2000000 2.727809
## 6: exp1 1 6 mesh peppermint L 1 0.6402233 2.699873
d_rat <- d[, .(mean_reward = mean(reward)), .(rat_id)]
d_rat
## rat_id mean_reward
## <int> <num>
## 1: 1 0.585
## 2: 2 0.520
## 3: 3 0.520
## 4: 4 0.505
## 5: 5 0.535
## 6: 6 0.640
## 7: 7 0.455
## 8: 8 0.395
## 9: 9 0.545
## 10: 10 0.545
## 11: 11 0.515
## 12: 12 0.485
## 13: 13 0.580
## 14: 14 0.530
## 15: 15 0.565
## 16: 16 0.555
## 17: 17 0.490
## 18: 18 0.525
## 19: 19 0.540
## 20: 20 0.455
## 21: 21 0.445
## 22: 22 0.445
## 23: 23 0.490
## 24: 24 0.560
## rat_id mean_reward
t.test(d_rat[["mean_reward"]], mu = 0.5, alternative = "greater")
##
## One Sample t-test
##
## data: d_rat[["mean_reward"]]
## t = 1.6008, df = 23, p-value = 0.06153
## alternative hypothesis: true mean is greater than 0.5
## 95 percent confidence interval:
## 0.4987492 Inf
## sample estimates:
## mean of x
## 0.5177083
t: The test statistic — how many standard errors
the sample mean is from mu = 0.5.
df: Degrees of freedom (here, n − 1 = 23).
p-value: The probability of observing a t-statistic this large (or larger) if the true mean were exactly 0.5. A small p-value is evidence against the null in the predicted direction.
95% confidence interval: A range of plausible values for the true population mean. Because the test is one-sided ("greater"), R reports a one-sided interval: a lower bound paired with Inf. If the lower bound sits above 0.5, the data support the directional hypothesis; here it falls just below 0.5.
sample mean: The observed average reward rate across the 24 rats.
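Under the hood, the t-statistic is just the sample mean's distance from mu measured in standard-error units. A sketch of the hand calculation, using an illustrative vector rather than the rat data:

```r
# Hand-compute t and the one-sided p, then check against t.test()
x  <- c(0.55, 0.48, 0.61, 0.52, 0.57, 0.45)  # illustrative values
mu <- 0.5
n  <- length(x)

t_manual <- (mean(x) - mu) / (sd(x) / sqrt(n))
p_manual <- pt(t_manual, df = n - 1, lower.tail = FALSE)  # alternative = "greater"

fit <- t.test(x, mu = mu, alternative = "greater")
all.equal(t_manual, unname(fit$statistic))
all.equal(p_manual, fit$p.value)
```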
If the p-value is below 0.05, we have evidence that rats perform above chance. The p-value quantifies the evidence against the null hypothesis, the confidence interval shows the precision of the estimate, and the distance between the sample mean and 0.5 describes the size of the effect.
In this dataset, the sample mean is only slightly above 0.5, the p-value is not below 0.05, and the confidence interval includes 0.5. So the sensible conclusion is not that rats clearly performed above chance, but that this sample does not provide strong evidence of above-chance performance.
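The t.test output does not report a standardised effect size. One common choice for a one-sample design is Cohen's d: the difference between the sample mean and mu, in standard-deviation units. A sketch on an illustrative vector:

```r
# Cohen's d for a one-sample design: (mean(x) - mu) / sd(x)
x  <- c(0.55, 0.48, 0.61, 0.52, 0.57, 0.45)  # illustrative values
mu <- 0.5
d_cohen <- (mean(x) - mu) / sd(x)
round(d_cohen, 2)  # ~0.51, a medium-sized effect by conventional benchmarks
```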
ggplot(d_rat, aes(x = mean_reward)) +
geom_histogram(binwidth = 0.02, fill = COL[1], colour = "white") +
geom_vline(xintercept = 0.5, linetype = "dashed", colour = "firebrick", linewidth = 1) +
labs(
x = "Mean reward rate",
y = "Count",
title = "Per-rat mean reward rate - Experiment 1"
)
Distribution of per-rat mean reward rates in Experiment 1. The dashed line marks chance performance (0.5).
This figure helps explain the non-significant t-test result. Many rats have mean reward rates close to the chance value of 0.5, and the full distribution is centred only slightly above that line. Visually, that makes it plausible that the sample mean could differ from 0.5 just because of sampling noise.
Research question: Do rats in Experiment 1 earn a higher mean reward rate than rats in Experiment 2?
Here the two groups are entirely separate sets of rats — there is no pairing between them. This calls for an independent samples t-test.
d1 <- fread("data/experiment_1_summary.csv")
d2 <- fread("data/experiment_2_summary.csv")
d_both <- rbind(d1, d2)
d_both_rat <- d_both[, .(mean_reward = mean(reward)), .(rat_id, experiment)]
head(d_both_rat)
## rat_id experiment mean_reward
## <int> <char> <num>
## 1: 1 exp1 0.585
## 2: 2 exp1 0.520
## 3: 3 exp1 0.520
## 4: 4 exp1 0.505
## 5: 5 exp1 0.535
## 6: 6 exp1 0.640
ggplot(d_both_rat, aes(x = factor(experiment), y = mean_reward, fill = factor(experiment))) +
geom_boxplot(outlier.shape = NA) +
geom_jitter(width = 0.15, size = 2, alpha = 0.4) +
scale_fill_manual(values = COL) +
labs(
x = "Experiment",
y = "Mean reward rate",
title = "Per-rat mean reward rate by experiment",
fill = "Experiment"
) +
theme(legend.position = "none")
Mean reward rate by experiment. Points show individual rats; boxes show group medians and IQR.
Here we use the formula form of t.test, which is a
compact way to compare two groups stored in a data frame. It is
equivalent to extracting the two group vectors manually.
t.test(mean_reward ~ experiment, data = d_both_rat)
##
## Welch Two Sample t-test
##
## data: mean_reward by experiment
## t = -0.54266, df = 45.993, p-value = 0.59
## alternative hypothesis: true difference in means between group exp1 and group exp2 is not equal to 0
## 95 percent confidence interval:
## -0.04022532 0.02314199
## sample estimates:
## mean in group exp1 mean in group exp2
## 0.5177083 0.5262500
The same test can be written in the more familiar vector form by extracting the two groups directly:
t.test(
d_both_rat[experiment == "exp1", mean_reward],
d_both_rat[experiment == "exp2", mean_reward]
)
##
## Welch Two Sample t-test
##
## data: d_both_rat[experiment == "exp1", mean_reward] and d_both_rat[experiment == "exp2", mean_reward]
## t = -0.54266, df = 45.993, p-value = 0.59
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.04022532 0.02314199
## sample estimates:
## mean of x mean of y
## 0.5177083 0.5262500
These two versions are doing the same thing. In the formula version,
mean_reward ~ experiment means “compare the variable
mean_reward across the groups defined by
experiment”. In the vector version, we manually provide the
two sets of values to be compared.
t.test(mean_reward ~ experiment, data = d_both_rat, var.equal = TRUE)
##
## Two Sample t-test
##
## data: mean_reward by experiment
## t = -0.54266, df = 46, p-value = 0.59
## alternative hypothesis: true difference in means between group exp1 and group exp2 is not equal to 0
## 95 percent confidence interval:
## -0.04022520 0.02314186
## sample estimates:
## mean in group exp1 mean in group exp2
## 0.5177083 0.5262500
Welch’s t-test (var.equal = FALSE,
the R default) does not assume the two groups have equal
variances. It adjusts the degrees of freedom accordingly. This is almost
always the safer choice, especially when sample sizes or spreads differ
between groups.
Student’s t-test (var.equal = TRUE)
assumes equal variances. It can be slightly more powerful when that
assumption holds, but the performance cost of using Welch’s when
variances are equal is negligible. For this reason, Welch’s is
generally recommended.
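The fractional df in the Welch output comes from the Welch–Satterthwaite approximation. A sketch of the formula, computed by hand on two illustrative simulated groups and checked against t.test:

```r
# Welch-Satterthwaite degrees of freedom, computed by hand (illustrative data)
set.seed(1)
x <- rnorm(20, mean = 0.52, sd = 0.04)
y <- rnorm(30, mean = 0.53, sd = 0.08)

v1 <- var(x) / length(x)   # squared standard error, group 1
v2 <- var(y) / length(y)   # squared standard error, group 2
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))

fit <- t.test(x, y)        # Welch is the R default
all.equal(df_welch, unname(fit$parameter))
```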
Compare the two experiment means, the t-statistic, degrees of freedom, p-value, and the 95% CI for the difference. If the CI excludes zero, the data provide evidence that the group means differ. Note the direction: which experiment has the higher reward rate, and by how much?
In this dataset, the group means are very similar, the p-value is large, and the confidence interval for the mean difference includes zero. The boxplot tells the same story: the two distributions overlap heavily, so there is little visual reason to expect a clear between-experiment effect.
Research question: Within Experiment 1, do rats earn
more reward in the mesh cage context than in the
wood cage context?
Each rat in Experiment 1 experienced both cage contexts, so
each rat contributes one mean for mesh and one mean for
wood. Because the two values come from the same
rat, they are not independent — pairing on rat accounts for the fact
that some rats are simply better learners than others.
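The value of pairing can be seen in a quick simulation (made-up numbers, not the experiment): give each simulated rat its own baseline ability, add a small context effect, then run paired and unpaired tests on the same data.

```r
# Simulated data: shared per-rat baseline + small "mesh" advantage
set.seed(42)
baseline <- rnorm(24, mean = 0.50, sd = 0.08)   # between-rat variability
mesh <- baseline + 0.03 + rnorm(24, sd = 0.02)  # context effect of 0.03
wood <- baseline +        rnorm(24, sd = 0.02)

p_paired   <- t.test(mesh, wood, paired = TRUE)$p.value
p_unpaired <- t.test(mesh, wood)$p.value

# Pairing subtracts out the shared baseline, so its p-value is smaller here
c(paired = p_paired, unpaired = p_unpaired)
```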
d_paired <- d1[, .(mean_reward = mean(reward)), .(rat_id, cage_context)]
head(d_paired)
## rat_id cage_context mean_reward
## <int> <char> <num>
## 1: 1 mesh 0.6428571
## 2: 1 wood 0.5294118
## 3: 2 wood 0.5222222
## 4: 2 mesh 0.5181818
## 5: 3 mesh 0.4411765
## 6: 3 wood 0.6020408
For a paired t-test, each rat’s two condition means must be lined up
side by side so R can compare the matched observations within the same
rat. Converting to wide format gives one row per rat, with separate
columns for mesh and wood.
d_wide <- dcast(d_paired, rat_id ~ cage_context, value.var = "mean_reward")
head(d_wide)
## Key: <rat_id>
## rat_id mesh wood
## <int> <num> <num>
## 1: 1 0.6428571 0.5294118
## 2: 2 0.5181818 0.5222222
## 3: 3 0.4411765 0.6020408
## 4: 4 0.4901961 0.5204082
## 5: 5 0.5894737 0.4857143
## 6: 6 0.6938776 0.5882353
t.test(d_wide[["mesh"]], d_wide[["wood"]], paired = TRUE)
##
## Paired t-test
##
## data: d_wide[["mesh"]] and d_wide[["wood"]]
## t = 1.8233, df = 23, p-value = 0.08129
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.00359283 0.05698528
## sample estimates:
## mean difference
## 0.02669622
The paired t-test works on the differences within each rat rather than on the raw scores. By subtracting each rat’s wood score from its mesh score, between-rat variability is removed entirely. If rats differ a lot from each other (some are fast learners, some are slow), that variability inflates the error term in an unpaired test, making it harder to detect a real context effect. The paired test sidesteps this by focusing only on the change within each rat.
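Equivalently, a paired t-test is just a one-sample t-test on the within-rat differences, which is worth verifying once (illustrative vectors, not the rat data):

```r
# Paired t-test == one-sample t-test on the differences
mesh <- c(0.64, 0.52, 0.44, 0.49, 0.59, 0.69)  # illustrative values
wood <- c(0.53, 0.52, 0.60, 0.52, 0.49, 0.59)

fit_paired <- t.test(mesh, wood, paired = TRUE)
fit_diff   <- t.test(mesh - wood, mu = 0)

all.equal(unname(fit_paired$statistic), unname(fit_diff$statistic))
all.equal(fit_paired$p.value, fit_diff$p.value)
```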
ggplot(d_paired, aes(x = cage_context, y = mean_reward, group = rat_id)) +
geom_line(alpha = 0.5, colour = "grey60") +
geom_point(size = 2, aes(colour = cage_context)) +
scale_colour_manual(values = COL) +
stat_summary(aes(group = 1), fun = mean, geom = "line",
linewidth = 1.5, colour = "black", linetype = "dashed") +
labs(
x = "Cage context",
y = "Mean reward rate",
title = "Per-rat mean reward rate by cage context — Experiment 1",
colour = "Context"
)
Each line connects a single rat’s mean reward rate in the mesh and wood contexts. Lines sloping upward indicate higher reward in mesh than wood.
Inspect the spaghetti plot: if most lines slope in the same direction, that is a visual signal of a consistent context effect. Confirm with the t-test output — does the CI for the mean difference exclude zero? Is the p-value below your chosen threshold?
Here the plot shows a mixed pattern: some rats perform better in
mesh, some in wood, and many differences are
small. That matches the paired t-test output, where the p-value is above
0.05 and the confidence interval for the mean difference includes zero.
The visual and inferential results therefore point to the same cautious
conclusion.
All three t-test variants share a core set of assumptions. Violating them can inflate the false positive rate or reduce power.
The t-test assumes the sampling distribution of the mean is approximately normal. In practice this means either:
The data themselves are roughly normally distributed (especially important for small samples), or
The sample size is large enough for the Central Limit Theorem to apply (often n ≥ 30 is cited as a rough guide, but this depends on how skewed the data are).
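The Central Limit Theorem part of this can be checked by simulation: draw many samples from a clearly skewed distribution and look at the distribution of their means. A minimal sketch:

```r
# Means of skewed samples are approximately normal (CLT)
set.seed(1)
sample_means <- replicate(2000, mean(rexp(30)))  # exp(1) is right-skewed

mean(sample_means)  # close to the true mean of 1
# hist(sample_means) would look roughly bell-shaped despite the skewed parent
```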
Check normality with a histogram and a Q-Q plot.
A histogram gives a rough visual sense of the distribution: is it fairly symmetric and mound-shaped, or is it strongly skewed?
A Q-Q plot gives a more direct comparison between your data and a normal distribution. “Q-Q” stands for quantile-quantile. The basic idea is:
R sorts your observed values from smallest to largest.
It then compares them to the values you would expect if the data came from a normal distribution.
Each point on the plot pairs one observed value with one expected normal value.
If the data are approximately normal, the points should fall roughly along a straight line. If they bend away from the line in a systematic way, that suggests the data may not be well described by a normal distribution.
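Those steps can be sketched directly in base R on a small illustrative sample; this is roughly what stat_qq does under the hood (ppoints supplies the probability points at which the normal quantiles are evaluated):

```r
# Build the Q-Q pairs by hand (roughly what stat_qq does under the hood)
set.seed(1)
x <- rnorm(24, mean = 0.52, sd = 0.05)   # illustrative sample

observed    <- sort(x)                    # step 1: sort the data
theoretical <- qnorm(ppoints(length(x)))  # step 2: expected normal quantiles
qq_pairs    <- data.frame(theoretical, observed)  # step 3: pair them up

head(qq_pairs)
# plot(theoretical, observed) would recreate the basic Q-Q plot
```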
You do not need to read a Q-Q plot mathematically. A simple rule of thumb is:
Points close to the line: the normality assumption looks reasonably plausible.
Strong curved patterns or points far from the line: the normality assumption may be questionable.
Some common patterns to look for are:
A pronounced S-shape, which can suggest skew or heavier tails than a normal distribution.
Points that follow the line in the middle but pull away at the ends, which can suggest unusually extreme values in the tails.
One or two points far from the rest, which can suggest possible outliers.
ggplot(d_rat, aes(x = mean_reward)) +
geom_histogram(bins = 20, fill = COL[1], colour = "white") +
labs(x = "Mean reward rate", title = "Distribution of per-rat mean reward")
ggplot(d_rat, aes(sample = mean_reward)) +
stat_qq(colour = COL[1]) +
stat_qq_line(colour = COL[2]) +
labs(x = "Theoretical quantiles", y = "Sample quantiles", title = "Q-Q plot — per-rat mean reward")
Q-Q plot for per-rat mean reward rates in Experiment 1. Points falling close to the line suggest approximate normality.
These plots are not testing the main hypothesis. They are helping you judge whether the one-sample t-test is a reasonable tool for these data. If the histogram is very skewed or the Q-Q plot bends strongly away from the line, then the normality assumption may be questionable.
In a Q-Q plot, points close to the reference line suggest that the data are approximately normal. Large systematic departures from the line suggest that the normality assumption may be less plausible.
In this example, the points lie fairly close to the line and the histogram does not show severe skew. That does not prove the data are perfectly normal, but it does suggest that the one-sample t-test is a reasonable approximation here.
Each observation (here, each rat) must be independent of the others. Rats that share cages and influence each other’s behaviour would violate this. If independence is in doubt, the t-test p-values cannot be trusted.
The independent samples t-test (Student’s version) assumes both groups have equal population variances. Welch’s t-test relaxes this assumption by adjusting the degrees of freedom — this is why Welch’s is generally preferred. For one-sample and paired t-tests, homogeneity of variance is not relevant because there is effectively only one set of values being tested.
When you apply t-tests to your assigned dataset, briefly state which assumptions you checked and what you found. You do not need to run formal tests for every assumption, but you should inspect a histogram or Q-Q plot and comment on whether the data look approximately normal and whether independence is reasonable given the study design.
Your assigned dataset is determined by your student ID number. Take
the last digit and compute last_digit %% 3 in R.
If the result is 0: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_switch
If the result is 1: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_auto
If the result is 2: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_unlearn
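For example, with a made-up ID (yours will differ):

```r
# Hypothetical student ID, for illustration only
student_id <- 45678913
last_digit <- student_id %% 10  # 3
last_digit %% 3                 # 0 -> the cat_learn_switch dataset
```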
Load your assigned dataset and run at least two of the following:
A one-sample t-test asking whether a key variable differs from a meaningful reference value (e.g., chance performance).
An independent samples t-test comparing two groups or conditions.
A paired t-test comparing two within-subject conditions (if your design has them).
For each test you run:
State the research question clearly.
Show the code and output.
Write 2–3 sentences interpreting the result, including the direction of the effect, the p-value, and the 95% confidence interval.
Briefly note which assumptions you checked and whether they appeared to be met.