# load libraries

library(data.table)
library(ggplot2)

# clean work space

rm(list = ls())

# init colorscheme

COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
  theme_minimal(base_size = 13) +
    theme(
      panel.grid.minor = element_blank(),
      strip.text = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))

Overview

The t-test is one of the most widely used inferential tools in statistics. There are three main variants, and choosing the right one depends entirely on the structure of your data and your research question:

  • One-sample t-test: Tests whether a single group’s mean differs from a known or hypothesised reference value (e.g., chance performance).
  • Independent samples t-test: Tests whether two separate, unrelated groups have different means.
  • Paired t-test: Tests whether two measurements taken on the same subjects (or matched pairs) differ on average.

In this tutorial we work through each variant using rat T-maze data. By the end you should be able to choose the appropriate t-test for a given design, run it in R, and interpret its output.


Part 1 — One-sample t-test

Research question: Do rats in Experiment 1 perform above chance (50% reward rate)?

In a T-maze with two arms, a rat choosing randomly would earn a reward on roughly 50% of trials. If rats have learned the task, their mean reward rate should be greater than 0.5. We can test this with a one-sample t-test against mu = 0.5. (Below we run the default two-sided test and judge the direction of the effect from the sample mean and CI; a directional test via alternative = "greater" is also an option.)

Load and summarise the data

d <- fread("data/experiment_1_summary.csv")
head(d)
##    experiment rat_id trial cage_context      scent choice reward reaction_time maze_run_time
##        <char>  <int> <int>       <char>     <char> <char>  <int>         <num>         <num>
## 1:       exp1      1     1         mesh       none      L      1     0.5716877      3.789606
## 2:       exp1      1     2         wood       none      R      0     0.9360044      3.209561
## 3:       exp1      1     3         mesh       none      L      1     0.4574906      2.137541
## 4:       exp1      1     4         mesh peppermint      L      1     0.4461169      2.648328
## 5:       exp1      1     5         mesh      lemon      L      1     0.2000000      2.727809
## 6:       exp1      1     6         mesh peppermint      L      1     0.6402233      2.699873
d_rat <- d[, .(mean_reward = mean(reward)), .(rat_id)]
d_rat
##     rat_id mean_reward
##      <int>       <num>
##  1:      1       0.585
##  2:      2       0.520
##  3:      3       0.520
##  4:      4       0.505
##  5:      5       0.535
##  6:      6       0.640
##  7:      7       0.455
##  8:      8       0.395
##  9:      9       0.545
## 10:     10       0.545
## 11:     11       0.515
## 12:     12       0.485
## 13:     13       0.580
## 14:     14       0.530
## 15:     15       0.565
## 16:     16       0.555
## 17:     17       0.490
## 18:     18       0.525
## 19:     19       0.540
## 20:     20       0.455
## 21:     21       0.445
## 22:     22       0.445
## 23:     23       0.490
## 24:     24       0.560
##     rat_id mean_reward

Run the one-sample t-test

t.test(d_rat$mean_reward, mu = 0.5)
## 
##  One Sample t-test
## 
## data:  d_rat$mean_reward
## t = 1.6008, df = 23, p-value = 0.1231
## alternative hypothesis: true mean is not equal to 0.5
## 95 percent confidence interval:
##  0.4948245 0.5405921
## sample estimates:
## mean of x 
## 0.5177083

Understanding the output

  • t: The test statistic — how many standard errors the sample mean is from mu = 0.5.
  • df: Degrees of freedom (here, n − 1 = 23).
  • p-value: The probability of observing a t-statistic this extreme (or more) if the true mean were exactly 0.5. A small p-value is evidence against the null.
  • 95% confidence interval: A range of plausible values for the true population mean. If this interval does not include 0.5, that is consistent with rejecting the null.
  • sample mean: The observed average reward rate across the 24 rats.
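
To make the mechanics concrete, the reported t, df, and p-value can be reproduced by hand. This is a sketch on hypothetical numbers (not the experiment data); pt() is the t-distribution CDF.

```r
# Recompute a one-sample t-test by hand (hypothetical reward rates)
x  <- c(0.52, 0.48, 0.55, 0.50, 0.53, 0.47)
mu <- 0.5
n  <- length(x)
t_stat <- (mean(x) - mu) / (sd(x) / sqrt(n))  # sample mean in standard-error units
df     <- n - 1
p_val  <- 2 * pt(-abs(t_stat), df)            # two-sided tail probability

# The hand computation matches t.test() exactly
tt <- t.test(x, mu = mu)
all.equal(unname(tt$statistic), t_stat)  # TRUE
all.equal(unname(tt$p.value), p_val)     # TRUE
```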

Interpretation

If the p-value is below 0.05 and the confidence interval lies entirely above 0.5, we have evidence that rats perform above chance. The direction of the effect (mean > 0.5) and the width of the CI both matter — a very narrow CI far above 0.5 would indicate a reliable, sizeable effect.

Visualise

ggplot(d_rat, aes(x = mean_reward)) +
  geom_histogram(binwidth = 0.02, fill = COL[1], colour = "white") +
  geom_vline(xintercept = 0.5, linetype = "dashed", colour = "firebrick", linewidth = 1) +
  labs(
    x = "Mean reward rate",
    y = "Count",
    title = "Per-rat mean reward rate — Experiment 1"
  )
Distribution of per-rat mean reward rates in Experiment 1. The dashed line marks chance performance (0.5).



Part 2 — Independent samples t-test

Research question: Do rats in Experiment 1 earn a higher mean reward rate than rats in Experiment 2?

Here the two groups are entirely separate sets of rats — there is no pairing between them. This calls for an independent samples t-test.

Load and combine both datasets

d1 <- fread("data/experiment_1_summary.csv")
d2 <- fread("data/experiment_2_summary.csv")
d_both <- rbind(d1, d2)
d_both_rat <- d_both[, .(mean_reward = mean(reward)), .(rat_id, experiment)]
head(d_both_rat)
##    rat_id experiment mean_reward
##     <int>     <char>       <num>
## 1:      1       exp1       0.585
## 2:      2       exp1       0.520
## 3:      3       exp1       0.520
## 4:      4       exp1       0.505
## 5:      5       exp1       0.535
## 6:      6       exp1       0.640

Visualise

ggplot(d_both_rat, aes(x = factor(experiment), y = mean_reward, fill = factor(experiment))) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.15, size = 2, alpha = 0.4) +
  scale_fill_manual(values = COL) +
  labs(
    x = "Experiment",
    y = "Mean reward rate",
    title = "Per-rat mean reward rate by experiment",
    fill = "Experiment"
  ) +
  theme(legend.position = "none")
Mean reward rate by experiment. Points show individual rats; boxes show group medians and IQR.


Welch’s t-test (default — unequal variances assumed)

t.test(mean_reward ~ experiment, data = d_both_rat)
## 
##  Welch Two Sample t-test
## 
## data:  mean_reward by experiment
## t = -0.54266, df = 45.993, p-value = 0.59
## alternative hypothesis: true difference in means between group exp1 and group exp2 is not equal to 0
## 95 percent confidence interval:
##  -0.04022532  0.02314199
## sample estimates:
## mean in group exp1 mean in group exp2 
##          0.5177083          0.5262500

Student’s t-test (equal variances assumed)

t.test(mean_reward ~ experiment, data = d_both_rat, var.equal = TRUE)
## 
##  Two Sample t-test
## 
## data:  mean_reward by experiment
## t = -0.54266, df = 46, p-value = 0.59
## alternative hypothesis: true difference in means between group exp1 and group exp2 is not equal to 0
## 95 percent confidence interval:
##  -0.04022520  0.02314186
## sample estimates:
## mean in group exp1 mean in group exp2 
##          0.5177083          0.5262500

When to use which?

  • Welch’s t-test (var.equal = FALSE, the R default) does not assume the two groups have equal variances. It adjusts the degrees of freedom accordingly. This is almost always the safer choice, especially when sample sizes or spreads differ between groups.
  • Student’s t-test (var.equal = TRUE) assumes equal variances. It can be slightly more powerful when that assumption holds, but the power cost of using Welch’s when variances really are equal is negligible. For this reason, Welch’s is generally recommended.
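
For the curious, Welch's adjusted degrees of freedom come from the Welch–Satterthwaite formula, which can be checked by hand. This sketch uses hypothetical groups, not the experiment data:

```r
# Welch–Satterthwaite degrees of freedom, computed manually
g1 <- c(0.52, 0.48, 0.55, 0.61, 0.47)
g2 <- c(0.50, 0.44, 0.58, 0.49, 0.53, 0.46)
v1 <- var(g1) / length(g1)   # squared standard error, group 1
v2 <- var(g2) / length(g2)   # squared standard error, group 2
df_welch <- (v1 + v2)^2 /
  (v1^2 / (length(g1) - 1) + v2^2 / (length(g2) - 1))

# Matches the (typically non-integer) df that t.test() reports
all.equal(unname(t.test(g1, g2)$parameter), df_welch)  # TRUE
```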

Interpretation

Compare the two group means, the t-statistic, degrees of freedom, p-value, and the 95% CI for the difference in means. If that CI excludes zero, the difference is statistically significant at the 5% level. Note the direction: which experiment has the higher reward rate, and by how much?


Part 3 — Paired t-test

Research question: Within Experiment 1, do rats earn more reward in the mesh cage context than in the wood cage context?

Each rat in Experiment 1 experienced both cage contexts, so each rat contributes one mean for mesh and one mean for wood. Because the two values come from the same rat, they are not independent — pairing on rat accounts for the fact that some rats are simply better learners than others.

Compute per-rat, per-context means

d_paired <- d1[, .(mean_reward = mean(reward)), .(rat_id, cage_context)]
head(d_paired)
##    rat_id cage_context mean_reward
##     <int>       <char>       <num>
## 1:      1         mesh   0.6428571
## 2:      1         wood   0.5294118
## 3:      2         wood   0.5222222
## 4:      2         mesh   0.5181818
## 5:      3         mesh   0.4411765
## 6:      3         wood   0.6020408

Convert to wide format

d_wide <- dcast(d_paired, rat_id ~ cage_context, value.var = "mean_reward")
head(d_wide)
## Key: <rat_id>
##    rat_id      mesh      wood
##     <int>     <num>     <num>
## 1:      1 0.6428571 0.5294118
## 2:      2 0.5181818 0.5222222
## 3:      3 0.4411765 0.6020408
## 4:      4 0.4901961 0.5204082
## 5:      5 0.5894737 0.4857143
## 6:      6 0.6938776 0.5882353

Run the paired t-test

t.test(d_wide$mesh, d_wide$wood, paired = TRUE)
## 
##  Paired t-test
## 
## data:  d_wide$mesh and d_wide$wood
## t = 1.8233, df = 23, p-value = 0.08129
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.00359283  0.05698528
## sample estimates:
## mean difference 
##      0.02669622

Why pairing increases sensitivity

The paired t-test works on the differences within each rat rather than on the raw scores. By subtracting each rat’s wood score from its mesh score, between-rat variability is removed entirely. If rats differ a lot from each other (some are fast learners, some are slow), that variability inflates the error term in an unpaired test, making it harder to detect a real context effect. The paired test sidesteps this by focusing only on the change within each rat.
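
One way to see this: a paired t-test is mathematically identical to a one-sample t-test on the within-rat differences. A quick check on hypothetical scores:

```r
# Paired t-test == one-sample t-test on the differences
mesh <- c(0.64, 0.52, 0.44, 0.49, 0.59, 0.69)  # hypothetical per-rat means
wood <- c(0.53, 0.52, 0.60, 0.52, 0.49, 0.59)
paired_p   <- t.test(mesh, wood, paired = TRUE)$p.value
one_samp_p <- t.test(mesh - wood, mu = 0)$p.value
all.equal(paired_p, one_samp_p)  # TRUE
```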

Visualise — spaghetti plot

ggplot(d_paired, aes(x = cage_context, y = mean_reward, group = rat_id)) +
  geom_line(alpha = 0.5, colour = "grey60") +
  geom_point(size = 2, aes(colour = cage_context)) +
  scale_colour_manual(values = COL) +
  stat_summary(aes(group = 1), fun = mean, geom = "line",
               linewidth = 1.5, colour = "black", linetype = "dashed") +
  labs(
    x = "Cage context",
    y = "Mean reward rate",
    title = "Per-rat mean reward rate by cage context — Experiment 1",
    colour = "Context"
  )
Each line connects a single rat's mean reward rate in the mesh and wood contexts. Lines sloping upward indicate higher reward in mesh than wood.


Interpretation

Inspect the spaghetti plot: if most lines slope in the same direction, that is a visual signal of a consistent context effect. Confirm with the t-test output — does the CI for the mean difference exclude zero? Is the p-value below your chosen threshold?


Part 4 — Assumptions

All three t-test variants share a core set of assumptions. Violating them can inflate the false positive rate or reduce power.

Normality

The t-test assumes the sampling distribution of the mean is approximately normal. In practice this means either:

  • The data themselves are roughly normally distributed (especially important for small samples), or
  • The sample size is large enough for the Central Limit Theorem to apply (often n ≥ 30 is cited as a rough guide, but this depends on how skewed the data are).
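
A quick simulation makes the CLT point tangible: even for a strongly right-skewed population (exponential), the distribution of sample means at n = 30 is already close to normal. This sketch uses simulated data, not the rat data:

```r
# Means of skewed samples are approximately normal at moderate n
set.seed(42)
skewed_means <- replicate(5000, mean(rexp(30)))  # 5000 samples of n = 30
hist(skewed_means, breaks = 40,
     main = "Sampling distribution of the mean (exponential, n = 30)")
mean(skewed_means)  # close to the population mean of 1
```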

Check normality with a histogram and a Q-Q plot:

ggplot(d_rat, aes(x = mean_reward)) +
  geom_histogram(bins = 20, fill = COL[1], colour = "white") +
  labs(x = "Mean reward rate", title = "Distribution of per-rat mean reward")
ggplot(d_rat, aes(sample = mean_reward)) +
  stat_qq(colour = COL[1]) +
  stat_qq_line(colour = COL[2]) +
  labs(x = "Theoretical quantiles", y = "Sample quantiles", title = "Q-Q plot — per-rat mean reward")
Q-Q plot for per-rat mean reward rates in Experiment 1. Points falling close to the line suggest approximate normality.
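
If you want a formal check to accompany the plots, the Shapiro–Wilk test is one option (shown here on simulated data). Treat it as a supplement only: with small samples it has little power, and with large samples it flags trivial deviations.

```r
# Optional: Shapiro-Wilk normality test (null hypothesis: data are normal)
set.seed(1)
x_sim <- rnorm(24, mean = 0.52, sd = 0.05)  # simulated per-rat rates
shapiro.test(x_sim)  # a large p-value gives no evidence against normality
```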

Independence of observations

Each observation (here, each rat) must be independent of the others. Rats that share cages and influence each other’s behaviour would violate this. If independence is in doubt, the t-test p-values cannot be trusted.

Homogeneity of variance

The independent samples t-test (Student’s version) assumes both groups have equal population variances. Welch’s t-test relaxes this assumption by adjusting the degrees of freedom — this is why Welch’s is generally preferred. For one-sample and paired t-tests, homogeneity of variance is not relevant because there is effectively only one set of values being tested.

A note for the final project

When you apply t-tests to your assigned dataset, briefly state which assumptions you checked and what you found. You do not need to run formal tests for every assumption, but you should inspect a histogram or Q-Q plot and comment on whether the data look approximately normal and whether independence is reasonable given the study design.


Apply this to your assigned dataset

Your assigned dataset is determined by your student ID number. Take the last digit and compute last_digit %% 3 in R.
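
For example, with a hypothetical ID (how the resulting 0, 1, or 2 maps onto a dataset is whatever your course materials specify):

```r
student_id <- 20481937          # hypothetical -- use your own ID
last_digit <- student_id %% 10  # 7
last_digit %% 3                 # 1
```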

Load your assigned dataset and run at least two of the following:

  1. A one-sample t-test asking whether a key variable differs from a meaningful reference value (e.g., chance performance).
  2. An independent samples t-test comparing two groups or conditions.
  3. A paired t-test comparing two within-subject conditions (if your design has them).

For each test you run:

  • State the research question clearly.
  • Show the code and output.
  • Write 2–3 sentences interpreting the result, including the direction of the effect, the p-value, and the 95% confidence interval.
  • Briefly note which assumptions you checked and whether they appeared to be met.