# load libraries

library(data.table)
library(ggplot2)

# clean work space

rm(list = ls())

# init colorscheme

COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
  theme_minimal(base_size = 13) +
    theme(
      panel.grid.minor = element_blank(),
      strip.text = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))

Overview

Last week we focused on describing how variables behave:

  • what kind of variable we have
  • how its values are distributed
  • which summaries and plots are defensible
  • what assumptions those choices imply

This week we take the next step.

The goal of this tutorial is to connect those descriptive choices to statistical inference.

A key idea for today:

Statistical inference is about deciding whether observed differences are larger than we would expect from sampling variability alone.

This is the point where analysis moves from description to inference.


Part 1 — Framing the question

In your final project you will compare conditions and make claims about differences between them.

For example:

  • do rats prefer one arm of a maze over another?
  • does an environmental cue shift that preference?

We begin with a simple question using rat T-maze data from Experiment 1.

In Experiment 1, 24 rats ran 200 trials each in a T-maze. The left arm was rewarded on 70% of trials. Each row in the summary file gives one trial for one rat.

Load Experiment 1.

d1 <- fread("data/experiment_1_summary.csv")
head(d1)
##    experiment rat_id trial cage_context      scent choice reward reaction_time maze_run_time
##        <char>  <int> <int>       <char>     <char> <char>  <int>         <num>         <num>
## 1:       exp1      1     1         mesh       none      L      1     0.5716877      3.789606
## 2:       exp1      1     2         wood       none      R      0     0.9360044      3.209561
## 3:       exp1      1     3         mesh       none      L      1     0.4574906      2.137541
## 4:       exp1      1     4         mesh peppermint      L      1     0.4461169      2.648328
## 5:       exp1      1     5         mesh      lemon      L      1     0.2000000      2.727809
## 6:       exp1      1     6         mesh peppermint      L      1     0.6402233      2.699873

Our question:

Do rats in Experiment 1 choose the rewarded arm more than chance (i.e., is their mean reward rate greater than 0.5)?

Notice that the dependent variable here is a proportion computed from a binary trial-level variable (reward), so we are no longer working with a continuous timing variable like last week.


Part 2 — Per-rat reward rates

First, compute each rat’s mean reward rate, the proportion of trials on which the rat received a reward.

d1_rat <- d1[, .(reward_rate = mean(reward)), .(rat_id)]
d1_rat
##     rat_id reward_rate
##      <int>       <num>
##  1:      1       0.585
##  2:      2       0.520
##  3:      3       0.520
##  4:      4       0.505
##  5:      5       0.535
##  6:      6       0.640
##  7:      7       0.455
##  8:      8       0.395
##  9:      9       0.545
## 10:     10       0.545
## 11:     11       0.515
## 12:     12       0.485
## 13:     13       0.580
## 14:     14       0.530
## 15:     15       0.565
## 16:     16       0.555
## 17:     17       0.490
## 18:     18       0.525
## 19:     19       0.540
## 20:     20       0.455
## 21:     21       0.445
## 22:     22       0.445
## 23:     23       0.490
## 24:     24       0.560
##     rat_id reward_rate

Why this summary makes sense:

  • reward is binary on each trial
  • averaging across many trials gives a per-rat proportion
  • that proportion is the quantity we want to compare with chance

Plot the distribution of per-rat reward rates.

ggplot(d1_rat, aes(x = reward_rate)) +
  geom_histogram(bins = 10, fill = COL[1], colour = "white") +
  geom_vline(xintercept = 0.5, linetype = 2, colour = "red") +
  labs(
    title = "Per-rat mean reward rate — Experiment 1",
    x = "Mean reward rate",
    y = "Number of rats"
  )

The dashed line marks chance (0.5).

Notice that the per-rat means vary even though all rats experienced the same 70% reward schedule.

This is sampling variability in action: each rat’s 200-trial experience is a sample from the same underlying process, yet produces a different summary statistic.


Part 3 — Sampling variability

Even when data come from the same generating process, samples differ.

To see this, simulate two groups of 24 rats whose true reward probability is exactly 0.5 (pure chance), and compare their sample mean reward rates.

set.seed(42)

group_a <- rbinom(24, size = 200, prob = 0.5) / 200
group_b <- rbinom(24, size = 200, prob = 0.5) / 200

mean(group_a)
## [1] 0.4841667
mean(group_b)
## [1] 0.491875

The means differ slightly even though both groups were generated from the same null distribution (p = 0.5).

Repeat several times to drive the point home.

set.seed(42)
replicate(5, mean(rbinom(24, 200, 0.5) / 200))
## [1] 0.4841667 0.4918750 0.5047917 0.5004167 0.5156250

Important idea:

  • samples vary
  • statistics computed from samples vary
  • variation alone can produce apparent differences

This is why comparing means directly is not sufficient. We need to ask whether the observed difference is larger than what sampling variability alone would produce.
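The same simulation idea can show how large a difference between two chance-level groups typically is. A minimal sketch (the 5,000 replications are an arbitrary choice):

```r
# simulate many pairs of chance-level groups (24 rats, 200 trials, p = 0.5)
# and record the difference between their mean reward rates
set.seed(42)
sim_diffs <- replicate(5000, {
  a <- mean(rbinom(24, 200, 0.5) / 200)
  b <- mean(rbinom(24, 200, 0.5) / 200)
  a - b
})

# typical range of a chance-only difference between group means
quantile(sim_diffs, c(0.025, 0.975))
```

Most chance-only differences fall in a narrow band around zero; an observed difference well outside that band would be hard to attribute to sampling variability alone.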


Part 4 — Distribution of sample means

We now simulate many hypothetical experiments, each with 24 rats whose true reward probability is 0.5.

For each simulated experiment we compute the group mean reward rate.

set.seed(42)
sim_means <- replicate(5000, mean(rbinom(24, 200, 0.5) / 200))
d_sim <- data.table(xbar = sim_means)

ggplot(d_sim, aes(x = xbar)) +
  geom_histogram(bins = 40, fill = COL[1], colour = "white") +
  geom_vline(
    xintercept = mean(d1_rat$reward_rate),
    colour = COL[2], linetype = 1, linewidth = 1
  ) +
  labs(
    title = "Sampling distribution of mean reward rate under H\u2080 (p = 0.5)",
    subtitle = "Orange line = observed mean from Experiment 1",
    x = "Simulated group mean reward rate",
    y = "Count"
  )

This histogram shows the distribution of sample means we would expect if rats were performing at chance.

Observations:

  • the distribution is approximately normal
  • it is much narrower than the distribution of individual trial outcomes
  • the observed mean from Experiment 1 sits well to the right of this null distribution

This visual comparison is the core logic of inference.

Last week the main question was “what does this variable look like?” This week the main question is “how unusual is this summary if the null hypothesis were true?”
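To put a number on “how unusual”, we can compute the proportion of simulated null means at least as large as the observed mean. This is a simulation-based version of the tail probabilities we will formalise next week (a sketch, reusing sim_means and d1_rat from above):

```r
# fraction of null-simulated group means that reach or exceed
# the observed Experiment 1 mean (a one-sided tail proportion)
obs_mean <- mean(d1_rat$reward_rate)
mean(sim_means >= obs_mean)
```

A small proportion means the observed mean would be rare if rats were guessing; note that this null simulation assumes exactly binomial variation across rats.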


Part 5 — Confidence intervals

A single group mean is only an estimate.

Confidence intervals quantify the uncertainty around that estimate.

Compute the 95% CI for the mean reward rate in Experiment 1 using the per-rat means.

n       <- nrow(d1_rat)
mean_rr <- mean(d1_rat$reward_rate)
se_rr   <- sd(d1_rat$reward_rate) / sqrt(n)

ci_lower <- mean_rr - 1.96 * se_rr
ci_upper <- mean_rr + 1.96 * se_rr

mean_rr
## [1] 0.5177083
ci_lower
## [1] 0.4960265
ci_upper
## [1] 0.5393902

Why this summary makes sense here:

  • the quantity of interest is the mean per-rat reward rate
  • we want to estimate it, not only report a single sample value
  • the confidence interval expresses uncertainty around that estimate

Interpretation:

  • the interval reflects uncertainty due to having only 24 rats
  • if we repeated this experiment many times, roughly 95% of such intervals would contain the true mean reward rate
  • the interval's lower bound falls just below 0.5, so the interval by itself does not rule out chance-level performance, even though the point estimate sits above chance

Larger samples produce narrower intervals; smaller samples produce wider ones.
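That last point can be made concrete: hold the per-rat standard deviation fixed at the Experiment 1 value and compute the 95% half-width (1.96 × SE) for several sample sizes (the sample sizes below are illustrative):

```r
# 95% half-width = 1.96 * sd / sqrt(n); sd held fixed, n varied
sd_rr <- sd(d1_rat$reward_rate)
n_try <- c(12, 24, 48, 96)
data.table(n = n_try, half_width = 1.96 * sd_rr / sqrt(n_try))
```

Quadrupling n halves the half-width, because the standard error scales as 1/sqrt(n).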


Part 6 — Comparing Experiment 1 and Experiment 2

Experiment 2 used a different group of 24 rats, but this time the right arm was rewarded on 70% of trials.

Load Experiment 2 and compute per-rat reward rates.

d2 <- fread("data/experiment_2_summary.csv")

d2_rat <- d2[, .(reward_rate = mean(reward)), .(rat_id)]

Combine both groups for plotting.

d1_rat[, experiment := "Experiment 1 (left rewarded)"]
d2_rat[, experiment := "Experiment 2 (right rewarded)"]
d_both <- rbind(d1_rat, d2_rat)

Visualise the per-rat reward rates side by side.

ggplot(d_both, aes(x = experiment, y = reward_rate)) +
  geom_boxplot(fill = COL[1], outlier.shape = NA, width = 0.5) +
  geom_jitter(width = 0.1, alpha = 0.4, colour = COL[1]) +
  geom_hline(yintercept = 0.5, linetype = 2, colour = "red") +
  labs(
    title = "Per-rat reward rate: Experiment 1 vs Experiment 2",
    x = "Experiment",
    y = "Mean reward rate"
  )

Why this visualisation makes sense:

  • the x-axis shows the grouping variable (experiment)
  • the y-axis shows the per-rat summary we want to compare
  • the boxplot shows the spread, and the points show the actual rats

The dashed line marks chance.

The question is no longer simply “are the means different?” but rather:

Is the difference between experiments larger than we would expect from sampling variability alone?

Both experiments used a 70% reward schedule, so a large difference between them would be surprising and would suggest something systematic differed.
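One way to make that question concrete is an interval for the difference in group means, built the same way as the Part 5 interval. A sketch (it combines the two groups' standard errors; the 1.96 multiplier is the same normal approximation used above):

```r
# difference in mean reward rate (Exp 1 minus Exp 2) with a 95% CI
m1 <- mean(d1_rat$reward_rate)
m2 <- mean(d2_rat$reward_rate)
se_diff <- sqrt(var(d1_rat$reward_rate) / nrow(d1_rat) +
                var(d2_rat$reward_rate) / nrow(d2_rat))
diff_obs <- m1 - m2
c(diff = diff_obs,
  lower = diff_obs - 1.96 * se_diff,
  upper = diff_obs + 1.96 * se_diff)
```

If the interval comfortably contains zero, the observed gap is the size that sampling variability alone could easily produce.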


Part 7 — Critical values

To make a formal decision we need a decision rule.

We use the standard normal distribution. The critical values at -1.96 and +1.96 mark the boundaries beyond which only 5% of outcomes (2.5% in each tail) fall under the null hypothesis.

x  <- seq(-4, 4, by = 0.01)
fx <- dnorm(x)
d_norm <- data.table(x = x, fx = fx)

d_reject <- rbind(
  d_norm[x <= -1.96][, tail := "lower"],
  d_norm[x >= 1.96][, tail := "upper"]
)

ggplot(d_norm, aes(x = x, y = fx)) +
  geom_line(linewidth = 1) +
  # group = tail keeps the two shaded tails as separate polygons,
  # so the area does not bridge the gap between -1.96 and +1.96
  geom_area(
    data = d_reject,
    aes(x = x, y = fx, group = tail),
    fill = "tomato", alpha = 0.5
  ) +
  geom_vline(xintercept = c(-1.96, 1.96), linetype = 2) +
  annotate(
    "text", x = -2.5, y = 0.05,
    label = "-1.96", hjust = 1
  ) +
  annotate(
    "text", x = 2.5, y = 0.05,
    label = "+1.96", hjust = 0
  ) +
  labs(
    title = "Standard normal distribution with critical values at \u00b11.96",
    subtitle = "Red shaded regions = rejection region (5% total)",
    x = "z",
    y = "Density"
  )

Values in the red regions occur less than 5% of the time under the null hypothesis.

If our test statistic falls in a red region, we reject the null hypothesis.

This connects directly to the hypothesis testing framework introduced in lecture.
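To close the loop, we can put the Experiment 1 mean on this z scale by dividing its distance from chance by the standard error from Part 5 (a sketch; this uses the same normal approximation as the confidence interval):

```r
# z statistic: how many standard errors the observed mean lies above 0.5
n       <- nrow(d1_rat)
mean_rr <- mean(d1_rat$reward_rate)
se_rr   <- sd(d1_rat$reward_rate) / sqrt(n)
z_obs   <- (mean_rr - 0.5) / se_rr
z_obs
abs(z_obs) > 1.96  # TRUE would mean: in the rejection region
```

Given the mean and interval printed in Part 5, z_obs comes out near 1.6, which does not reach the ±1.96 boundary; that matches the confidence interval, whose lower bound sits just below 0.5.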


Why this matters for the final project

In your final project, you will need to answer questions like:

  • Why did you summarise the data this way?
  • Why did you choose this visualisation?
  • Why is this inferential question the right one?
  • What assumptions are you making about the data?

Use this tutorial as a model for how to answer those questions, not just how to list them.

A stronger explanation now sounds like this:

“The trial-level variable was binary, so I first summarised performance as a per-subject reward rate. I plotted those per-subject values by group to show both the centre and the spread. I then asked whether the observed mean or group difference was larger than expected under sampling variability, because inference is about judging whether an observed pattern is unusually large under a null model.”


Apply this to your assigned dataset

Your assigned dataset is determined by your student ID number. Take the last digit and compute last_digit %% 3 in R.
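For example (the student ID below is made up):

```r
student_id <- 1234567           # hypothetical ID; use your own
last_digit <- student_id %% 10  # 7 for this ID
last_digit %% 3                 # 1 for this hypothetical ID
```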

Load your assigned dataset into R.

For this tutorial, your task is:

  1. Identify one key dependent variable and one grouping variable in your dataset.
  2. State why your summary of the dependent variable is appropriate for that variable type.
  3. Compute per-participant (or per-subject) means for your chosen DV in each condition.
  4. Compute a 95% confidence interval for the mean in each condition.
  5. Create a plot showing the means and confidence intervals across conditions.
  6. Write 2-3 sentences interpreting what the confidence intervals tell you about uncertainty in your estimates, and why this visualisation is a good one for the question.
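As a template for steps 3 to 5, here is a sketch using placeholder names (d, participant_id, condition, and dv are all stand-ins; substitute your dataset's actual table and columns):

```r
# per-participant means by condition (all names are placeholders)
d_sub <- d[, .(dv_mean = mean(dv)), .(participant_id, condition)]

# condition-level mean and 95% CI from the per-participant means
d_ci <- d_sub[, .(m = mean(dv_mean), se = sd(dv_mean) / sqrt(.N)), .(condition)]
d_ci[, `:=`(lower = m - 1.96 * se, upper = m + 1.96 * se)]

# point estimates with interval bars across conditions
ggplot(d_ci, aes(x = condition, y = m)) +
  geom_pointrange(aes(ymin = lower, ymax = upper), colour = COL[1]) +
  labs(x = "Condition", y = "Mean of per-participant means")
```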

You will use this visualisation and interpretation in your final project.


Stop here

Today we introduced:

  • sampling variability using real rat T-maze data
  • the distribution of sample means under a null hypothesis
  • confidence intervals for mean reward rates
  • comparing two experiments with a boxplot
  • critical values and the logic of rejection regions

This week extends last week:

  • first justify your summaries and assumptions from the variable
  • then use inference to decide whether the observed pattern is larger than expected by chance

Next week we will formalise this process into hypothesis tests and use them to make explicit statistical decisions.