# load libraries
library(data.table)
library(ggplot2)
# clean work space
rm(list = ls())
# init colorscheme
COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
theme_minimal(base_size = 13) +
theme(
panel.grid.minor = element_blank(),
strip.text = element_text(face = "bold"),
legend.position = "bottom"
)
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))
Last week we focused on describing how variables behave.
This week we take the next step: connecting those descriptive choices to statistical inference.
A key idea for today:
Statistical inference is about deciding whether observed differences are larger than we would expect from sampling variability alone.
This is the point where analysis moves from description to inference.
In your final project you will compare conditions and make claims about differences between them.
We begin with a simple example of this kind of question, using rat T-maze data from Experiment 1.
In Experiment 1, 24 rats ran 200 trials each in a T-maze. The left arm was rewarded on 70% of trials. Each row in the summary file gives one trial for one rat.
Load Experiment 1.
d1 <- fread("data/experiment_1_summary.csv")
head(d1)
## experiment rat_id trial cage_context scent choice reward reaction_time maze_run_time
## <char> <int> <int> <char> <char> <char> <int> <num> <num>
## 1: exp1 1 1 mesh none L 1 0.5716877 3.789606
## 2: exp1 1 2 wood none R 0 0.9360044 3.209561
## 3: exp1 1 3 mesh none L 1 0.4574906 2.137541
## 4: exp1 1 4 mesh peppermint L 1 0.4461169 2.648328
## 5: exp1 1 5 mesh lemon L 1 0.2000000 2.727809
## 6: exp1 1 6 mesh peppermint L 1 0.6402233 2.699873
Our question:
Do rats in Experiment 1 choose the rewarded arm more than chance (i.e., is their mean reward rate greater than 0.5)?
Notice that the dependent variable here is a proportion computed from a binary trial-level variable (reward), so we are no longer working with a continuous timing variable like last week.
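Because reward is coded 0/1, averaging it directly yields a proportion. A quick check with a hypothetical five-trial vector:

```r
# The mean of a 0/1 vector is the proportion of 1s
reward <- c(1, 0, 1, 1, 0)  # five hypothetical trials
mean(reward)                # 3 rewarded trials out of 5 = 0.6
```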
First, compute each rat’s mean reward rate, the proportion of trials on which the rat received a reward.
d1_rat <- d1[, .(reward_rate = mean(reward)), .(rat_id)]
d1_rat
## rat_id reward_rate
## <int> <num>
## 1: 1 0.585
## 2: 2 0.520
## 3: 3 0.520
## 4: 4 0.505
## 5: 5 0.535
## 6: 6 0.640
## 7: 7 0.455
## 8: 8 0.395
## 9: 9 0.545
## 10: 10 0.545
## 11: 11 0.515
## 12: 12 0.485
## 13: 13 0.580
## 14: 14 0.530
## 15: 15 0.565
## 16: 16 0.555
## 17: 17 0.490
## 18: 18 0.525
## 19: 19 0.540
## 20: 20 0.455
## 21: 21 0.445
## 22: 22 0.445
## 23: 23 0.490
## 24: 24 0.560
## rat_id reward_rate
Why this summary makes sense: reward is binary on each trial, so its per-rat mean is a proportion, a natural measure of each rat's performance.
Plot the distribution of per-rat reward rates.
ggplot(d1_rat, aes(x = reward_rate)) +
geom_histogram(bins = 10, fill = COL[1], colour = "white") +
geom_vline(xintercept = 0.5, linetype = 2, colour = "red") +
labs(
title = "Per-rat mean reward rate — Experiment 1",
x = "Mean reward rate",
y = "Number of rats"
)
The dashed line marks chance (0.5).
Notice that the per-rat means vary even though all rats experienced the same 70% reward schedule.
This is sampling variability in action: each rat’s 200-trial experience is a sample from the same underlying process, yet produces a different summary statistic.
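How much should a single rat's 200-trial mean vary? Under a fixed reward probability p, the standard deviation of the mean of 200 Bernoulli trials is sqrt(p(1 - p)/200). A quick check at p = 0.5:

```r
# Theoretical SD of one rat's mean reward rate over 200 trials at p = 0.5
p <- 0.5
sqrt(p * (1 - p) / 200)  # about 0.035
```

This is why per-rat means in the histogram spread out even under an identical reward schedule.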
Even when data come from the same generating process, samples differ.
To see this, simulate two groups of 24 rats whose true reward probability is exactly 0.5 (pure chance), and compare their sample mean reward rates.
set.seed(42)
group_a <- rbinom(24, size = 200, prob = 0.5) / 200
group_b <- rbinom(24, size = 200, prob = 0.5) / 200
mean(group_a)
## [1] 0.4841667
mean(group_b)
## [1] 0.491875
The means differ slightly even though both groups were generated from the same null distribution (p = 0.5).
Repeat several times to drive the point home.
set.seed(42)
replicate(5, mean(rbinom(24, 200, 0.5) / 200))
## [1] 0.4841667 0.4918750 0.5047917 0.5004167 0.5156250
Important idea:
This is why comparing means directly is not sufficient. We need to ask whether the observed difference is larger than what sampling variability alone would produce.
We now simulate many hypothetical experiments, each with 24 rats whose true reward probability is 0.5.
For each simulated experiment we compute the group mean reward rate.
set.seed(42)
sim_means <- replicate(5000, mean(rbinom(24, 200, 0.5) / 200))
d_sim <- data.table(xbar = sim_means)
ggplot(d_sim, aes(x = xbar)) +
geom_histogram(bins = 40, fill = COL[1], colour = "white") +
geom_vline(
xintercept = mean(d1_rat$reward_rate),
colour = COL[2], linetype = 1, linewidth = 1
) +
labs(
title = "Sampling distribution of mean reward rate under H\u2080 (p = 0.5)",
subtitle = "Orange line = observed mean from Experiment 1",
x = "Simulated group mean reward rate",
y = "Count"
)
This histogram shows the distribution of sample means we would expect if rats were performing at chance.
Observations: the observed mean (orange line) lies in the upper tail of the simulated distribution, away from the bulk of means that chance alone produces.
This visual comparison is the core logic of inference.
Last week the main question was “what does this variable look like?” This week the main question is “how unusual is this summary if the null hypothesis were true?”
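One way to turn this visual comparison into a number is an empirical p-value: the fraction of simulated means at least as far from 0.5 as the observed mean. A sketch, re-using the simulation above and taking the observed group mean (0.5177) from the summary output:

```r
set.seed(42)
sim_means <- replicate(5000, mean(rbinom(24, 200, 0.5) / 200))
obs <- 0.5177  # observed group mean from Experiment 1 (see above)

# Two-sided empirical p-value: how often does chance alone produce
# a group mean at least this far from 0.5?
mean(abs(sim_means - 0.5) >= abs(obs - 0.5))
```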
A single group mean is only an estimate.
Confidence intervals quantify the uncertainty around that estimate.
Compute the 95% CI for the mean reward rate in Experiment 1 using the per-rat means.
n <- nrow(d1_rat)
mean_rr <- mean(d1_rat$reward_rate)
se_rr <- sd(d1_rat$reward_rate) / sqrt(n)
ci_lower <- mean_rr - 1.96 * se_rr
ci_upper <- mean_rr + 1.96 * se_rr
mean_rr
## [1] 0.5177083
ci_lower
## [1] 0.4960265
ci_upper
## [1] 0.5393902
Why this summary makes sense here: each rat contributes one independent mean, so the standard deviation of the per-rat means divided by sqrt(n) estimates the standard error of the group mean.
Interpretation: the interval [0.496, 0.539] gives the range of plausible values for the true group mean; note that it includes 0.5, so values near chance remain plausible by this criterion.
Larger samples produce narrower intervals; smaller samples produce wider ones.
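With only 24 rats, the normal critical value 1.96 slightly understates the uncertainty; a t critical value with n - 1 degrees of freedom gives a somewhat wider interval. A sketch using the summary numbers reported above (the standard error is the one implied by the 1.96 interval):

```r
n <- 24
mean_rr <- 0.5177083             # group mean from the output above
se_rr <- 0.0110621               # standard error implied by the interval above
t_crit <- qt(0.975, df = n - 1)  # about 2.07, vs 1.96 for the normal
c(mean_rr - t_crit * se_rr, mean_rr + t_crit * se_rr)
```

In practice, t.test(d1_rat$reward_rate, mu = 0.5) computes this interval (and the corresponding test) in one call.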
Experiment 2 used a different group of 24 rats, but this time the right arm was rewarded on 70% of trials.
Load Experiment 2 and compute per-rat reward rates.
d2 <- fread("data/experiment_2_summary.csv")
d2_rat <- d2[, .(reward_rate = mean(reward)), .(rat_id)]
Combine both groups for plotting.
d1_rat[, experiment := "Experiment 1 (left rewarded)"]
d2_rat[, experiment := "Experiment 2 (right rewarded)"]
d_both <- rbind(d1_rat, d2_rat)
Visualise the per-rat reward rates side by side.
ggplot(d_both, aes(x = experiment, y = reward_rate)) +
geom_boxplot(fill = COL[1], outlier.shape = NA, width = 0.5) +
geom_jitter(width = 0.1, alpha = 0.4, colour = COL[1]) +
geom_hline(yintercept = 0.5, linetype = 2, colour = "red") +
labs(
title = "Per-rat reward rate: Experiment 1 vs Experiment 2",
x = "Experiment",
y = "Mean reward rate"
)
Why this visualisation makes sense: the boxplots show each group's centre and spread, and the jittered points show every individual rat, separated by experiment.
The dashed line marks chance.
We now ask a different question than “are the means different?”:
Is the difference between experiments larger than we would expect from sampling variability alone?
Both experiments used a 70% reward schedule, so a large difference between them would be surprising and would suggest something systematic differed.
To make a formal decision we need a decision rule.
We use the standard normal distribution. The critical values at -1.96 and +1.96 define the boundaries beyond which only 5% of outcomes fall under the null hypothesis.
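The ±1.96 cutoffs are simply the 2.5% and 97.5% quantiles of the standard normal distribution:

```r
# Quantiles that leave 2.5% in each tail of the standard normal
qnorm(c(0.025, 0.975))  # approximately -1.96 and +1.96
```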
x <- seq(-4, 4, by = 0.01)
fx <- dnorm(x)
d_norm <- data.table(x = x, fx = fx)
d_reject <- rbind(
d_norm[x <= -1.96],
d_norm[x >= 1.96]
)
ggplot(d_norm, aes(x = x, y = fx)) +
geom_line(linewidth = 1) +
geom_area(
data = d_reject,
aes(x = x, y = fx),
fill = "tomato", alpha = 0.5
) +
geom_vline(xintercept = c(-1.96, 1.96), linetype = 2) +
annotate(
"text", x = -2.5, y = 0.05,
label = "-1.96", hjust = 1
) +
annotate(
"text", x = 2.5, y = 0.05,
label = "+1.96", hjust = 0
) +
labs(
title = "Standard normal distribution with critical values at \u00b11.96",
subtitle = "Red shaded regions = rejection region (5% total)",
x = "z",
y = "Density"
)
Values in the red regions occur less than 5% of the time under the null hypothesis.
If our test statistic falls in a red region, we reject the null hypothesis.
This connects directly to the hypothesis testing framework introduced in lecture.
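To apply this rule to the two-experiment comparison, one common statistic is a two-sample z for the difference in group means. A sketch with stand-in summary numbers (the Experiment 2 mean and both SDs here are hypothetical placeholders, not values computed from the data files):

```r
# Stand-in group summaries (placeholders, not computed from the data files)
m1 <- 0.518; s1 <- 0.054; n1 <- 24   # Experiment 1-like values
m2 <- 0.525; s2 <- 0.054; n2 <- 24   # hypothetical Experiment 2 values

# Difference in means divided by its standard error
z <- (m1 - m2) / sqrt(s1^2 / n1 + s2^2 / n2)
abs(z) > 1.96  # TRUE would put z in the rejection region
```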
In your final project, you will need to answer questions of this kind about your own data.
Use this tutorial as a model for how to answer those questions, not just how to list them.
A stronger explanation now sounds like this:
“The trial-level variable was binary, so I first summarised performance as a per-subject reward rate. I plotted those per-subject values by group to show both the centre and the spread. I then asked whether the observed mean or group difference was larger than expected under sampling variability, because inference is about judging whether an observed pattern is unusually large under a null model.”
Your assigned dataset is determined by your student ID number. Take the last digit and compute last_digit %% 3 in R.
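For example, with a hypothetical student ID:

```r
student_id <- 47123456          # hypothetical ID
last_digit <- student_id %% 10  # 6
last_digit %% 3                 # 0 -> dataset 0
```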
- 0: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_switch
- 1: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_auto
- 2: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_unlearn

Load your assigned dataset into R.
For this tutorial, your task is to apply the same workflow to your assigned dataset: compute a per-subject summary, visualise it, and interpret what you see.
You will use this visualisation and interpretation in your final project.
Today we introduced sampling variability, the sampling distribution of a group mean, confidence intervals, and decision rules based on critical values.
This week extends last week: we moved from describing variables to asking whether an observed pattern is larger than sampling variability alone would produce.
Next week we will formalise this process into hypothesis tests and use them to make explicit statistical decisions.