# load libraries
library(data.table)
library(ggplot2)
# clean work space
rm(list = ls())
# init colorscheme
COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
  theme_minimal(base_size = 13) +
    theme(
      panel.grid.minor = element_blank(),
      strip.text = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))
Last week we focused on describing how variables behave.
This week we take the next step: the goal of this tutorial is to connect those descriptive choices to statistical inference.
A key idea for today:
Statistical inference is about deciding whether observed differences are larger than we would expect from sampling variability alone.
This is the point where analysis moves from description to inference.
In your final project you will compare conditions and make claims about differences between them.
We begin with a simple example of that kind of question, using rat T-maze data from Experiment 1.
In Experiment 1, 24 rats ran 200 trials each in a T-maze. The left arm was rewarded on 70% of trials. Each row in the data file gives one trial for one rat.
Load Experiment 1.
d1 <- fread("data/experiment_1_summary.csv")
head(d1)
## experiment rat_id trial cage_context scent choice reward reaction_time maze_run_time
## <char> <int> <int> <char> <char> <char> <int> <num> <num>
## 1: exp1 1 1 mesh none L 1 0.5716877 3.789606
## 2: exp1 1 2 wood none R 0 0.9360044 3.209561
## 3: exp1 1 3 mesh none L 1 0.4574906 2.137541
## 4: exp1 1 4 mesh peppermint L 1 0.4461169 2.648328
## 5: exp1 1 5 mesh lemon L 1 0.2000000 2.727809
## 6: exp1 1 6 mesh peppermint L 1 0.6402233 2.699873
Our question:
Do rats in Experiment 1 choose the rewarded arm more than chance (i.e., is their mean reward rate greater than 0.5)?
Notice that the dependent variable here is a proportion computed from a binary trial-level variable (reward), so we are no longer working with a continuous timing variable like last week.
First, compute each rat’s mean reward rate, the proportion of trials on which the rat received a reward.
d1_rat <- d1[, .(reward_rate = mean(reward)), by = .(rat_id)]
d1_rat
## rat_id reward_rate
## <int> <num>
## 1: 1 0.585
## 2: 2 0.520
## 3: 3 0.520
## 4: 4 0.505
## 5: 5 0.535
## 6: 6 0.640
## 7: 7 0.455
## 8: 8 0.395
## 9: 9 0.545
## 10: 10 0.545
## 11: 11 0.515
## 12: 12 0.485
## 13: 13 0.580
## 14: 14 0.530
## 15: 15 0.565
## 16: 16 0.555
## 17: 17 0.490
## 18: 18 0.525
## 19: 19 0.540
## 20: 20 0.455
## 21: 21 0.445
## 22: 22 0.445
## 23: 23 0.490
## 24: 24 0.560
## rat_id reward_rate
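A quick reminder of why `mean()` is the right summary here: the mean of a 0/1 vector is just the proportion of 1s. A toy sketch (made-up vector, not the rat data):

```r
# mean() of a binary variable gives a proportion
x <- c(1, 0, 1, 1, 0)  # e.g. 3 rewarded trials out of 5
mean(x)
## [1] 0.6
```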
Why this summary makes sense: reward is binary on each trial, so each rat's mean of reward is exactly the proportion of rewarded trials.
Plot the distribution of per-rat reward rates.
ggplot(d1_rat, aes(x = reward_rate)) +
  geom_histogram(bins = 10, fill = COL[1], colour = "white") +
  geom_vline(xintercept = 0.5, linetype = 2, colour = "red") +
  labs(
    title = "Per-rat mean reward rate — Experiment 1",
    x = "Mean reward rate",
    y = "Number of rats"
  )
The dashed line marks chance (0.5).
Notice that the per-rat means vary even though all rats experienced the same 70% reward schedule.
Some of this spread reflects sampling variability: each rat’s 200-trial experience is a sample from the same reward schedule, yet produces a different summary statistic. For teaching purposes, we treat the rats here as if they were generated from the same underlying process.
Even when data come from the same generating process, samples differ.
To see this, simulate two groups of 24 rats whose true reward probability is exactly 0.5 (pure chance), and compare their sample mean reward rates.
set.seed(42)
group_a <- rbinom(24, size = 200, prob = 0.5) / 200
group_b <- rbinom(24, size = 200, prob = 0.5) / 200
mean(group_a)
## [1] 0.4841667
mean(group_b)
## [1] 0.491875
The means differ slightly even though both groups were generated from the same null distribution (p = 0.5).
Repeat several times to drive the point home.
set.seed(42)
replicate(5, mean(rbinom(24, 200, 0.5) / 200))
## [1] 0.4841667 0.4918750 0.5047917 0.5004167 0.5156250
Important idea:
This is why comparing means directly is not sufficient. We need to ask whether the observed difference is larger than what sampling variability alone would produce.
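To make "larger than sampling variability alone" concrete, we can simulate the null distribution of the difference between two chance-level groups. This is a sketch assuming the same design as above (24 rats, 200 trials each, true p = 0.5 in both groups):

```r
# How big a gap between two chance-level groups is "typical"?
set.seed(1)
diffs <- replicate(2000, {
  a <- mean(rbinom(24, 200, 0.5) / 200)  # group A mean reward rate
  b <- mean(rbinom(24, 200, 0.5) / 200)  # group B mean reward rate
  a - b
})
# 95% of purely chance-driven absolute differences fall below this
quantile(abs(diffs), 0.95)
```

Observed group differences smaller than this cutoff are easy to produce by chance alone; differences well beyond it are not.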
We now simulate many hypothetical experiments, each with 24 rats whose true reward probability is 0.5.
For each simulated experiment we compute the group mean reward rate.
set.seed(42)
sim_means <- replicate(5000, mean(rbinom(24, 200, 0.5) / 200))
d_sim <- data.table(xbar = sim_means)
ggplot(d_sim, aes(x = xbar)) +
  geom_histogram(bins = 40, fill = COL[1], colour = "white") +
  geom_vline(
    xintercept = mean(d1_rat$reward_rate),
    colour = COL[2], linetype = 1, linewidth = 1
  ) +
  labs(
    title = "Sampling distribution of mean reward rate under H\u2080 (p = 0.5)",
    subtitle = "Orange line = observed mean from Experiment 1",
    x = "Simulated group mean reward rate",
    y = "Count"
  )
This histogram shows the distribution of sample means we would expect if rats were performing at chance.
Notice that the observed mean (orange line) falls far into the upper tail of the simulated null distribution.
This visual comparison is the core logic of inference.
Last week the main question was “what does this variable look like?” This week the main question is “how unusual is this summary if the null hypothesis were true?”
So let us answer that directly.
How often would a group mean at least this large appear if rats were really performing at chance?
obs_mean <- mean(d1_rat$reward_rate)
tail_prop <- mean(sim_means >= obs_mean)
obs_mean
## [1] 0.5177083
tail_prop
## [1] 0.0052
tail_prop is the proportion of simulated null experiments that produced a group mean at least as large as the one we observed.
If this proportion is very small, then the observed mean is rare under the null model.
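Because tail_prop is itself estimated from a finite number of simulations, it carries Monte Carlo noise. A rough binomial standard error for the estimate, using the values printed above:

```r
# Monte Carlo uncertainty in the simulated tail proportion
p_hat <- 0.0052  # tail proportion printed above
n_sim <- 5000    # number of simulated experiments
mc_se <- sqrt(p_hat * (1 - p_hat) / n_sim)
mc_se            # about 0.001
```

So the simulated tail proportion is only pinned down to roughly plus or minus 0.002 (two standard errors); running more simulations would tighten it.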
This gives us a first inferential answer: a group mean this large appeared in only about 0.5% of the simulated null experiments.
This is the same core idea that later appears in formal terms like p-value.
The lecture framed inference as comparing two possible “universes.”
In this tutorial, those universes are a null universe in which rats perform at chance (p = 0.5) and an alternative universe in which rats choose the rewarded arm more often than chance.
The simulation in Part 4 built a model of the null universe.
Now we can ask the decision question clearly:
If the null universe were true, would our observed mean be common or rare?
Our simulation suggests it would be rare.
That means the observed data fit the null universe poorly, so we have evidence in favour of the alternative.
This is the basic logic of statistical inference: compare the observed summary to what the null universe would produce, and treat rare outcomes as evidence against the null.
The important idea is not the label. Whether we call the result a tail probability, a p-value, or evidence against the null, the reasoning is the same.
We can now turn to a different, but related, question.
Instead of asking “is this mean unusually large under the null?” we can ask:
What range of population mean reward rates is plausible given our sample?
Part 4 used simulation to estimate a tail probability directly.
There is another way to make the same decision.
Instead of waiting to see the observed mean and then asking how much null probability lies above it, we can choose a cutoff in advance.
That cutoff is called a critical value.
Because our question is one-sided:
H0: mean reward rate = 0.5
H1: mean reward rate > 0.5
we only care about the upper tail of the null distribution.
If we are willing to tolerate a 5% Type I error rate, then the critical value is the point with only 5% of the null distribution above it.
crit_value_sim <- quantile(sim_means, 0.95)
crit_value_sim
## 95%
## 0.5114583
Now add that cutoff to the simulated null distribution.
ggplot(d_sim, aes(x = xbar)) +
  geom_histogram(bins = 40, fill = COL[1], colour = "white") +
  geom_vline(xintercept = obs_mean, colour = COL[2], linewidth = 1) +
  geom_vline(xintercept = crit_value_sim, colour = COL[3], linetype = 2, linewidth = 1) +
  labs(
    title = "Observed mean and critical value under H0",
    subtitle = "Orange = observed mean, red dashed = 5% critical value",
    x = "Simulated group mean reward rate",
    y = "Count"
  )
Decision logic: if the observed mean exceeds the critical value, reject H0; otherwise, do not. Here the observed mean (0.518) lies above the 5% critical value (0.511), so we reject.
This is exactly the same inferential logic as the tail probability in Part 4. It is just a different way of expressing the same decision.
A confidence interval answers a different question.
Instead of asking:
Is the observed mean unusually large under a null value of 0.5?
it asks:
Which population mean reward rates are plausible given the sample we observed?
We can keep the same normal-style reasoning and use simulation to build intuition.
First estimate the standard error of the observed mean.
n_rats <- nrow(d1_rat)
se_obs <- sd(d1_rat$reward_rate) / sqrt(n_rats)
se_obs
## [1] 0.01106216
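As a sanity check on the sd / sqrt(n) formula, we can watch it predict the spread of repeated sample means. This is a self-contained sketch using standard normal draws, not the rat data:

```r
# The SE formula predicts the spread of repeated sample means
set.seed(7)
xs <- replicate(4000, mean(rnorm(24)))  # 4000 sample means, each of n = 24 draws
sd(xs)        # empirical spread of the sample means
1 / sqrt(24)  # theoretical standard error: sigma / sqrt(n), with sigma = 1
```

The two numbers agree closely, which is exactly what the formula promises.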
Now imagine many hypothetical repeats of the experiment if the observed mean were the true population mean.
set.seed(42)
sim_ci <- rnorm(5000, mean = obs_mean, sd = se_obs)
d_ci <- data.table(xbar = sim_ci)
ci_lower <- quantile(sim_ci, 0.025)
ci_upper <- quantile(sim_ci, 0.975)
ggplot(d_ci, aes(x = xbar)) +
  geom_histogram(bins = 40, fill = COL[1], colour = "white") +
  geom_vline(xintercept = obs_mean, colour = COL[2], linewidth = 1) +
  geom_vline(
    xintercept = c(ci_lower, ci_upper),
    colour = COL[3], linetype = 2, linewidth = 1
  ) +
  labs(
    title = "Confidence interval logic from repeated sampling",
    subtitle = "Orange = observed mean, red dashed = 95% interval bounds",
    x = "Hypothetical repeated sample means",
    y = "Count"
  )
The red dashed lines mark a range of values close to the observed mean that would commonly arise under repeated sampling.
That gives us a 95% confidence interval for the population mean reward rate.
ci_lower
## 2.5%
## 0.4955721
ci_upper
## 97.5%
## 0.5391596
Interpretation: if 0.5 lay outside the interval, that would be evidence that chance performance is not a plausible mean for this dataset.
Here 0.5 falls just inside the lower end of the interval (0.4956, 0.5392), so this two-sided interval by itself does not rule out chance performance.
The lecture showed that there are three closely related ways to express the same inferential decision.
We can now write those three views down formally for the rat example.
Before we do that, an important assumption:
A Normal test assumes that we know the true standard deviation of the sampling distribution under the null hypothesis.
In real data analysis, we usually do not know that value exactly.
Later in the course, we will learn how to proceed when the population standard deviation is unknown. That is where t-tests come in.
For now, we will make a teaching simplification and treat the standard deviation of the null sampling distribution as known.
We are testing:
H0: mu = 0.5
H1: mu > 0.5
The sample statistic is the mean per-rat reward rate.
From the simulated null distribution in Part 4, we can treat that statistic as approximately normal under H0.
mu0 <- 0.5
se_null <- sd(sim_means)
mu0
## [1] 0.5
se_null
## [1] 0.007231074
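The simulated se_null also has a closed form under the binomial model, assuming 24 independent rats each with 200 Bernoulli(p = 0.5) trials:

```r
# Theoretical SE of the group mean under H0:
# sd of one rat's proportion is sqrt(p * (1 - p) / n_trials),
# and averaging over rats divides it by sqrt(n_rats)
se_theory <- sqrt(0.5 * 0.5 / 200) / sqrt(24)
se_theory  # close to the simulated value of about 0.00723
```

The near-perfect agreement is a useful check that the 5000-experiment simulation is behaving as the theory says it should.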
The p-value is the probability, under H0, of observing a mean at least as large as the one we actually saw.
p_value <- pnorm(obs_mean, mean = mu0, sd = se_null, lower.tail = FALSE)
p_value
## [1] 0.007164229
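The same p-value can be written in the standardised z form you will often see in textbooks. A check using the rounded values printed above:

```r
# Standardised version of the same one-sided test
z <- (0.5177083 - 0.5) / 0.007231074  # (obs_mean - mu0) / se_null
z                                     # about 2.45 standard errors above chance
pnorm(z, lower.tail = FALSE)          # same p-value as above (~0.00716)
```

pnorm with a mean and sd, and pnorm of the z score, are two spellings of one computation.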
The critical value is the cutoff that leaves 5% of the null distribution above it.
crit_value <- qnorm(0.95, mean = mu0, sd = se_null)
crit_value
## [1] 0.5118941
If obs_mean > crit_value, reject H0.
For a one-sided test with H1: mu > 0.5, the matching confidence interval is a one-sided lower bound.
ci_lower_1s <- qnorm(0.05, mean = obs_mean, sd = se_obs)
ci_lower_1s
## [1] 0.4995127
If 0.5 is below this lower bound, reject H0.
Note that this bound is built from the sample-based standard error (se_obs) rather than the null standard error (se_null), so it will not line up exactly with the p-value and critical value computed above.
x <- seq(mu0 - 4 * se_null, mu0 + 4 * se_null, length.out = 1000)
fx <- dnorm(x, mean = mu0, sd = se_null)
d_norm <- data.table(x = x, fx = fx)
ggplot(d_norm, aes(x = x, y = fx)) +
  geom_line(linewidth = 1) +
  geom_area(
    data = d_norm[x >= obs_mean],
    aes(x = x, y = fx),
    fill = COL[2], alpha = 0.25
  ) +
  geom_vline(xintercept = obs_mean, colour = COL[2], linewidth = 1) +
  geom_vline(xintercept = crit_value, colour = COL[3], linetype = 2, linewidth = 1) +
  geom_vline(xintercept = ci_lower_1s, colour = "darkgreen", linetype = 2, linewidth = 1) +
  labs(
    title = "Normal test: p-value, critical value, and confidence interval",
    subtitle = "Orange = observed mean, red dashed = critical value, green dashed = lower CI bound",
    x = "Mean reward rate",
    y = "Density"
  )
These three views are equivalent ways to express the same decision: the p-value, the critical value, and the confidence interval all compare the observed mean to what the null model would produce.
The labels differ, but the inferential logic is the same.
In your final project, you will need to answer questions about whether your conditions differ by more than sampling variability alone would produce.
Use this tutorial as a model for how to answer those questions, not just how to list them.
A stronger explanation now sounds like this:
“The trial-level variable was binary, so I first summarised performance as a per-subject reward rate. I then defined a null model and asked whether the observed mean or group difference was unusually large under that model. After that, I used confidence intervals to describe the uncertainty around the estimated means and a plot to communicate those estimates clearly.”
Your assigned dataset is determined by your student ID number. Take the last digit and compute last_digit %% 3 in R.
0: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_switch
1: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_auto
2: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_unlearn
Load your assigned dataset into R.
For this tutorial, your task is to build a visualisation of your assigned dataset and interpret it.
You will use this visualisation and interpretation in your final project.
Today we introduced the core logic of statistical inference: tail probabilities, critical values, and confidence intervals, all built from a simulated null distribution.
This extends last week's question, “what does this variable look like?”, to this week's question, “how unusual is this summary if the null hypothesis were true?”
Next week we will formalise this process into hypothesis tests and use them to make explicit statistical decisions.