# load libraries
library(data.table)
library(ggplot2)
# clean work space
rm(list = ls())
# init colorscheme
COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
theme_minimal(base_size = 13) +
theme(
panel.grid.minor = element_blank(),
strip.text = element_text(face = "bold"),
legend.position = "bottom"
)
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))
In your final project you will:
This tutorial focuses on the second and third items.
A key idea for today:
Inference and study design can be done well or badly. Today you learn how to tell the difference.
We will use a rat T-maze dataset. At the T-junction, a rat can turn left or right, and one arm is baited with reward on the majority of trials. We compare two experiments that differ in which arm was baited more often.
d1 <- fread("data/experiment_1_summary.csv")
d2 <- fread("data/experiment_2_summary.csv")
str(d1)
## Classes 'data.table' and 'data.frame': 4800 obs. of 9 variables:
## $ experiment : chr "exp1" "exp1" "exp1" "exp1" ...
## $ rat_id : int 1 1 1 1 1 1 1 1 1 1 ...
## $ trial : int 1 2 3 4 5 6 7 8 9 10 ...
## $ cage_context : chr "mesh" "wood" "mesh" "mesh" ...
## $ scent : chr "none" "none" "none" "peppermint" ...
## $ choice : chr "L" "R" "L" "L" ...
## $ reward : int 1 0 1 1 1 1 0 1 1 1 ...
## $ reaction_time: num 0.572 0.936 0.457 0.446 0.2 ...
## $ maze_run_time: num 3.79 3.21 2.14 2.65 2.73 ...
## - attr(*, ".internal.selfref")=<externalptr>
str(d2)
## Classes 'data.table' and 'data.frame': 4800 obs. of 9 variables:
## $ experiment : chr "exp2" "exp2" "exp2" "exp2" ...
## $ rat_id : int 1 1 1 1 1 1 1 1 1 1 ...
## $ trial : int 1 2 3 4 5 6 7 8 9 10 ...
## $ cage_context : chr "wood" "mesh" "mesh" "mesh" ...
## $ scent : chr "peppermint" "peppermint" "none" "peppermint" ...
## $ choice : chr "L" "L" "R" "R" ...
## $ reward : int 0 0 1 0 0 0 0 1 1 1 ...
## $ reaction_time: num 0.872 0.302 0.486 0.704 0.343 ...
## $ maze_run_time: num 4.07 4.11 3.48 3.98 2.51 ...
## - attr(*, ".internal.selfref")=<externalptr>
Key variables:
- rat_id — individual rat (1–24)
- trial — trial number (1–200)
- choice — “L” (left) or “R” (right)
- reward — 1 if rewarded, 0 otherwise

Reward contingencies:
We compare the two experiments on the basis of each rat’s mean reward rate across all 200 trials.
rat_means_1 <- d1[, .(mean_reward = mean(reward), experiment = "Experiment 1"), .(rat_id)]
rat_means_2 <- d2[, .(mean_reward = mean(reward), experiment = "Experiment 2"), .(rat_id)]
rat_means <- rbind(rat_means_1, rat_means_2)
rat_means[, .N, .(experiment)]
## experiment N
## <char> <int>
## 1: Experiment 1 24
## 2: Experiment 2 24
Group-level summary:
rat_means[, .(
n = .N,
mean_reward = mean(mean_reward),
sd_reward = sd(mean_reward)
), .(experiment)]
## experiment n mean_reward sd_reward
## <char> <int> <num> <num>
## 1: Experiment 1 24 0.5177083 0.05419328
## 2: Experiment 2 24 0.5262500 0.05485653
The observed difference in mean reward rate:
mean_exp1 <- rat_means[experiment == "Experiment 1", mean(mean_reward)]
mean_exp2 <- rat_means[experiment == "Experiment 2", mean(mean_reward)]
diff_obs <- mean_exp1 - mean_exp2
diff_obs
## [1] -0.008541667
Question (final project framing):
Is diff_obs meaningful, or could it plausibly be noise? Today we develop tools to answer that responsibly.
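One way to build intuition before any formal testing is to simulate the null. The sketch below is an illustration, not part of the assigned analysis: it assumes both groups share the same true reward probability (0.5), draws 24 rats per group with 200 Bernoulli trials each, and asks how often a group difference at least as large as the observed 0.0085 arises by chance alone.

```r
# null simulation: both groups have the same true reward probability (0.5)
set.seed(1)
n_sims <- 2000
null_diffs <- replicate(n_sims, {
  # per-rat mean reward over 200 Bernoulli(0.5) trials, 24 rats per group
  g1 <- rowMeans(matrix(rbinom(24 * 200, 1, 0.5), nrow = 24))
  g2 <- rowMeans(matrix(rbinom(24 * 200, 1, 0.5), nrow = 24))
  mean(g1) - mean(g2)
})
# proportion of null simulations at least as extreme as the observed difference
mean(abs(null_diffs) >= 0.008541667)
```

If this proportion is large, a difference of the observed size is unremarkable under pure chance.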
This section shows how small analysis choices can change the story, even when using the same dataset.
set.seed(42)
ggplot(rat_means, aes(x = experiment, y = mean_reward)) +
geom_boxplot(fill = COL[1], outlier.shape = NA) +
geom_point(position = position_jitter(width = 0.15), alpha = 0.4) +
labs(
x = "Experiment",
y = "Per-rat mean reward rate",
title = "Reward rate by experiment"
)
The two groups differ in the expected direction: rats in Experiment 2 earn reward at a slightly higher rate than rats in Experiment 1, consistent with the reversed reward contingencies.
We will define an “outlier rule” using the IQR method applied to the pooled per-rat means.
all_means <- rat_means$mean_reward
q1 <- quantile(all_means, 0.25, na.rm = TRUE)
q3 <- quantile(all_means, 0.75, na.rm = TRUE)
iqr <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr
lower
## 25%
## 0.3875
upper
## 75%
## 0.6575
Make a cleaned version.
rat_means_clean <- rat_means[mean_reward >= lower & mean_reward <= upper]
rat_means[, .(n_before = .N)]
## n_before
## <int>
## 1: 48
rat_means_clean[, .(n_after = .N)]
## n_after
## <int>
## 1: 48
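The IQR rule removed nothing here (48 rats before and after), which is itself informative. To see the rule actually fire, here is a self-contained sketch on synthetic data, using iqr_keep, a hypothetical helper that mirrors the bounds computed above; the one injected extreme value gets flagged.

```r
set.seed(7)
# keep values inside [Q1 - k*IQR, Q3 + k*IQR]
iqr_keep <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  x >= q[1] - k * iqr & x <= q[2] + k * iqr
}
x <- c(rnorm(20, mean = 0.5, sd = 0.05), 0.95)  # 21st value is an injected outlier
which(!iqr_keep(x))
```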
Compare means before vs after outlier exclusion.
sum_raw <- rat_means[, .(
mean_reward = mean(mean_reward),
sd_reward = sd(mean_reward),
n = .N
), .(experiment)]
sum_cln <- rat_means_clean[, .(
mean_reward = mean(mean_reward),
sd_reward = sd(mean_reward),
n = .N
), .(experiment)]
sum_raw
## experiment mean_reward sd_reward n
## <char> <num> <num> <int>
## 1: Experiment 1 0.5177083 0.05419328 24
## 2: Experiment 2 0.5262500 0.05485653 24
sum_cln
## experiment mean_reward sd_reward n
## <char> <num> <num> <int>
## 1: Experiment 1 0.5177083 0.05419328 24
## 2: Experiment 2 0.5262500 0.05485653 24
Plot the cleaned data.
set.seed(42)
ggplot(rat_means_clean, aes(x = experiment, y = mean_reward)) +
geom_boxplot(fill = COL[1], outlier.shape = NA) +
geom_point(position = position_jitter(width = 0.15), alpha = 0.4) +
labs(
x = "Experiment",
y = "Per-rat mean reward rate",
title = "Reward rate by experiment (outliers removed)"
)
Discussion points:
To justify your analysis, you must be able to explain why a decision was reasonable and what it changed.
In Week 6 we started doing inference (“is there evidence?”). This week we add: “how big is the effect?”
We will compute Cohen’s d using a pooled standard deviation.
pooled_sd <- function(x, g) {
g <- as.factor(g)
x1 <- x[g == levels(g)[1]]
x2 <- x[g == levels(g)[2]]
n1 <- length(x1)
n2 <- length(x2)
s1 <- sd(x1)
s2 <- sd(x2)
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
return(sp)
}
sp_raw <- pooled_sd(rat_means$mean_reward, rat_means$experiment)
d_raw <- (mean_exp1 - mean_exp2) / sp_raw
sp_raw
## [1] 0.05452591
d_raw
## [1] -0.1566533
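A quick sanity check on the pooled-SD formula (the function is repeated here so the snippet stands alone): when both groups have identical spread, the pooled SD should equal that common SD.

```r
pooled_sd <- function(x, g) {
  g <- as.factor(g)
  x1 <- x[g == levels(g)[1]]
  x2 <- x[g == levels(g)[2]]
  n1 <- length(x1); n2 <- length(x2)
  sqrt(((n1 - 1) * sd(x1)^2 + (n2 - 1) * sd(x2)^2) / (n1 + n2 - 2))
}
# groups with different means but identical spread
x <- c(1, 2, 3, 4, 11, 12, 13, 14)
g <- rep(c("A", "B"), each = 4)
all.equal(pooled_sd(x, g), sd(c(1, 2, 3, 4)))  # TRUE
```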
Repeat for the cleaned dataset.
mean_exp1_cln <- rat_means_clean[experiment == "Experiment 1", mean(mean_reward)]
mean_exp2_cln <- rat_means_clean[experiment == "Experiment 2", mean(mean_reward)]
sp_cln <- pooled_sd(rat_means_clean$mean_reward, rat_means_clean$experiment)
d_cln <- (mean_exp1_cln - mean_exp2_cln) / sp_cln
sp_cln
## [1] 0.05452591
d_cln
## [1] -0.1566533
Discussion points:
We will simulate a situation where the effect size is fixed, but we change the sample size.
We will generate data from two normal distributions with a fixed mean difference.
set.seed(42)
mu1 <- 0
mu2 <- 0.3
sigma <- 1
simulate_two_group <- function(n_per_group) {
x1 <- rnorm(n_per_group, mu1, sigma)
x2 <- rnorm(n_per_group, mu2, sigma)
d <- data.table(
grp = rep(c("A", "B"), each = n_per_group),
x = c(x1, x2)
)
return(d)
}
A simple z-test for a difference in means, assuming the population standard deviation is known:
z_test_diff_means <- function(x, g, sigma_known) {
g <- as.factor(g)
x1 <- x[g == levels(g)[1]]
x2 <- x[g == levels(g)[2]]
n1 <- length(x1)
n2 <- length(x2)
z <- (mean(x1) - mean(x2)) / (sigma_known * sqrt(1/n1 + 1/n2))
p <- 2 * pnorm(abs(z), lower.tail = FALSE)
return(list(z = z, p = p))
}
Compare p-values for different n.
d_small <- simulate_two_group(20)
d_big <- simulate_two_group(200)
res_small <- z_test_diff_means(d_small$x, d_small$grp, sigma_known = sigma)
res_big <- z_test_diff_means(d_big$x, d_big$grp, sigma_known = sigma)
res_small
## $z
## [1] 0.5151724
##
## $p
## [1] 0.6064326
res_big
## $z
## [1] -3.241692
##
## $p
## [1] 0.001188223
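In real data sigma is not known, so in practice you would use a t-test rather than a z-test with sigma_known. Base R's t.test (Welch's two-sample t-test by default) handles this; a minimal sketch on simulated data:

```r
set.seed(42)
x1 <- rnorm(200, mean = 0)
x2 <- rnorm(200, mean = 0.3)
res <- t.test(x1, x2)  # Welch two-sample t-test (does not assume equal variances)
c(t = unname(res$statistic), p = res$p.value)
```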
Key point: the true effect size was identical in both simulations; only the sample size changed, yet the p-values differ dramatically. A small p-value therefore reflects sample size as much as effect size.
This is a central evaluation skill for the final project.
Power is about how likely your method is to detect an effect of a given size.
This matters for:
We will do a simple normal-approximation power calculation for a difference in means.
Use the effect size you observed (raw or cleaned) as a starting point.
d_target <- abs(d_raw)
d_target
## [1] 0.1566533
Convert effect size back into a mean difference scale by choosing a plausible sigma.
Here we use the pooled SD from the real dataset as a proxy.
sigma_proxy <- sp_raw
delta_target <- d_target * sigma_proxy
sigma_proxy
## [1] 0.05452591
delta_target
## [1] 0.008541667
For a two-sided test at alpha = 0.05, the critical value is:
alpha <- 0.05
zcrit <- qnorm(1 - alpha / 2)
zcrit
## [1] 1.959964
If the true mean difference is delta_target, the z-test
statistic has mean:
mu_z = delta_target / (sigma * sqrt(1/n1 + 1/n2))
Assume equal group sizes: n per group.
Then:
mu_z = delta_target / (sigma * sqrt(2/n))
Power (two-sided) is:
P(|Z| > zcrit) when Z ~ N(mu_z, 1).
We compute power across a range of sample sizes.
power_two_sided <- function(n_per_group, delta, sigma, zcrit) {
mu_z <- delta / (sigma * sqrt(2 / n_per_group))
power <- pnorm(-zcrit, mean = mu_z, sd = 1) +
(1 - pnorm(zcrit, mean = mu_z, sd = 1))
return(power)
}
n_vec <- seq(5, 200, by = 5)
pwr <- sapply(n_vec, power_two_sided,
delta = delta_target, sigma = sigma_proxy, zcrit = zcrit)
d_pwr <- data.table(n_per_group = n_vec, power = pwr)
ggplot(d_pwr, aes(x = n_per_group, y = power)) +
geom_line() +
geom_hline(yintercept = 0.8, linetype = 2, colour = "red") +
ylim(0, 1) +
labs(
x = "N per group",
y = "Power",
title = "Power curve for observed effect size"
)
Find the smallest n that achieves 0.8 power.
d_pwr[power >= 0.8][1]
## n_per_group power
## <num> <num>
## 1: NA NA
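The NA above means no n up to 200 reaches 0.8 power for an effect this small. The required n also has a closed form: for a two-sided z-test at level alpha, n per group is approximately 2 * ((z_(alpha/2) + z_beta) / d)^2, where z_beta corresponds to the target power. A sketch using the effect size observed above:

```r
alpha <- 0.05
target_power <- 0.8
d_target <- 0.1566533  # Cohen's d magnitude observed above
n_required <- 2 * ((qnorm(1 - alpha / 2) + qnorm(target_power)) / d_target)^2
ceiling(n_required)  # → 640 rats per group
```

So detecting an effect this small at 80% power would require roughly 640 rats per group, far beyond the 24 per group actually collected.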
Discussion points:
Below is a short, deliberately imperfect analysis summary. Your job is to evaluate it.
We compared two conditions and found a statistically significant difference (p < .05). We removed a few outliers because they looked weird. We tried a few different transformations and reported the one that gave the cleanest result. The effect is real because it is significant.
What information is missing that you would need to judge the strength of the evidence?
What analysis choices were made without justification?
Which sentence strongly suggests researcher degrees of freedom (p-hacking risk)?
What would you want reported instead of only “p < .05”?
What would you want to know about sample size and power?
This kind of critique is part of “defending analysis choices” in your final project: you show that you understand what makes an analysis trustworthy.
Today you practiced moving from:
to
This is the core shift needed for the final project.
In particular you now have language and tools to discuss:
Your assigned dataset is determined by your student ID number. Take
the last digit and compute last_digit %% 3 in R.
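For example, with a hypothetical student ID ending in 7:

```r
last_digit <- 7  # hypothetical: the last digit of your student ID
last_digit %% 3  # → 1, so this student would use dataset 1 (cat_learn_auto)
```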
- 0: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_switch
- 1: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_auto
- 2: https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_unlearn

Load your assigned dataset into R.
For this tutorial, your task is:
You will include this reflection in your final project as part of the “limitations and assumptions” discussion.