Setting up
library(ez)
library(tidyverse)
library(data.table)
rm(list=ls())
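This tutorial introduces five core statistical analyses commonly used in psychology and behavioral research. Each scenario gives you a short research description, R code that simulates a dataset, and the analysis with an interpretation of its output; the three ANOVA scenarios also name the independent and dependent variables.
You'll use ezANOVA() from ez for the ANOVAs, data.table for data handling, ggplot2 (loaded with tidyverse) for plots, and base R's cor.test() and lm() for the correlation and regression.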
Bonus activities:
A cognitive psychologist wants to test whether reaction time (RT) differs across three levels of task difficulty (Easy, Medium, Hard). Each participant completes all three tasks.
set.seed(1)
rm_data <- data.table(
  id = rep(1:30, each = 3),
  difficulty = rep(c("Easy", "Medium", "Hard"), times = 30),
  rt = c(rnorm(30, mean = 500, sd = 40),
         rnorm(30, mean = 550, sd = 40),
         rnorm(30, mean = 600, sd = 40))
)
head(rm_data)
Independent variable: task difficulty (Easy, Medium, Hard), within-subjects
Dependent variable: reaction time (RT)
rm_data[, difficulty := factor(difficulty)]
rm_data[, .(mean_rt = mean(rt)), by = difficulty]
Test: one-way repeated-measures ANOVA
ezANOVA(data = rm_data, dv = rt, wid = id, within = difficulty, type = 3)
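# There is no significant effect of task difficulty on reaction time, F(2, 58) = 0.69, p = .505: RTs did not reliably differ across the Easy, Medium, and Hard conditions.
# Note that this null result follows from how the data were simulated. rt is generated in three blocks of 30 (means 500, 550, 600), but difficulty cycles Easy, Medium, Hard row by row, so every condition receives 10 values from each block and all three condition means sit near 550.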
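A minimal sketch of a version of the simulation in which each rt value is paired with the difficulty level it was drawn for, assuming the intended design is 30 participants who each contribute one Easy, one Medium, and one Hard observation with means of 500, 550, and 600. The name rm_aligned is hypothetical, and no ANOVA output is quoted for it because it will differ from the result reported above.

```r
library(data.table)

set.seed(1)
# rt is drawn in condition blocks (30 Easy, 30 Medium, 30 Hard), so the
# labels use each = 30 and the participant ids cycle 1..30 within each block.
rm_aligned <- data.table(
  id = rep(1:30, times = 3),
  difficulty = factor(rep(c("Easy", "Medium", "Hard"), each = 30),
                      levels = c("Easy", "Medium", "Hard")),
  rt = c(rnorm(30, mean = 500, sd = 40),
         rnorm(30, mean = 550, sd = 40),
         rnorm(30, mean = 600, sd = 40))
)

# Condition means should now track the simulated means
rm_aligned[, .(mean_rt = mean(rt)), by = difficulty]

# Every participant should appear once in each condition
rm_aligned[, .(n_conditions = uniqueN(difficulty)), by = id][, all(n_conditions == 3)]
```

The same ezANOVA() call used in the tutorial can then be run with data = rm_aligned.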
A sleep study measures accuracy under two lighting conditions (Bright vs Dim) and two times of day (Morning vs Evening). Each participant completes all four combinations.
tw_data <- CJ(id = 1:40, light = c("Bright", "Dim"), time = c("Morning", "Evening"))
tw_data[, accuracy := rnorm(.N,
  mean = fifelse(light == "Bright" & time == "Morning", 90,
         fifelse(light == "Bright", 85,
         fifelse(time == "Morning", 80, 75))),
  sd = 5)]
head(tw_data)
Independent variables: lighting (Bright, Dim) and time of day (Morning, Evening), both within-subjects
Dependent variable: accuracy
tw_data[, `:=`(light = factor(light), time = factor(time))]
tw_data[, .(mean_accuracy = mean(accuracy)), by = .(light, time)]
Test: two-way repeated-measures ANOVA
ezANOVA(data = tw_data, dv = accuracy, wid = id, within = .(light, time), type = 3)
# There are strong and statistically significant main effects of both lighting and time of day on task accuracy.
# Lighting: Accuracy was higher under Bright light (M ≈ 87.3) than under Dim light (M ≈ 77.5).
# Time of Day: Accuracy was higher in the Morning (M ≈ 85.0) than in the Evening (M ≈ 79.7).
# The interaction between lighting and time of day was non-significant.
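The main-effect means quoted above are marginal means, that is, averages over the levels of the other factor, whereas the summary table in the scenario shows the four cell means. A minimal sketch of how the marginal means could be computed, assuming the same simulation as above. The seed value 2 is an assumption (the scenario sets no seed of its own), so the printed means will not exactly match the quoted values.

```r
library(data.table)

set.seed(2)  # assumed seed; the original scenario does not set one
tw_data <- CJ(id = 1:40, light = c("Bright", "Dim"), time = c("Morning", "Evening"))
tw_data[, accuracy := rnorm(.N,
  mean = fifelse(light == "Bright" & time == "Morning", 90,
         fifelse(light == "Bright", 85,
         fifelse(time == "Morning", 80, 75))),
  sd = 5)]

# Marginal means: group by one factor only, averaging over the other
light_means <- tw_data[, .(mean_accuracy = mean(accuracy)), by = light]
time_means  <- tw_data[, .(mean_accuracy = mean(accuracy)), by = time]

light_means
time_means
```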
Group A (Training) and Group B (Control) complete a memory test before and after a training intervention.
set.seed(3)
mx_data <- CJ(id = 1:60, time = c("Pre", "Post"))
mx_data[, group := rep(rep(c("Training", "Control"), each = 30), each = 2)]
mx_data[, score := ifelse(group == "Training" & time == "Post",
                          rnorm(.N, mean = 85, sd = 5),
                          rnorm(.N, mean = 75, sd = 5))]
Independent variables: group (Training, Control), between-subjects; time (Pre, Post), within-subjects
Dependent variable: memory test score
mx_data[, `:=`(group = factor(group), time = factor(time))]
mx_data[, .(mean_score = mean(score)), by = .(group, time)]
Test: mixed ANOVA (one between-subjects factor, one within-subjects factor)
ezANOVA(data = mx_data, dv = score, wid = id, within = time, between = group)
# There is a statistically significant interaction between group and time, F(1, 58) = 26.70, p < .001, indicating that the change in performance over time depends on group membership.
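One way to see what the interaction means is to compute each participant's Pre-to-Post gain and compare the gains between groups. The sketch below assumes the same simulation as above; the names wide, gain, and gain_test are hypothetical. In a 2 x 2 mixed design, the squared t statistic from an equal-variance independent-samples t-test on the gain scores equals the group-by-time interaction F, so it can serve as a cross-check on the ANOVA output.

```r
library(data.table)

set.seed(3)
mx_data <- CJ(id = 1:60, time = c("Pre", "Post"))
mx_data[, group := rep(rep(c("Training", "Control"), each = 30), each = 2)]
mx_data[, score := ifelse(group == "Training" & time == "Post",
                          rnorm(.N, mean = 85, sd = 5),
                          rnorm(.N, mean = 75, sd = 5))]

# Reshape to one row per participant, with Pre and Post as columns
wide <- dcast(mx_data, id + group ~ time, value.var = "score")
wide[, gain := Post - Pre]

# Mean change per group
wide[, .(mean_gain = mean(gain)), by = group]

# Independent-samples t-test on the gains (equal variances assumed)
gain_test <- t.test(gain ~ group, data = wide, var.equal = TRUE)
gain_test$statistic^2  # compare with the interaction F from ezANOVA
```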
A researcher explores the relationship between screen time and sleep quality.
set.seed(4)
cor_data <- data.table(
  screen_time = runif(50, 1, 10)
)
cor_data[, sleep_quality := 100 - screen_time * 6 + rnorm(.N, 0, 5)]
ggplot(cor_data, aes(x = screen_time, y = sleep_quality)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_minimal()
cor.test(cor_data$screen_time, cor_data$sleep_quality)
# There is a very strong, statistically significant negative correlation between screen time and sleep quality, r(48) = –0.97, p < .001. This suggests that individuals who spend more time on screens tend to report much lower sleep quality.
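The values in a report such as r(48) can be pulled directly from the object that cor.test() returns rather than copied from the printed output. A minimal sketch, assuming the same simulation as above; the variable names res, r, df, p, and ci are hypothetical.

```r
library(data.table)

set.seed(4)
cor_data <- data.table(screen_time = runif(50, 1, 10))
cor_data[, sleep_quality := 100 - screen_time * 6 + rnorm(.N, 0, 5)]

res <- cor.test(cor_data$screen_time, cor_data$sleep_quality)

r  <- unname(res$estimate)   # Pearson's r
df <- unname(res$parameter)  # degrees of freedom, n - 2
p  <- res$p.value            # two-sided p-value
ci <- res$conf.int           # 95% confidence interval for r

# Assemble the r(df) part of the report; p is reported separately
sprintf("r(%.0f) = %.2f", df, r)

r^2  # proportion of variance shared by the two variables
```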
You want to predict exam performance from hours of study.
set.seed(5)
reg_data <- data.table(
  study_hours = runif(60, 0, 20)
)
reg_data[, exam_score := 50 + 2.5 * study_hours + rnorm(.N, 0, 5)]
ggplot(reg_data, aes(x = study_hours, y = exam_score)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_minimal()
model <- lm(exam_score ~ study_hours, data = reg_data)
summary(model)
# The intercept (50.18) indicates the predicted exam score when study_hours = 0.
# The slope (2.49) indicates that each additional hour of study is associated with an average increase of about 2.5 points in the predicted exam score.
# This linear relationship between study hours and exam scores is strong and statistically significant, t(58) = 23.02, p < .001.
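Since the stated goal is prediction, the fitted model can also be used to quantify uncertainty in the coefficients and to generate predicted scores. A minimal sketch, assuming the same simulation as above; the study-hour values in new_students are hypothetical and chosen only for illustration.

```r
library(data.table)

set.seed(5)
reg_data <- data.table(study_hours = runif(60, 0, 20))
reg_data[, exam_score := 50 + 2.5 * study_hours + rnorm(.N, 0, 5)]

model <- lm(exam_score ~ study_hours, data = reg_data)

# 95% confidence intervals for the intercept and slope
ci <- confint(model)
ci

# Predicted exam scores, with 95% confidence intervals for the mean,
# at a few hypothetical amounts of study time
new_students <- data.frame(study_hours = c(5, 10, 15))
pred <- predict(model, newdata = new_students, interval = "confidence")
pred
```

Using interval = "prediction" instead would give the wider interval for an individual student's score rather than for the average score at that number of hours.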