Setting up

library(ez)
library(tidyverse)
library(data.table)
rm(list=ls())

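This tutorial introduces five core statistical analyses commonly used in psychology and behavioral research. Each scenario gives you a short research description, code that simulates a dataset, and a set of questions to answer.

You’ll use the ez, tidyverse, and data.table packages loaded above.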

Q1

A cognitive psychologist wants to test whether reaction time (RT) differs across three levels of task difficulty (Easy, Medium, Hard). Each participant completes all three tasks.

set.seed(1)
rm_data <- data.table(
  id = rep(1:30, each = 3),
  difficulty = rep(c("Easy", "Medium", "Hard"), times = 30),
  rt = c(rnorm(30, mean = 500, sd = 40),
         rnorm(30, mean = 550, sd = 40),
         rnorm(30, mean = 600, sd = 40))
)

head(rm_data)
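Note that the simulation above draws rt in three consecutive blocks of 30 while difficulty cycles Easy, Medium, Hard on every row, so the 500/550/600 means are not tied to the condition labels. If the intent is for mean RT to rise with difficulty, one possible sketch looks the mean up from each row's label. The names target_means and rm_data_aligned are hypothetical, and this alternative is not the dataset analysed in the answers.

```r
library(data.table)

set.seed(1)

# Hypothetical lookup of the intended condition means (in ms)
target_means <- c(Easy = 500, Medium = 550, Hard = 600)

rm_data_aligned <- data.table(
  id = rep(1:30, each = 3),
  difficulty = rep(names(target_means), times = 30)
)

# Index the lookup by each row's label so the mean follows the condition
rm_data_aligned[, rt := rnorm(.N, mean = target_means[difficulty], sd = 40)]

# Condition means should now sit near 500, 550 and 600
rm_data_aligned[, .(mean_rt = mean(rt)), by = difficulty]
```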
a). What are the independent variable(s), and how many levels does each have?

Task difficulty, with three levels (Easy, Medium, Hard)

b). What is the dependent variable?

Reaction time (RT)

c). Calculate the mean for the dependent variable within each level of the independent variable(s).
rm_data[, difficulty := factor(difficulty)]
rm_data[, .(mean_rt = mean(rt)), by = difficulty]
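d). What type of statistical test would you run? Why?

One-way repeated-measures ANOVA, because there is a single within-subject factor with three levels and every participant completes all three.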

e). Perform the statistical test and write a sentence explaining the results.
ezANOVA(data = rm_data, dv = rt, wid = id, within = difficulty, type = 3)

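# There is no significant effect of task difficulty on reaction time, F(2, 58) = 0.69, p = .505. RTs did not reliably differ across the Easy, Medium, and Hard conditions.
# This is expected given how the data were simulated: rt is drawn in three blocks of 30 rows while difficulty cycles on every row, so each condition mixes values from all three distributions.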
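As a cross-check on the ezANOVA call, the same one-way repeated-measures design can be fitted with base R's aov() and an Error() term. This is a minimal sketch that rebuilds the data as a plain data frame; rm_df and rm_aov are hypothetical names. For a balanced design like this one, the difficulty row in the id:difficulty stratum should agree with the F test that ezANOVA reports.

```r
set.seed(1)

# Rebuild the Q1 data as a data frame, with id and difficulty as factors
rm_df <- data.frame(
  id = factor(rep(1:30, each = 3)),
  difficulty = factor(rep(c("Easy", "Medium", "Hard"), times = 30)),
  rt = c(rnorm(30, mean = 500, sd = 40),
         rnorm(30, mean = 550, sd = 40),
         rnorm(30, mean = 600, sd = 40))
)

# Error(id / difficulty) tells aov() that difficulty varies within participants
rm_aov <- aov(rt ~ difficulty + Error(id / difficulty), data = rm_df)
summary(rm_aov)
```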

Q2

A sleep study measures accuracy under two lighting conditions (Bright vs Dim) and two times of day (Morning vs Evening). Each participant completes all four combinations.

tw_data <- CJ(id = 1:40, light = c("Bright", "Dim"), time = c("Morning", "Evening"))
tw_data[, accuracy := rnorm(.N, 
                            mean = fifelse(light == "Bright" & time == "Morning", 90,
                                    fifelse(light == "Bright", 85,
                                    fifelse(time == "Morning", 80, 75))),
                            sd = 5)]

head(tw_data)
a). What are the independent variable(s), and how many levels does each have?

Lighting, with two levels (Bright, Dim), and time of day, with two levels (Morning, Evening)

b). What is the dependent variable?

Accuracy

c). Calculate the mean for the dependent variable within each level of the independent variable(s).
tw_data[, `:=`(light = factor(light), time = factor(time))]
tw_data[, .(mean_accuracy = mean(accuracy)), by = .(light, time)]
d). What type of statistical test would you run? Why?

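Two-way repeated-measures ANOVA, because there are two within-subject factors (lighting and time of day) and every participant completes all four combinations.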

e). Perform the statistical test and write a sentence explaining the results.
ezANOVA(data = tw_data, dv = accuracy, wid = id, within = .(light, time), type = 3)

# There are strong and statistically significant main effects of both lighting and time of day on task accuracy.
# Lighting: Accuracy was higher under Bright light (M ≈ 87.3) than under Dim light (M ≈ 77.5).
# Time of Day: Accuracy was higher in the Morning (M ≈ 85.0) than in the Evening (M ≈ 79.7).
# The interaction between lighting and time of day was non-significant.
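The means quoted for each main effect are marginal means, that is, averages over the levels of the other factor, whereas grouping by both factors gives the four cell means. A minimal sketch of computing them follows. It regenerates the data with a hypothetical seed (set.seed(2) is an assumption, since the original chunk sets none), so the exact values will differ slightly from those quoted; tw_demo, light_means and time_means are illustrative names.

```r
library(data.table)

set.seed(2)  # hypothetical seed, not part of the original chunk

tw_demo <- CJ(id = 1:40, light = c("Bright", "Dim"), time = c("Morning", "Evening"))
tw_demo[, accuracy := rnorm(.N,
                            mean = fifelse(light == "Bright" & time == "Morning", 90,
                                    fifelse(light == "Bright", 85,
                                    fifelse(time == "Morning", 80, 75))),
                            sd = 5)]

# Marginal means: average over the other factor
light_means <- tw_demo[, .(mean_accuracy = mean(accuracy)), by = light]
time_means  <- tw_demo[, .(mean_accuracy = mean(accuracy)), by = time]

light_means
time_means
```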

Q3

Group A (Training) and Group B (Control) complete a memory test before and after a training intervention.

set.seed(3)
mx_data <- CJ(id = 1:60, time = c("Pre", "Post"))
mx_data[, group := rep(rep(c("Training", "Control"), each = 30), each = 2)]
mx_data[, score := ifelse(group == "Training" & time == "Post",
                          rnorm(.N, mean = 85, sd = 5),
                          rnorm(.N, mean = 75, sd = 5))]
a). What are the independent variable(s), and how many levels does each have?

Group, with two levels (Training, Control), and time, with two levels (Pre, Post)

b). What is the dependent variable?

Memory test score

c). Calculate the mean for the dependent variable within each level of the independent variable(s).
mx_data[, `:=`(group = factor(group), time = factor(time))]
mx_data[, .(mean_score = mean(score)), by = .(group, time)]
d). What type of statistical test would you run? Why?

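Mixed ANOVA, because group is a between-subjects factor (each participant belongs to one group) and time is a within-subjects factor (each participant is tested before and after).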

e). Perform the statistical test and write a sentence explaining the results.
ezANOVA(data = mx_data, dv = score, wid = id, within = time, between = group)

# There is a statistically significant interaction between group and time, F(1, 58) = 26.70, p < .001, indicating that the change in performance over time depends on group membership.
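One way to see what the interaction means is to reduce each participant to a change score (Post minus Pre) and compare the two groups. In a 2 x 2 mixed design with equal group sizes, the squared t statistic from an equal-variance independent-samples t-test on the change scores equals the interaction F, with the same error degrees of freedom. The sketch below assumes the same simulation code as above; mx_demo, wide and change_test are hypothetical names.

```r
library(data.table)

set.seed(3)
mx_demo <- CJ(id = 1:60, time = c("Pre", "Post"))
mx_demo[, group := rep(rep(c("Training", "Control"), each = 30), each = 2)]
mx_demo[, score := ifelse(group == "Training" & time == "Post",
                          rnorm(.N, mean = 85, sd = 5),
                          rnorm(.N, mean = 75, sd = 5))]

# One row per participant, with Pre and Post as separate columns
wide <- dcast(mx_demo, id + group ~ time, value.var = "score")
wide[, group := factor(group)]
wide[, change := Post - Pre]

# Mean change per group
wide[, .(mean_change = mean(change)), by = group]

# Equal-variance t-test on change scores; t^2 corresponds to the interaction F
change_test <- t.test(change ~ group, data = wide, var.equal = TRUE)
change_test$statistic^2
```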

Q4

A researcher explores the relationship between screen time and sleep quality.

set.seed(4)
cor_data <- data.table(
  screen_time = runif(50, 1, 10)
)
cor_data[, sleep_quality := 100 - screen_time * 6 + rnorm(.N, 0, 5)]
a). Visualise the relationship between sleep quality and screen time.
ggplot(cor_data, aes(x = screen_time, y = sleep_quality)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_minimal()
b). Perform the statistical test and write a sentence explaining the results.
cor.test(cor_data$screen_time, cor_data$sleep_quality)

# There is a very strong, statistically significant negative correlation between screen time and sleep quality, r(48) = -0.97, p < .001. This suggests that individuals who spend more time on screens tend to report much lower sleep quality.
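To make the reported statistic less of a black box, Pearson's r can be computed by hand as the covariance divided by the product of the two standard deviations, and the degrees of freedom are n - 2. The sketch below assumes the same simulation as above but stores the variables as plain vectors; r_manual, ct and df_r are hypothetical names.

```r
set.seed(4)
screen_time <- runif(50, 1, 10)
sleep_quality <- 100 - screen_time * 6 + rnorm(50, 0, 5)

# Pearson's r from its definition
r_manual <- cov(screen_time, sleep_quality) /
  (sd(screen_time) * sd(sleep_quality))

# Same quantity as estimated by cor.test()
ct <- cor.test(screen_time, sleep_quality)
all.equal(r_manual, unname(ct$estimate))

# Degrees of freedom reported in r(df)
df_r <- length(screen_time) - 2
df_r
```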

Q5

You want to predict exam performance from hours of study.

set.seed(5)
reg_data <- data.table(
  study_hours = runif(60, 0, 20)
)
reg_data[, exam_score := 50 + 2.5 * study_hours + rnorm(.N, 0, 5)]
a). Visualise the relationship between exam performance and hours of study.
ggplot(reg_data, aes(x = study_hours, y = exam_score)) +
  geom_point() +
  geom_smooth(method = "lm") +
  theme_minimal()
b). Perform the statistical test and write a sentence explaining the results.
model <- lm(exam_score ~ study_hours, data = reg_data)
summary(model)

# The intercept (50.18) indicates the predicted exam score when study_hours = 0.
# The slope (2.49) indicates that for each additional hour of study, exam score increases by approximately 2.5 points, on average.

# There is a strong, statistically significant linear relationship between study hours and exam scores. The model shows that each additional hour of studying is associated with a ~2.5 point increase in exam score, t(58) = 23.02, p < .001.
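Once the model is fitted, the intercept and slope can be used to predict scores for new values of study_hours. With the coefficients reported above, for example, 10 hours of study gives roughly 50.18 + 2.49 x 10 = 75.1 points. A minimal sketch using predict() follows; it assumes the same simulation as above, and reg_demo, fit, new_hours and the chosen hour values are hypothetical.

```r
set.seed(5)
reg_demo <- data.frame(study_hours = runif(60, 0, 20))
reg_demo$exam_score <- 50 + 2.5 * reg_demo$study_hours + rnorm(60, 0, 5)

fit <- lm(exam_score ~ study_hours, data = reg_demo)

# Hypothetical study times to predict for
new_hours <- data.frame(study_hours = c(5, 10, 15))

# Point predictions with 95% confidence intervals for the mean response
pred <- predict(fit, newdata = new_hours, interval = "confidence")
cbind(new_hours, pred)
```

Using interval = "prediction" instead would give wider intervals that describe where an individual student's score is likely to fall.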