# load libraries
library(data.table)
library(ggplot2)
# clean work space
rm(list = ls())
# init colorscheme
COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
  theme_minimal(base_size = 13) +
    theme(
      panel.grid.minor = element_blank(),
      strip.text = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))
Welcome to the final tutorial for COGS2020. There is no new statistical content today. Instead, this session is entirely dedicated to helping you finalise your project, practice your oral presentation, and make sure everything works before you hit record.
By the end of this tutorial you should:
Your final project requires you to record a video (~8–10 minutes,
maximum 12 minutes) in which you demonstrate your analysis in a
live R session. You will also submit your R code files.
The assessment covers data loading and wrangling, at least two
ggplot2 plots, at least one t-test, at least one ANOVA or
regression, and a discussion of interpretation and limitations.
Your assigned dataset is determined by the last digit of your student ID:
Use the following structure for your recording. Stick to these time targets so you don’t run over 12 minutes.
| Section | Content | Time |
|---|---|---|
| 1 | Dataset and Research Question | 1–2 minutes |
| 2 | Data Wrangling and Visualisation | 2–3 minutes |
| 3 | Statistical Analysis | 3–4 minutes |
| 4 | Interpretation and Limitations | 2–3 minutes |
Section 1: Dataset and Research Question (1–2 min)
Briefly describe the dataset: where it came from, what was measured, and
who the participants were. State your research question clearly. What
are you trying to find out? Which variables are you focusing on?
Section 2: Data Wrangling and Visualisation (2–3 min)
Show how you loaded the data and prepared it for analysis. Walk through
any cleaning steps (renaming columns, filtering, reshaping). Then
present your plots — explain what each one shows and what you conclude
from it.
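The cleaning steps named above (renaming, filtering, reshaping) could look something like the following in `data.table`. This is only a sketch: the column names, the values, and the 1500 ms cut-off are invented for illustration and will differ for your assigned dataset.

```r
library(data.table)

# Hypothetical raw data in wide format: one row per participant,
# one column per condition. All names and values are made up.
raw <- data.table(
  Subject        = 1:6,
  RT_congruent   = c(420, 455, 390, 1510, 470, 430),
  RT_incongruent = c(510, 540, 480, 1620, 560, 505)
)

# 1. Rename columns to short, consistent names (modifies raw by reference)
setnames(raw,
         old = c("Subject", "RT_congruent", "RT_incongruent"),
         new = c("id", "congruent", "incongruent"))

# 2. Filter: drop participants with implausibly slow responses
#    (the 1500 ms threshold is an arbitrary example, not a course rule)
clean <- raw[congruent < 1500 & incongruent < 1500]

# 3. Reshape from wide to long: one row per participant per condition
long <- melt(clean,
             id.vars       = "id",
             measure.vars  = c("congruent", "incongruent"),
             variable.name = "condition",
             value.name    = "response_time")

long
```

When you narrate steps like these, say why each one is needed (for example, "ggplot2 expects one row per observation, so I reshaped to long format").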
Section 3: Statistical Analysis (3–4 min)
Run your tests or models live. For each one, briefly explain why you
chose it (e.g., “I used a paired t-test because each participant was
measured twice under different conditions”). Show the output and narrate
the key numbers.
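To make the paired t-test justification concrete, here is a minimal sketch on simulated within-subject data. The variable names (`control`, `treatment`) and all numbers are hypothetical. The point is only that `paired = TRUE` is the right choice when the same participants contribute both measurements.

```r
set.seed(1)

# Hypothetical design: 20 participants, each measured under two conditions
n <- 20
baseline <- rnorm(n, mean = 450, sd = 60)
wide <- data.frame(
  id        = 1:n,
  control   = baseline,
  treatment = baseline + rnorm(n, mean = 30, sd = 40)
)

# Paired test: analyses the within-participant differences
paired_result <- t.test(wide$treatment, wide$control, paired = TRUE)
paired_result

# For comparison only: treating the same numbers as independent groups
# ignores the pairing and would be the wrong test for this design
independent_result <- t.test(wide$treatment, wide$control)
```

If your groups contain different participants, drop `paired = TRUE` and say so when you explain your choice.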
Section 4: Interpretation and Limitations (2–3 min)
Explain what your results mean in plain language — not just the p-value,
but what the finding suggests about the cognitive phenomenon under
study. Then discuss at least one assumption you checked (or should
check) for each test, and identify at least one limitation of the study
or your analysis.
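As one possible way to show an assumption check on screen, the sketch below uses simulated data like the illustration data elsewhere in this tutorial. It assumes normality within groups is the assumption you choose to discuss for a t-test, and residual diagnostics for a linear model. Other checks may suit your dataset better.

```r
library(data.table)

# Simulated data for illustration only
set.seed(42)
dt <- data.table(
  condition     = rep(c("A", "B"), each = 30),
  response_time = c(rnorm(30, mean = 450, sd = 60),
                    rnorm(30, mean = 510, sd = 80))
)

# t-test assumption: approximate normality within each group
normality <- dt[, .(shapiro_p = shapiro.test(response_time)$p.value),
                by = condition]
normality

# Linear model / ANOVA assumption: residuals roughly normal,
# with similar spread across fitted values
fit <- lm(response_time ~ condition, data = dt)
qqnorm(resid(fit))
qqline(resid(fit))
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
```

A short spoken comment is enough, such as "the Q-Q plot is close to the line, so normality of residuals looks reasonable".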
Use this as a go / no-go list before you start recording.
- At least two ggplot2 plots with clear axis labels, a title, and a legend (where applicable)

These are the mistakes that come up most often. Check each one now, before you record.
The most common issue: your script runs fine in a session you’ve been using for hours, but fails the moment you restart R — because you have objects in memory that aren’t created by your script.
Fix: Restart R and run everything from scratch before recording.
- Ctrl + Shift + F10: restarts R (clears all objects from memory)
- Ctrl + Shift + Enter: runs the entire script from top to bottom

Do this now. If anything breaks, fix it before you record.
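One way to make a script more robust to a restart is to have it create or load everything it needs near the top and fail early with a clear message. The sketch below is an assumption about how you might structure this. The file path and column names are hypothetical, and a tiny CSV is written to a temporary file only so that the example runs on its own.

```r
library(data.table)

# Hypothetical data file: in your project this would be the path to your
# assigned dataset. Here we write a small CSV so the sketch is runnable.
data_path <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:4,
                  condition = c("A", "A", "B", "B"),
                  rt = c(450, 470, 510, 530)),
       data_path)

# Fail early if the file is not where the script expects it
if (!file.exists(data_path)) {
  stop("Data file not found: ", data_path)
}

d <- fread(data_path)

# Fail early if columns used later in the script are missing
required_cols <- c("id", "condition", "rt")
missing_cols <- setdiff(required_cols, names(d))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

If a script like this runs cleanly straight after a restart, it does not depend on leftover objects from an earlier session.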
An unlabelled plot is hard to interpret and will cost marks. Compare these two versions:
# Simulated data for illustration
set.seed(42)
dt <- data.table(
  condition = rep(c("A", "B"), each = 30),
  response_time = c(rnorm(30, mean = 450, sd = 60),
                    rnorm(30, mean = 510, sd = 80))
)
# Without labels: not acceptable
ggplot(dt, aes(x = condition, y = response_time)) +
  geom_boxplot(fill = COL[["blue"]], outlier.shape = NA)
# With labels: what your plots should look like
ggplot(dt, aes(x = condition, y = response_time, fill = condition)) +
  geom_boxplot(outlier.shape = NA) +
  # COL is named ("blue", "orange", "red"); unname() so the colours are
  # assigned by position rather than matched by name to levels "A" and "B"
  scale_fill_manual(values = unname(COL[c("blue", "orange")])) +
  labs(
    title = "Response time by condition",
    x = "Condition",
    y = "Response time (ms)",
    fill = "Condition"
  ) +
  theme(legend.position = "none")
Every plot must have a title and clear axis labels. Add a legend whenever colour or fill encodes a variable that is not already shown on an axis.
Printing a result and moving on is not enough. You must explain what it means.
# Example t-test
result <- t.test(response_time ~ condition, data = dt)
result
##
## Welch Two Sample t-test
##
## data: response_time by condition
## t = -2.2397, df = 57.32, p-value = 0.029
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
## -87.371920 -4.892299
## sample estimates:
## mean in group A mean in group B
## 454.1152 500.2473
Poor interpretation (don’t do this): “As you can see, the p-value is 0.029.”
Good interpretation: “A Welch independent-samples t-test showed that mean response times were significantly longer in condition B (M = 500 ms) than in condition A (M = 454 ms), t(57.3) = -2.24, p = 0.029. This suggests that condition B introduced additional cognitive load.”
Practice saying this kind of sentence out loud — it takes a little preparation but makes your presentation much stronger.
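To avoid misreading numbers off the console while you narrate, you could pull them directly from the object returned by `t.test()`. The sketch below rebuilds simulated data like the illustration data used in this tutorial and formats the key values. The wording and layout of the `report` string are just one possible choice.

```r
# Simulated data for illustration only
set.seed(42)
dt <- data.frame(
  condition     = rep(c("A", "B"), each = 30),
  response_time = c(rnorm(30, mean = 450, sd = 60),
                    rnorm(30, mean = 510, sd = 80))
)

result <- t.test(response_time ~ condition, data = dt)

# Components of the returned "htest" object:
#   statistic = t value, parameter = degrees of freedom,
#   p.value, estimate = group means, conf.int = CI for the difference
report <- sprintf(
  "t(%.1f) = %.2f, p = %.3f; M_A = %.0f ms, M_B = %.0f ms; 95%% CI [%.1f, %.1f]",
  result$parameter, result$statistic, result$p.value,
  result$estimate[1], result$estimate[2],
  result$conf.int[1], result$conf.int[2]
)
report
```

The string gives you the numbers. You still need to add the plain-language sentence about what the difference means.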
Standard deviation (SD) and standard error (SE) come up in summary tables and plot error bars. Keep them straight:
If you are plotting variability in the data, use SD. If you are plotting uncertainty in a mean estimate, use SE. Either is acceptable for this project — just be consistent and label your error bars clearly.
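The difference is easy to see in code: SD describes the spread of individual observations, while the SE of a mean is the SD divided by the square root of the sample size. The sketch below computes both per condition on simulated data. The column names (`mean_rt`, `sd_rt`, `se_rt`) are hypothetical.

```r
library(data.table)
library(ggplot2)

# Simulated data for illustration only
set.seed(42)
dt <- data.table(
  condition     = rep(c("A", "B"), each = 30),
  response_time = c(rnorm(30, mean = 450, sd = 60),
                    rnorm(30, mean = 510, sd = 80))
)

# Per-condition summary: n, mean, and SD
summ <- dt[, .(n       = .N,
               mean_rt = mean(response_time),
               sd_rt   = sd(response_time)),
           by = condition]

# SE of the mean = SD / sqrt(n)
summ[, se_rt := sd_rt / sqrt(n)]
summ

# Error bars showing +/- 1 SE, with the choice stated in the axis label
p <- ggplot(summ, aes(x = condition, y = mean_rt)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean_rt - se_rt, ymax = mean_rt + se_rt),
                width = 0.1) +
  labs(title = "Mean response time by condition",
       x = "Condition",
       y = "Mean response time (ms), error bars = ±1 SE")
p
```

Swapping `se_rt` for `sd_rt` in `geom_errorbar()` gives SD bars instead. Whichever you use, update the label to match.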
A recording that exceeds 12 minutes will be penalised. The most common cause is spending too long on wrangling or running through too many plots.
Timing template:
| Section | Target | Your time (fill in) |
|---|---|---|
| Dataset and research question | ≤ 2 min | |
| Wrangling and visualisation | ≤ 3 min | |
| Statistical analysis | ≤ 4 min | |
| Interpretation and limitations | ≤ 3 min | |
| Total | ≤ 12 min | |
Run through this with a real timer today (see Part 3).
The assessment requires you to execute code live and show the output. Scrolling through a pre-written script without running it does not satisfy this requirement.
During your recording: place your cursor at the top of your script,
restart R, then run each chunk or section using
Ctrl + Enter (run current line / selection) or
Ctrl + Shift + Enter (run entire script). The output must
be visible on screen.
Your task for the next 20 minutes: open your analysis script, set a timer, and run through your entire presentation as if it were the real recording. Follow the structure in Part 1. Narrate out loud — don’t just run code silently.
After you finish, check your time:
Work with a partner. One person delivers the first 3 minutes of their presentation (Section 1 and the start of Section 2) while the other listens. After 3 minutes, stop. The listener gives feedback on:
Then swap roles.
Use this feedback to refine the language you use when you record. You don’t need to change your analysis — just tighten up how you describe it.
Complete this individually before you start recording. Be honest with yourself — every unchecked box is something to fix today while you still have help available.
## Final project checklist
Complete this before recording.
**Data and wrangling**
- [ ] I can load my dataset without error in a fresh R session.
- [ ] I have cleaned and organised the data into a suitable format (e.g., long vs wide).
- [ ] I can explain what each wrangling step does.
**Visualisation**
- [ ] I have at least two plots with clear axis labels, a title, and a legend (if applicable).
- [ ] My plots directly support the claims I make in the analysis.
**Statistical analysis**
- [ ] I have at least one t-test and can explain why it is appropriate.
- [ ] I have at least one ANOVA or regression and can explain why it is appropriate.
- [ ] I can interpret each result (direction of effect, p-value, confidence interval or effect size).
- [ ] I have briefly discussed assumptions for each test.
**Interpretation and limitations**
- [ ] I can explain what my results mean in plain language.
- [ ] I have identified at least one limitation of the analysis or study design.
**Recording**
- [ ] I have practiced the full presentation and it runs in under 10 minutes.
- [ ] My face is visible throughout.
- [ ] I actively execute code during the recording (not just scroll through it).
- [ ] I have submitted both the video and the R code files.
You have covered a lot of ground this semester, from loading data in R for the first time through to visualisation, t-tests, ANOVA, and regression. The final project is your opportunity to bring all of that together in a single coherent analysis.
The goal is not to produce perfect results. It is to show that you understand what you did and why. Running code live, narrating what is happening, and explaining your results honestly — even when they are messy or non-significant — is far more impressive than presenting polished pre-generated output that you can’t speak to.
Statistically significant results are not required. A null result, clearly explained and thoughtfully discussed, is just as valid as a significant one. What is assessed is your reasoning and your ability to communicate it.
The final step is to present that reasoning clearly and confidently.