Tutorial 13 - Final project preparation

# load libraries

library(data.table)
library(ggplot2)

# clean work space

rm(list = ls())

# init colorscheme

COL <- c("#2271B2", "#E69F00", "#D55E00")
names(COL) <- c("blue", "orange", "red")
theme_set(
  theme_minimal(base_size = 13) +
    theme(
      panel.grid.minor = element_blank(),
      strip.text = element_text(face = "bold"),
      legend.position = "bottom"
    )
)
update_geom_defaults("point", list(size = 2))
update_geom_defaults("line", list(linewidth = 0.8))

Overview

Welcome to the final tutorial for COGS2020. There is no new statistical content today. Instead, this session is dedicated to helping you finalise your project, practise your oral presentation, and make sure everything works before you hit record.

By the end of this tutorial you should:

Know the expected structure of your video presentation.
Have run through your full presentation at least once with a timer.
Have received peer feedback on a short excerpt of your presentation.
Have completed the final project checklist.

Your final project requires you to record a video (~8–10 minutes, maximum 12 minutes) in which you demonstrate your analysis in a live R session. You will also submit your R code files. The assessment covers data loading and wrangling, at least two ggplot2 plots, at least one t-test, at least one ANOVA or regression, and a discussion of interpretation and limitations.

Your assigned dataset is determined by the last digit of your student ID:

Last digit (`last_digit %% 3`)	Dataset
`== 0`	https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_switch
`== 1`	https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_auto
`== 2`	https://github.com/crossley/cogs2020/tree/main/final_project_data/cat_learn_unlearn
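
If you want to double-check which row applies to you, the mapping in the table can be reproduced in R. This is only a minimal sketch: the student ID below is made up, and `student_id`, `datasets`, and `assigned` are illustrative names rather than anything from the course materials.

```r
# Hypothetical student ID (replace with your own)
student_id <- "45123457"

# Extract the last character of the ID and convert it to an integer
last_digit <- as.integer(substr(student_id, nchar(student_id), nchar(student_id)))

# Dataset folder names in the order of the table (remainder 0, 1, 2)
datasets <- c("cat_learn_switch", "cat_learn_auto", "cat_learn_unlearn")

# R vectors are 1-indexed, so add 1 to the remainder
assigned <- datasets[last_digit %% 3 + 1]
assigned
```

For the made-up ID above the last digit is 7, and `7 %% 3` is 1, so the result is `cat_learn_auto`.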

Part 1 — Presentation structure

Use the following structure for your recording. Stick to these time targets so you don’t run over 12 minutes.

Recommended timing

Section	Content	Time
1	Dataset and Research Question	1–2 minutes
2	Data Wrangling and Visualisation	2–3 minutes
3	Statistical Analysis	3–4 minutes
4	Interpretation and Limitations	2–3 minutes

What each section should cover

Section 1 — Dataset and Research Question (1–2 min)
Briefly describe the dataset: where it came from, what was measured, and who the participants were. State your research question clearly. What are you trying to find out? Which variables are you focusing on?

Section 2 — Data Wrangling and Visualisation (2–3 min)
Show how you loaded the data and prepared it for analysis. Walk through any cleaning steps (renaming columns, filtering, reshaping). Then present your plots — explain what each one shows and what you conclude from it.
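
The cleaning steps named above (renaming, filtering, reshaping) might look something like the following in `data.table`. This is a sketch on simulated data: every column name and the excluded participant are hypothetical, and your assigned dataset will need its own steps.

```r
library(data.table)

# Simulated wide-format data; all column names here are hypothetical
set.seed(1)
wide <- data.table(
  Subject = 1:5,
  rt_A = rnorm(5, mean = 450, sd = 60),
  rt_B = rnorm(5, mean = 510, sd = 80)
)

# Rename a column (modifies the table in place)
setnames(wide, "Subject", "subject")

# Filter rows (here: drop a hypothetical excluded participant)
wide <- wide[subject != 3]

# Reshape from wide to long: one row per subject per condition
long <- melt(
  wide,
  id.vars = "subject",
  measure.vars = c("rt_A", "rt_B"),
  variable.name = "condition",
  value.name = "response_time",
  variable.factor = FALSE
)

# Strip the "rt_" prefix so the condition labels read "A" and "B"
long[, condition := sub("^rt_", "", condition)]
long
```

In the recording, one sentence per step ("I renamed this column so that...", "I reshaped to long format because ggplot2 expects one row per observation") is usually enough.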

Section 3 — Statistical Analysis (3–4 min)
Run your tests or models live. For each one, briefly explain why you chose it (e.g., “I used a paired t-test because each participant was measured twice under different conditions”). Show the output and narrate the key numbers.
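
To make the quoted justification concrete, here is a minimal sketch of a paired t-test on simulated data. The variable names and numbers are invented for illustration; which test suits your project depends on your assigned dataset and research question.

```r
# Simulated repeated-measures data: each of n participants is measured
# once in each of two hypothetical conditions
set.seed(7)
n <- 20
cond_1 <- rnorm(n, mean = 500, sd = 70)
cond_2 <- cond_1 + rnorm(n, mean = 30, sd = 40)

# Paired t-test: the two vectors must be in the same participant order
paired_result <- t.test(cond_1, cond_2, paired = TRUE)
paired_result

# A paired t-test is equivalent to a one-sample t-test on the differences
diff_result <- t.test(cond_1 - cond_2, mu = 0)
diff_result$statistic
```

Being able to say why the test is paired (the same participants contribute both measurements) is the kind of justification this section is asking for.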

Section 4 — Interpretation and Limitations (2–3 min)
Explain what your results mean in plain language — not just the p-value, but what the finding suggests about the cognitive phenomenon under study. Then discuss at least one assumption you checked (or should check) for each test, and identify at least one limitation of the study or your analysis.
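
As one possible illustration of checking assumptions, the sketch below fits a one-way ANOVA to simulated three-group data and then looks at the residuals. The data, the object names (`dt3`, `fit`), and the particular checks are assumptions made for this example; use whichever checks were covered for the test you actually run.

```r
library(data.table)

# Simulated data: three hypothetical conditions, 20 observations each
set.seed(123)
dt3 <- data.table(
  condition = factor(rep(c("A", "B", "C"), each = 20)),
  response_time = c(rnorm(20, mean = 450, sd = 60),
                    rnorm(20, mean = 480, sd = 60),
                    rnorm(20, mean = 520, sd = 60))
)

# One-way ANOVA
fit <- aov(response_time ~ condition, data = dt3)
summary(fit)

# Assumption 1: residuals are approximately normal
shapiro.test(residuals(fit))
qqnorm(residuals(fit))
qqline(residuals(fit))

# Assumption 2: variances are similar across groups
bartlett.test(response_time ~ condition, data = dt3)
```

When you narrate this, say what each check tells you and what you would do differently if an assumption looked doubtful.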

Part 2 — Common pitfalls

These are the mistakes that come up most often. Check each one now, before you record.

1. Code won’t run in a fresh session

The most common issue: your script runs fine in a session you’ve been using for hours, but fails the moment you restart R — because you have objects in memory that aren’t created by your script.

Fix: Restart R and run everything from scratch before recording.

Ctrl + Shift + F10 — restarts R (clears all objects from memory)
Ctrl + Shift + Enter — runs the entire script from top to bottom

Do this now. If anything breaks, fix it before you record.
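
If you prefer a check that does not depend on keyboard shortcuts, one option is to run the script in a brand-new R process with `Rscript`. The sketch below uses a throwaway script written to a temporary file as a stand-in; in practice you would point `script` at your own analysis file (the file contents here are purely illustrative).

```r
# Stand-in for your analysis script (replace with the path to your own file)
script <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "print(mean(x))"), script)

# Locate the Rscript executable that belongs to the current R installation
rscript <- file.path(R.home("bin"), "Rscript")

# Run the script in a fresh process; --vanilla skips saved workspaces and profiles
out <- system2(rscript, c("--vanilla", shQuote(script)),
               stdout = TRUE, stderr = TRUE)

# With stdout = TRUE, a "status" attribute is attached only when the run fails
ran_cleanly <- is.null(attr(out, "status"))
ran_cleanly
out
```

If `ran_cleanly` is `FALSE`, the captured output in `out` should contain the error message telling you which object or package was missing.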

2. Plots with missing labels

An unlabelled plot is hard to interpret and will cost marks. Compare these two versions:

# Simulated data for illustration
set.seed(42)
dt <- data.table(
  condition = rep(c("A", "B"), each = 30),
  response_time = c(rnorm(30, mean = 450, sd = 60),
                    rnorm(30, mean = 510, sd = 80))
)

# Without labels — not acceptable
ggplot(dt, aes(x = condition, y = response_time)) +
  geom_boxplot(fill = COL[1], outlier.shape = NA)

# With labels — what your plots should look like
ggplot(dt, aes(x = condition, y = response_time, fill = condition)) +
  geom_boxplot(outlier.shape = NA) +
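  # unname() so colours are assigned by position: the names of COL ("blue", ...)
  # do not match the levels "A" and "B", which would leave the boxes unfilled
  scale_fill_manual(values = unname(COL)) +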
  labs(
    title = "Response time by condition",
    x = "Condition",
    y = "Response time (ms)",
    fill = "Condition"
  ) +
  theme(legend.position = "none")

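Every plot must have a title and clear axis labels. Add a legend whenever colour or fill encodes a variable that is not already shown on an axis. In the example above the legend is hidden because condition is already labelled on the x-axis.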

3. Showing output without interpreting it

Printing a result and moving on is not enough. You must explain what it means.

# Example t-test
result <- t.test(response_time ~ condition, data = dt)
result
## 
##  Welch Two Sample t-test
## 
## data:  response_time by condition
## t = -2.2397, df = 57.32, p-value = 0.029
## alternative hypothesis: true difference in means between group A and group B is not equal to 0
## 95 percent confidence interval:
##  -87.371920  -4.892299
## sample estimates:
## mean in group A mean in group B 
##        454.1152        500.2473

Poor interpretation (don’t do this): “As you can see, the p-value is 0.029.”

Good interpretation: “Welch’s independent-samples t-test showed that mean response times were significantly longer in condition B (M = 500 ms) than in condition A (M = 454 ms), t(57.3) = -2.24, p = 0.029. This suggests that condition B introduced additional cognitive load.”

Practise saying this kind of sentence out loud. It takes a little preparation but makes your presentation much stronger.
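
One way to avoid misreading numbers off the screen is to pull them directly from the test object. The sketch below rebuilds the simulated data from the plotting example and extracts the pieces a reporting sentence needs. The helper names (`t_val`, `df_val`, `sentence`, and so on) are illustrative only.

```r
library(data.table)

# Same simulated data as in the plotting example
set.seed(42)
dt <- data.table(
  condition = rep(c("A", "B"), each = 30),
  response_time = c(rnorm(30, mean = 450, sd = 60),
                    rnorm(30, mean = 510, sd = 80))
)

result <- t.test(response_time ~ condition, data = dt)

# Components of the "htest" object returned by t.test()
t_val  <- unname(result$statistic)  # t statistic
df_val <- unname(result$parameter)  # degrees of freedom
p_val  <- result$p.value            # p-value
means  <- result$estimate           # sample mean of each group
ci     <- result$conf.int           # 95% CI for the difference in means

# Assemble the numbers into a reporting template
sentence <- sprintf(
  "Condition A: M = %.0f ms; condition B: M = %.0f ms; t(%.1f) = %.2f, p = %.3f, 95%% CI [%.1f, %.1f].",
  means[1], means[2], df_val, t_val, p_val, ci[1], ci[2]
)
sentence
```

Note that the reported means come from `result$estimate` (the sample means), not from the values passed to `rnorm()` when the data were simulated.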

4. Confusing standard deviation and standard error

These come up in summary tables and plot error bars. Keep them straight:

Standard deviation (SD): describes the spread of individual observations around the mean.
Standard error (SE): describes how precisely the mean is estimated; SE = SD / √n.

If you are plotting variability in the data, use SD. If you are plotting uncertainty in a mean estimate, use SE. Either is acceptable for this project — just be consistent and label your error bars clearly.
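
The difference between the two is easy to see in code. The following sketch computes both SD and SE per condition on simulated data and plots the means with SE error bars; the column names in `summ` are made up for this example.

```r
library(data.table)
library(ggplot2)

# Simulated data for illustration
set.seed(42)
dt <- data.table(
  condition = rep(c("A", "B"), each = 30),
  response_time = c(rnorm(30, mean = 450, sd = 60),
                    rnorm(30, mean = 510, sd = 80))
)

# Per-condition mean, SD, and sample size
summ <- dt[, .(mean_rt = mean(response_time),
               sd_rt   = sd(response_time),
               n       = .N),
           by = condition]

# SE = SD / sqrt(n)
summ[, se_rt := sd_rt / sqrt(n)]
summ

# Means with error bars of plus or minus one SE, labelled so the viewer knows
p <- ggplot(summ, aes(x = condition, y = mean_rt)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean_rt - se_rt, ymax = mean_rt + se_rt),
                width = 0.1) +
  labs(
    title = "Mean response time by condition",
    x = "Condition",
    y = "Mean response time (ms)",
    caption = "Error bars show ±1 SE"
  )
p
```

Swapping `se_rt` for `sd_rt` in `geom_errorbar()` would show the spread of individual observations instead; whichever you choose, state it in a caption or in your narration.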

5. Running over time

A recording that exceeds 12 minutes will be penalised. The most common cause is spending too long on wrangling or running through too many plots.

Timing template:

Section	Target	Your time (fill in)
Dataset and research question	≤ 2 min
Wrangling and visualisation	≤ 3 min
Statistical analysis	≤ 4 min
Interpretation and limitations	≤ 3 min
Total	≤ 12 min

Run through this with a real timer today (see Part 3).
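
Once you have filled in the template, a few lines of R can total it up and flag sections that ran long. The rehearsal times below are invented; only the targets come from the table above.

```r
library(data.table)

# Targets from the timing template; "actual" values are hypothetical
times <- data.table(
  section = c("Dataset and research question",
              "Wrangling and visualisation",
              "Statistical analysis",
              "Interpretation and limitations"),
  target = c(2, 3, 4, 3),       # minutes
  actual = c(1.5, 3.5, 4, 2.5)  # minutes
)

# Flag sections that ran over their target
times[, over_target := actual > target]
times

# Total running time and whether it fits the 12-minute maximum
total <- times[, sum(actual)]
total
total <= 12
```

With these made-up numbers the total is 11.5 minutes, which fits the limit, but the wrangling section is over its target and would be the first place to tighten.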

6. Scrolling through code instead of running it

The assessment requires you to execute code live and show the output. Scrolling through a pre-written script without running it does not satisfy this requirement.

During your recording: place your cursor at the top of your script, restart R, then run each chunk or section using Ctrl + Enter (run current line / selection) or Ctrl + Shift + Enter (run entire script). The output must be visible on screen.

Part 3 — Timing walkthrough (in-class activity)

Your task for the next 20 minutes: open your analysis script, set a timer, and run through your entire presentation as if it were the real recording. Follow the structure in Part 1. Narrate out loud — don’t just run code silently.

After you finish, check your time:

Under 8 minutes: You may be skipping important detail. Add more verbal explanation of what your code does and what results mean.
8–10 minutes: You are on target.
10–12 minutes: You are close to the limit. Identify one section you can tighten.
Over 12 minutes: You need to cut material. Common options:
- Reduce the number of plots you walk through in detail (show them but narrate one key observation each).
- Shorten the data wrangling walk-through — you don’t need to explain every line.
- If you have optional or exploratory analyses, consider cutting them.

Part 4 — Peer review (in-class activity)

Work with a partner. One person delivers the first 3 minutes of their presentation (Section 1 and the start of Section 2) while the other listens. After 3 minutes, stop. The listener gives feedback on:

Was the research question stated clearly? Could you describe it back in one sentence?
Was it clear what each piece of code was doing, or did it feel like unexplained steps?
Was any statistical output interpreted, or was it just shown on screen?

Then swap roles.

Use this feedback to refine the language you use when you record. You don’t need to change your analysis — just tighten up how you describe it.

Part 5 — Final checklist

Complete this individually before you start recording. Be honest with yourself — every unchecked box is something to fix today while you still have help available.

## Final project checklist

Complete this before recording.

**Data and wrangling**
- [ ] I can load my dataset without error in a fresh R session.
- [ ] I have cleaned and organised the data into a suitable format (e.g., long vs wide).
- [ ] I can explain what each wrangling step does.

**Visualisation**
- [ ] I have at least two plots with clear axis labels, a title, and a legend (if applicable).
- [ ] My plots directly support the claims I make in the analysis.

**Statistical analysis**
- [ ] I have at least one t-test and can explain why it is appropriate.
- [ ] I have at least one ANOVA or regression and can explain why it is appropriate.
- [ ] I can interpret each result (direction of effect, p-value, confidence interval or effect size).
- [ ] I have briefly discussed assumptions for each test.

**Interpretation and limitations**
- [ ] I can explain what my results mean in plain language.
- [ ] I have identified at least one limitation of the analysis or study design.

**Recording**
- [ ] I have practised the full presentation and it runs within the 12-minute limit (target: 8–10 minutes).
- [ ] My face is visible throughout.
- [ ] I actively execute code during the recording (not just scroll through it).
- [ ] I have submitted both the video and the R code files.

A final note

You have covered a lot of ground this semester, from loading data in R for the first time through to visualisation, t-tests, ANOVA, and regression. The final project is your opportunity to bring all of that together in a single coherent analysis.

The goal is not to produce perfect results. It is to show that you understand what you did and why. Running code live, narrating what is happening, and explaining your results honestly — even when they are messy or non-significant — is far more impressive than presenting polished pre-generated output that you can’t speak to.

Statistically significant results are not required. A null result, clearly explained and thoughtfully discussed, is just as valid as a significant one. What is assessed is your reasoning and your ability to communicate it.

The final step is to present that reasoning clearly and confidently.