Assumptions: single-sample t-test

  • Independence: The observations are independent of each other. This means that one participant’s score does not influence another’s.

  • Normality: The population from which the sample is drawn is normally distributed. This assumption becomes less critical with larger sample sizes due to the Central Limit Theorem.

  • Scale of measurement: The dependent variable is measured on a continuous (interval or ratio) scale.

Independence assumption

  • This assumption is mostly justified by study design.

  • Each observation should be unrelated to others.

  • Typical signs of potential dependence:

    • Repeated measures (unless accounted for)
    • Time-ordered data (e.g., learning trials)
    • Clustered data (e.g., students within a class)

  • Not directly testable from data in most simple designs.

Normality assumption

  • Can be inspected visually or tested statistically.

  • Visual checks:

    • Histogram: should look roughly bell-shaped

    • Boxplot: should be symmetric with no extreme outliers

    • Q-Q plot: points should fall along a straight diagonal line (I haven’t shown this one)

  • Statistical tests:

    • Shapiro-Wilk test (most common)

    • Anderson-Darling, Kolmogorov-Smirnov, etc. (see the R sketch below)
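
A minimal sketch of these checks in R, using simulated data (the sample size and parameters are arbitrary):

set.seed(1)
x <- rnorm(40, mean = 100, sd = 15)   # simulated sample, for illustration only

hist(x)                # roughly bell-shaped?
boxplot(x)             # symmetric, no extreme outliers?
qqnorm(x); qqline(x)   # points close to the diagonal line?
shapiro.test(x)        # formal test (covered below)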

Normality by visual inspection: Example 1

  • Participants viewed a high-contrast checkerboard pattern in the left visual field while maintaining central fixation.

  • Functional MRI data were collected in a block design, and the percent signal change was extracted from a predefined ROI in primary visual cortex (V1).

  • We examine whether the distribution of BOLD signal changes across participants appears normally distributed.
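
A sketch of what this inspection might look like in R; the participant count, mean, and SD below are hypothetical illustrations, not values from the study:

set.seed(2)
psc <- rnorm(24, mean = 0.8, sd = 0.3)   # hypothetical % signal change, 24 participants
hist(psc, xlab = "% signal change", main = "V1 ROI")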


Normality by visual inspection: Example 2

  • Participants completed a perceptual decision-making task in which they had to judge the direction of motion of a cloud of dots.

  • We examine whether the distribution of reaction times in this task appears normally distributed.
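
A sketch in R with hypothetical RTs; real reaction times are typically positively skewed, which a shifted exponential mimics:

set.seed(3)
rt <- 200 + rexp(100, rate = 1/500)   # hypothetical RTs in ms, positively skewed
hist(rt, xlab = "RT (ms)", main = "Reaction times")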


Weakness of visual inspection

  • Visual inspection is subjective

  • Small sample sizes are hard to assess

  • So formal tests must be better, right?

Shapiro-Wilk test

  • The Shapiro-Wilk test is a formal statistical test for normality.

Generic NHST recipe reminder

  1. Specify the null and alternative hypotheses (\(H_0\) and \(H_1\)) in terms of a population parameter \(\theta\).

  2. Specify the type I error rate – denoted by the symbol \(\alpha\) – you are willing to tolerate.

  3. Specify the sample statistic \(\widehat{\theta}\) that you will use to estimate the population parameter \(\theta\) in step 1 and state how it is distributed under the assumption that \(H_0\) is true.

  4. Obtain a random sample and use it to compute the sample statistic from step 3. Call this value \(\widehat{\theta}_{\text{obs}}\).

  5. If \(\widehat{\theta}_{\text{obs}}\) or a more extreme outcome is very unlikely to occur under the assumption that \(H_0\) is true, then reject \(H_0\). Otherwise, do not reject \(H_0\).

Shapiro-Wilk test (steps 1 and 2)

  • Specify hypotheses

\[ \begin{align*} &H_0: \text{Data are normally distributed} \\ &H_1: \text{Data are not normally distributed} \end{align*} \]

  • Choose a significance level

    • Typically \(\alpha = 0.05\)

Shapiro-Wilk test (step 3)

  • Choose a test statistic

\[ W = \frac{\left( \sum_{i=1}^{n} a_i x_{(i)} \right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]

  • The denominator reflects the total variance in your sample. Dividing by this makes \(W\) scale-invariant.

  • The weights \(a_i\) are chosen to reflect the positions you’d expect for the \(i\)th smallest, 2nd smallest, etc., values if you sampled from a normal distribution.

  • This means the numerator reflects how well the sorted values in your sample match the pattern you’d expect if the data came from a normal distribution.

  • If your sorted data follow that normal pattern, \(W \approx 1\); otherwise \(W\) drops toward 0.

  • The sampling distribution of \(W\) under the null is not standard like the normal or \(t\). It is estimated for each sample size by repeatedly simulating samples from a normal distribution and calculating \(W\).
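
A sketch of that simulation idea in R (the sample size \(n = 20\) and the number of replicates are arbitrary choices):

set.seed(4)
W_null <- replicate(5000, shapiro.test(rnorm(20))$statistic)   # W from normal samples
quantile(W_null, 0.05)   # reject H_0 if the observed W falls below this value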

Example in R:

# Simulate RT data (positively skewed)
rt_data <- rexp(100, rate = 1/500)
shapiro.test(rt_data)
## 
##  Shapiro-Wilk normality test
## 
## data:  rt_data
## W = 0.86412, p-value = 4.09e-08

  • If \(p < \alpha\), reject \(H_0\) and conclude that the data are not normally distributed.

Low power in small samples: SW worse than many NHSTs

  • All NHSTs have reduced power with small \(n\), but SW is especially vulnerable because it tests a complex composite hypothesis based on the entire shape of the distribution.

  • The null hypothesis in SW is broad and hard to falsify: “the data come from some normal distribution” is far less specific than, say, “the mean equals 0”.

  • Small samples have noisy order statistics, and since SW relies on the ordering and spacing of values, this introduces more variability and weakens the signal.
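
A quick simulation sketch of this point: exponential data are clearly non-normal, yet the SW rejection rate stays low when \(n\) is small (the sample sizes and replicate counts below are arbitrary):

set.seed(5)
mean(replicate(2000, shapiro.test(rexp(10))$p.value < 0.05))   # n = 10: rejects rarely
mean(replicate(2000, shapiro.test(rexp(50))$p.value < 0.05))   # n = 50: rejects far more often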

Oversensitivity in large samples: SW worse than many NHSTs

  • Most NHSTs become more sensitive with increasing \(n\), but SW is unusually sensitive because:

    • It uses all the data (not just a measure of central tendency).

    • It looks for any deviation from normality, e.g., skew, kurtosis, slight multi-modality, etc.

  • As \(n\) grows, the SW test therefore becomes ever more confident that your data aren’t perfectly normal, even if those deviations are unlikely to have a noticeable influence on downstream analyses like t-tests or ANOVAs.
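
A sketch of this large-\(n\) problem; the mild quadratic distortion below is just an arbitrary way to induce a small deviation from normality:

set.seed(6)
x <- rnorm(4500)      # shapiro.test() allows at most n = 5000
x <- x + 0.05 * x^2   # mild skew, unlikely to matter for a t-test
shapiro.test(x)       # p-value will typically be very small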

t-test is robust to violations of normality

  • Because of the CLT, large samples (e.g., \(n \geq 30\)) lead to sampling distributions of the sample mean, and of the corresponding \(t\) statistic, that are approximately normal, even if the population is not.

  • If the population is skewed or has outliers, the CLT convergence to normality is slower.

  • This means larger samples are required for the t-test to remain valid.
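
A sketch of how one might check this by simulation; the exponential population and the sample sizes are illustrative choices (rexp(n) has true mean 1, so \(H_0: \mu = 1\) is true):

set.seed(7)
mean(replicate(5000, t.test(rexp(15),  mu = 1)$p.value < 0.05))   # somewhat above 0.05
mean(replicate(5000, t.test(rexp(100), mu = 1)$p.value < 0.05))   # closer to the nominal 0.05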

What does it mean to be robust?

  • A test is robust when its conclusions remain approximately valid despite assumption violations, e.g., the actual type I error rate stays close to the nominal \(\alpha\).

Key takeaways

  • The t-test assumes independence, normality, and continuous measurement.

  • Independence is handled via good design.

  • Normality can be inspected visually or tested formally, but neither method is perfect.

  • The CLT makes the \(t\)-test robust to non-normality, but with heavy skew or large outliers you may need much larger samples.

Alternatives when assumptions are violated

  • Use nonparametric tests (e.g., Wilcoxon signed-rank test)

  • Use robust methods (e.g., bootstrapping)

  • Transform the data (e.g., log transformation)

  • All beyond the scope of this course!
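
For orientation only, a one-line sketch of the first option in R, reusing rt_data from the Shapiro-Wilk example (the hypothesized value of 500 is arbitrary):

wilcox.test(rt_data, mu = 500)   # signed-rank test: is the location shifted from 500?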