The one-sample t-Test

2024

We don’t usually know \(\sigma_X\)

The Normal test requires that we know \(\sigma_X\).
In practice, we must almost always estimate \(\sigma_X\) from the data.
The sample standard deviation \(s_X\) is a good estimator of \(\sigma_X\).

\[ \begin{align} s^2_X &= \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \overline{X})^2 \\\\ s_X &= \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \overline{X})^2} \\\\ \widehat{\sigma_X} &= s_X \end{align} \]
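As a quick check, the formula above can be compared against R's built-in `var` and `sd`. This is a minimal sketch; the sample `x` is made up for illustration and is not lecture data.

```r
# Illustrative sample (made-up values, not from the lecture data)
x <- c(9.1, 11.4, 10.2, 12.8, 8.7)
n <- length(x)

# Sample variance and standard deviation from the formula, dividing by n - 1
s2_manual <- sum((x - mean(x))^2) / (n - 1)
s_manual <- sqrt(s2_manual)

# R's var() and sd() use the same n - 1 denominator
all.equal(s2_manual, var(x))
all.equal(s_manual, sd(x))
```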

Do not plug \(\widehat{\sigma_X}\) into a Normal test

\(X \sim \mathcal{N}(\mu_X, \sigma_X)\): Good
\(\overline{X} \sim \mathcal{N}(\mu_X, \frac{\sigma_X}{\sqrt{n}})\): Good
\(\overline{X} \sim \mathcal{N}(\mu_X, \frac{\widehat{\sigma_X}}{\sqrt{n}})\): BAD

\(\widehat{\sigma_X}\) turns a Normal into a t

\[ \begin{align} X &\sim \mathcal{N}(\mu_X, \sigma_X) \\ Z &= \frac{X - \mu_X}{\sigma_X} \\ Z &\sim \mathcal{N}(0, 1) \\\\ X &\sim \mathcal{N}(\mu_X, \sigma_X) \\ t &= \frac{X - \mu_X}{\widehat{\sigma_X}} \\ t &\sim t(df) \\ df &= n - 1 \end{align} \]
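One way to see why the estimated standard deviation calls for a t rather than a Normal is to simulate many samples with \(H_0\) true and count false rejections. The sketch below is illustrative only: the sample size, seed, and population parameters are arbitrary choices, and it standardises the sample mean (as in a one-sample test) rather than a single observation.

```r
set.seed(1)          # arbitrary seed, for reproducibility only
n <- 5               # small sample, where the difference is most visible
n_sims <- 20000
alpha <- 0.05

# Simulate under H0: X ~ N(10, 2), so every rejection is a false positive
stat <- replicate(n_sims, {
  x <- rnorm(n, mean = 10, sd = 2)
  (mean(x) - 10) / (sd(x) / sqrt(n))   # standardise with the estimated sd
})

# One-sided test: reject when the statistic exceeds the critical value
rate_normal <- mean(stat > qnorm(alpha, lower.tail = FALSE))
rate_t <- mean(stat > qt(alpha, df = n - 1, lower.tail = FALSE))

c(normal = rate_normal, t = rate_t)
```

With the Normal critical value the false-positive rate should come out above \(\alpha\), while the t critical value should keep it close to \(\alpha\).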

Quick note on nomenclature

We mostly use capital letters for random variables and lower case letters for observed values sampled from a random variable.
E.g. \(X\) is a random variable, \(x\) is an observed value sampled from \(X\).
However, lower case \(t\) is used for both the random variable and the observed value.
I will try to use \(t_{obs}\) to denote the observed value of the test statistic but you should always try to make sense of things from context.

Estimating variance adds uncertainty

The t has heavier tails than the Normal.
Increasing \(df\) makes the t look more like the Normal.
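The two statements above can be checked numerically by comparing densities far out in the tail. This is a small sketch; the evaluation point 3 and the list of \(df\) values are arbitrary choices.

```r
# Density at a point far in the tail, for several df and for the standard Normal
df <- c(2, 5, 10, 30, 100)
dens_t <- dt(3, df = df)
dens_normal <- dnorm(3)

# The t density in the tail is larger than the Normal density,
# and shrinks towards it as df grows
data.frame(df = df, dens_t = dens_t, dens_normal = dens_normal)
```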

The one-sample t-Test: visualised

Normal vs t: P-Values

Heavier tails \(\rightarrow\) larger p-values \(\rightarrow\) less evidence against \(H_0\).
Use the pt function in R to compute p-values.
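A minimal sketch of `pt` next to `pnorm`, assuming a hypothetical observed statistic of 2.1 with \(df = 9\) (neither value comes from the lecture data):

```r
t_obs <- 2.1   # hypothetical observed test statistic
df <- 9        # hypothetical degrees of freedom (n = 10)

# Upper-tail p-value for H1: mu > mu_0
p_t <- pt(t_obs, df = df, lower.tail = FALSE)
p_normal <- pnorm(t_obs, lower.tail = FALSE)

# Two-sided p-value: double the tail area beyond |t_obs|
p_two_sided <- 2 * pt(abs(t_obs), df = df, lower.tail = FALSE)

c(p_t = p_t, p_normal = p_normal, p_two_sided = p_two_sided)
```

For the same statistic, the t-based p-value is larger than the Normal-based one.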

Normal vs t: Critical Values

Heavier tails \(\rightarrow\) larger critical values \(\rightarrow\) a more extreme \(t_{obs}\) is needed to reject \(H_0\).
Use the qt function in R to compute critical values.
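A short sketch of `qt` next to `qnorm` for a one-sided test at \(\alpha = 0.05\). The \(df\) values are arbitrary illustrative choices.

```r
alpha <- 0.05
df <- c(4, 9, 29, 99)

# Upper-tail critical values: the point that cuts off the top 5%
crit_t <- qt(alpha, df = df, lower.tail = FALSE)
crit_normal <- qnorm(alpha, lower.tail = FALSE)

# The t critical value exceeds the Normal one and approaches it as df grows
data.frame(df = df, crit_t = crit_t, crit_normal = crit_normal)
```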

t-test full example

Suppose we are interested in testing whether a neuron increases its firing rate in response to a peripheral stimulus.
Let \(X \sim \mathcal{N}(\mu_X, \sigma_X)\) be the random variable that generates the firing rate of the neuron.

\[ \begin{align} H_0: &\ \mu_X = 10 \\ H_1: &\ \mu_X > 10 \\[2ex] \alpha &= 0.05 \\[2ex] t_{obs} &= \frac{\overline{x}_{obs} - \mu_{\overline{X}_{H_0}}}{\widehat{\sigma}_{\overline{X}}} \\ t_{obs} &= \frac{\overline{x}_{obs} - \mu_{X_{H_0}}}{\widehat{\sigma}_X / \sqrt{n}} \\ &\sim t(df=n-1) \end{align} \]

Suppose we observe the following firing rates:

##  [1] 13.448164 11.719628 11.801543 11.221365  9.888318 14.573826 11.995701
##  [8]  7.066766 12.402712 10.054417

The observed test statistic (\(t_{obs}\)):

mu_x <- 10; n <- length(x_obs)  # mean under H0 and sample size
t_obs <- (mean(x_obs) - mu_x) / (sd(x_obs) / sqrt(n))

The p-value and critical value:

p_value <- pt(t_obs, df=n-1, lower.tail=FALSE)  # upper-tail area beyond t_obs
t_crit <- qt(0.05, df=n-1, lower.tail=FALSE)    # cuts off the upper 5% of t(df)

The decision:

if (p_value < 0.05) {
  decision <- "Reject H0"
} else {
  decision <- "Fail to reject H0"
}
decision

## [1] "Reject H0"
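The same decision can be reached by comparing \(t_{obs}\) with the critical value instead of comparing the p-value with \(\alpha\). The sketch below re-enters the firing rates printed above so that it runs on its own; the names `reject_by_p` and `reject_by_crit` are introduced here for illustration.

```r
# Firing rates copied from the printed output above
x_obs <- c(13.448164, 11.719628, 11.801543, 11.221365, 9.888318,
           14.573826, 11.995701, 7.066766, 12.402712, 10.054417)
mu_x <- 10              # mean under H0
n <- length(x_obs)
alpha <- 0.05

t_obs <- (mean(x_obs) - mu_x) / (sd(x_obs) / sqrt(n))
p_value <- pt(t_obs, df = n - 1, lower.tail = FALSE)
t_crit <- qt(alpha, df = n - 1, lower.tail = FALSE)

# For this one-sided test the two decision rules agree
reject_by_p <- p_value < alpha
reject_by_crit <- t_obs > t_crit
c(reject_by_p = reject_by_p, reject_by_crit = reject_by_crit)
```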

Inference is done from the \(t\)

In a Normal test, p-values and critical values come from the \(\overline{X}\) distribution; in a \(t\)-test, they come from the \(t\) distribution.

t-test in R

# suppose you observe the following firing rates
x_obs <- rnorm(10, mean=10+1, sd=2)

res <- t.test(x=x_obs,
              y=NULL,
              alternative="two.sided",
              mu=10,
              paired=FALSE, ## doesn't matter with one-sample
              var.equal=FALSE, ## doesn't matter with one-sample
              conf.level=0.95)

The output of the t.test function:

## 
##  One Sample t-test
## 
## data:  x_obs
## t = 2.1587, df = 9, p-value = 0.0592
## alternative hypothesis: true mean is not equal to 10
## 95 percent confidence interval:
##   9.932058 12.902430
## sample estimates:
## mean of x 
##  11.41724
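The object returned by `t.test` is a list, so individual results can be pulled out by name. The sample mean printed above matches the ten firing rates listed in the full example, so this sketch assumes that is the data behind the output and re-enters it by hand. `res_greater` is a name introduced here for illustration.

```r
# Firing rates from the full example (assumed to be the data behind the output)
x_obs <- c(13.448164, 11.719628, 11.801543, 11.221365, 9.888318,
           14.573826, 11.995701, 7.066766, 12.402712, 10.054417)

res <- t.test(x_obs, mu = 10, alternative = "two.sided")

res$statistic   # observed t
res$parameter   # degrees of freedom
res$p.value     # two-sided p-value
res$conf.int    # 95% confidence interval for the mean

# With t_obs > 0, the one-sided ("greater") p-value is half the two-sided one
res_greater <- t.test(x_obs, mu = 10, alternative = "greater")
all.equal(res_greater$p.value, res$p.value / 2)
```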

Question

Suppose that the firing rate of a neuron is measured during peripheral stimulation 100 times and a t-test is conducted to assess whether or not the true firing rate is greater than 10. Let \(X\) be the random variable that generates firing rates for this neuron. We do not know the true variance of \(X\). The raw data is stored in a variable named x_obs. Which of the following correctly computes the observed t-statistic?

t_obs <- (mean(x_obs) - 10) / (sd(x_obs) / sqrt(100))
t_obs <- (mean(x_obs) - 10) / (sd(x_obs) / sqrt(99))
t_obs <- (x_obs - 10) / sd(x_obs)

Answer

t_obs <- (mean(x_obs) - 10) / (sd(x_obs) / sqrt(100))
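To see why the other two options fail, the three candidates can be run on simulated data and compared with `t.test`. This is a sketch only: the data are a simulated stand-in (the seed and population parameters are arbitrary), and `opt1`, `opt2`, `opt3` are names introduced here.

```r
set.seed(2)                              # arbitrary seed
x_obs <- rnorm(100, mean = 11, sd = 2)   # simulated stand-in for the raw data

opt1 <- (mean(x_obs) - 10) / (sd(x_obs) / sqrt(100))  # divides by sqrt(n)
opt2 <- (mean(x_obs) - 10) / (sd(x_obs) / sqrt(99))   # sqrt(n - 1): wrong standard error
opt3 <- (x_obs - 10) / sd(x_obs)                      # no mean taken: one value per observation

# Only the first option is a single number that matches t.test
length(opt1)
length(opt3)
all.equal(opt1, unname(t.test(x_obs, mu = 10)$statistic))
```

The \(n - 1\) already appears inside `sd`; the standard error of the mean still divides by \(\sqrt{n}\).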

Question

Suppose the one-sided t-test from the previous question is instead run on a sample of only 10 firing rates (hence \(df = 9\)), with the following result:

## 
##  One Sample t-test
## 
## data:  x_obs
## t = 1.9052, df = 9, p-value = 0.04457
## alternative hypothesis: true mean is greater than 10
## 95 percent confidence interval:
##  10.04347      Inf
## sample estimates:
## mean of x 
##  11.14925

What is the area of the shaded region in the plot below?

Answer

0.0445736 (the p-value, reported as 0.04457 in the output above)
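Assuming the shaded region is the upper tail of the \(t(9)\) distribution beyond \(t_{obs}\), the area can be recovered from the reported statistic with `pt`. Because the printed statistic is rounded, the result agrees with the answer only to about four decimal places.

```r
t_obs <- 1.9052   # statistic as printed in the t.test output (rounded)
df <- 9

# Area under the t(9) density to the right of t_obs
p_value <- pt(t_obs, df = df, lower.tail = FALSE)
p_value
```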

Question

Which of the following is false?

\[ \begin{align} &1. t_{obs} = \frac{\overline{x}_{obs} - \mu_{X_{H_0}}}{\widehat{\sigma}_X / \sqrt{n}} \\ &2. t_{obs} = \frac{\overline{x}_{obs} - \mu_{\overline{X}_{H_0}}}{\widehat{\sigma}_X / \sqrt{n}} \\ &3. t_{obs} = \frac{\overline{x}_{obs} - \mu_{\overline{X}_{H_0}}}{\widehat{\sigma}_{\overline{X}} / \sqrt{n}} \\ &4. t_{obs} = \frac{\overline{x}_{obs} - \mu_{\overline{X}_{H_0}}}{\widehat{\sigma}_{\overline{X}}} \\ \end{align} \]

Answer

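Option 3 is false: \(\widehat{\sigma}_{\overline{X}}\) already equals \(\widehat{\sigma}_X / \sqrt{n}\), so dividing it by \(\sqrt{n}\) again divides by \(\sqrt{n}\) twice.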