Two-sample t-tests

2025

Comparing two means

Thus far, we have considered only hypotheses of the form:

\[ \begin{align} H_0: &\mu_X = 10 \\ H_1: &\mu_X > 10 \\ \end{align} \]

In practice, we may wish to compare the means of two random variables, \(X\) and \(Y\).

\[ \begin{align} H_0: &\mu_X = \mu_Y \\ H_1: &\mu_X > \mu_Y \\ \end{align} \]

Independent samples t-test

The independent samples t-test is used to compare the means of two random variables that are independent of each other (i.e., samples come from different populations).
Let \(X\) and \(Y\) be two independent random variables with means \(\mu_X\) and \(\mu_Y\), respectively.
An independent samples t-test is used to test the following hypotheses:

\[ \begin{align} H_0: &\mu_X = \mu_Y \\ H_1: &\mu_X > \mu_Y \\ \end{align} \]

Our previously developed tools don’t work well with hypotheses of this form, so we reformulate them as follows:

\[ \begin{align} H_0: &\mu_X - \mu_Y = 0 \\ H_1: &\mu_X - \mu_Y > 0 \\ \end{align} \]

The parameter of interest is the difference between the means of \(X\) and \(Y\).

\[ \begin{align} \widehat{\mu_X - \mu_Y} &= \bar{x} - \bar{y} \end{align} \]

Ind samp t-test: test statistic

The exact form of the test statistic depends on whether we assume equal variances between \(X\) and \(Y\) and whether or not the sample sizes are equal.

Ind samp t-test: test statistic

\[ \begin{align} t_{obs} &= \frac{(\bar{x} - \bar{y}) - (\mu_{X} - \mu_{Y})_{H_0}}{s_{\bar{X}-\bar{Y}}} \\ &= \frac{(\bar{x} - \bar{y}) - 0}{s_{\bar{X}-\bar{Y}}} \\ &= \frac{(\bar{x} - \bar{y}) - 0}{\sqrt{s^2_{\bar{X}-\bar{Y}}}} \\ &= \frac{(\bar{x} - \bar{y})}{\sqrt{s^2_{\bar{X}} + s^2_{\bar{Y}}}} \\ &= \frac{(\bar{x} - \bar{y})}{\sqrt{\frac{s_X^2}{n_x} + \frac{s_Y^2}{n_y}}} \end{align} \]

Ind samp t-test: test statistic

Welch’s t-test does not assume equal variance between \(X\) and \(Y\) and does not require equal sample sizes.
Welch’s t-test has a complex expression for the degrees of freedom.

\[ \begin{align} t_{obs} &= \frac{(\bar{x} - \bar{y})}{\sqrt{\frac{s_X^2}{n_x} + \frac{s_Y^2}{n_y}}} \sim t(df) \\ \\ df &= \frac{\left(\frac{s_X^2}{n_x} + \frac{s_Y^2}{n_y}\right)^2} {\frac{\left(\frac{s_X^2}{n_x}\right)^2}{n_x - 1} + \frac{\left(\frac{s_Y^2}{n_y}\right)^2}{n_y - 1}} \end{align} \]
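To make the Welch formulas concrete, here is a minimal sketch that computes the test statistic and degrees of freedom by hand and compares them with `t.test`. The sample sizes, means, standard deviations, and seed are made up for illustration, and the names `vx`, `vy`, and `df_welch` are not from the slides.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
x_obs <- rnorm(12, mean = 10, sd = 2)  # hypothetical sample from X
y_obs <- rnorm(15, mean = 12, sd = 4)  # hypothetical sample from Y

n_x <- length(x_obs)
n_y <- length(y_obs)

# estimated variance of each sample mean: s^2 / n
vx <- var(x_obs) / n_x
vy <- var(y_obs) / n_y

# Welch test statistic and degrees of freedom, following the formulas above
t_obs <- (mean(x_obs) - mean(y_obs)) / sqrt(vx + vy)
df_welch <- (vx + vy)^2 / (vx^2 / (n_x - 1) + vy^2 / (n_y - 1))

# built-in Welch test for comparison (var.equal = FALSE)
res <- t.test(x_obs, y_obs, var.equal = FALSE)
c(by_hand = t_obs, t.test = unname(res$statistic))
c(by_hand = df_welch, t.test = unname(res$parameter))
```

The degrees of freedom are usually not a whole number here, which is expected for Welch’s test.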

Test statistic assuming equal variance

If we do assume equal variances, then we replace the two separate sample variances with a single pooled variance estimate.

\[ \begin{align} s_{pooled}^2 &= \frac{(n_x - 1)s_X^2 + (n_y - 1)s_Y^2}{n_x + n_y - 2} \\\\ t_{obs} &= \frac{(\bar{x} - \bar{y})}{\sqrt{s_{pooled}^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}} \\ &= \frac{(\bar{x} - \bar{y})}{s_{pooled}\sqrt{\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}} \\ \end{align} \]

Test statistic assuming equal variance

When assuming equal variances, the degrees of freedom simplify.

\[ \begin{align} t_{obs} &= \frac{(\bar{x} - \bar{y})}{s_{pooled}\sqrt{\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}} \sim t(df) \\\\ df &= n_x + n_y - 2 \\ \end{align} \]
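The pooled-variance formulas can be checked the same way. This is a sketch under assumed data (the sample sizes, parameters, and seed are hypothetical), and `s2_pooled`, `df_pooled`, and `p_two_sided` are names chosen here for illustration.

```r
set.seed(2)  # arbitrary seed for reproducibility
x_obs <- rnorm(10, mean = 10, sd = 2)  # hypothetical sample from X
y_obs <- rnorm(14, mean = 12, sd = 2)  # hypothetical sample from Y

n_x <- length(x_obs)
n_y <- length(y_obs)

# pooled variance: a weighted average of the two sample variances
s2_pooled <- ((n_x - 1) * var(x_obs) + (n_y - 1) * var(y_obs)) / (n_x + n_y - 2)

# test statistic and degrees of freedom under the equal-variance assumption
t_obs <- (mean(x_obs) - mean(y_obs)) / sqrt(s2_pooled * (1 / n_x + 1 / n_y))
df_pooled <- n_x + n_y - 2

# two-sided p-value from the t distribution
p_two_sided <- 2 * pt(-abs(t_obs), df = df_pooled)

# built-in equal-variance test for comparison
res <- t.test(x_obs, y_obs, var.equal = TRUE)
c(by_hand = t_obs, t.test = unname(res$statistic))
c(by_hand = p_two_sided, t.test = res$p.value)
```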

Test statistic with equal sample sizes

When \(n_x = n_y\), we can simplify the previous equations by replacing both with \(n\). We won’t write out these simplified forms here, since the key ideas have already been explained and we’ll be using R to handle the actual \(t\)-test calculations most of the time.
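As a quick numerical check of the equal-sample-size case, the sketch below uses made-up data with a common \(n\). With equal sample sizes the standard error reduces to \(\sqrt{(s_X^2 + s_Y^2)/n}\), so the pooled and Welch test statistics coincide, although their degrees of freedom generally still differ. The name `t_simplified` is an illustrative choice.

```r
set.seed(3)  # arbitrary seed for reproducibility
n <- 10      # common sample size, n_x = n_y = n
x_obs <- rnorm(n, mean = 10, sd = 2)  # hypothetical sample from X
y_obs <- rnorm(n, mean = 12, sd = 3)  # hypothetical sample from Y

# simplified test statistic when both samples have size n
t_simplified <- (mean(x_obs) - mean(y_obs)) / sqrt((var(x_obs) + var(y_obs)) / n)

# the same statistic from t.test, with and without the equal-variance assumption
t_pooled <- unname(t.test(x_obs, y_obs, var.equal = TRUE)$statistic)
t_welch <- unname(t.test(x_obs, y_obs, var.equal = FALSE)$statistic)

c(simplified = t_simplified, pooled = t_pooled, welch = t_welch)
```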

Independent samples t-test in R

Test the hypothesis that the means of two random variables are equal.

x_obs <- rnorm(10, mean=10, sd=2)
y_obs <- rnorm(10, mean=12, sd=2)
res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="two.sided",
              mu=0,
              paired=FALSE, # independent samples, so the test must not be paired
              var.equal=TRUE, # TRUE uses the pooled-variance test; FALSE uses Welch's test
              conf.level=0.95)

The output of the `t.test` function:

## 
##  Two Sample t-test
## 
## data:  x_obs and y_obs
## t = -2.1847, df = 18, p-value = 0.04238
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.01352208 -0.07843591
## sample estimates:
## mean of x mean of y 
##  9.519335 11.565314
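The object returned by `t.test` is a list, so the individual pieces of the printed output can be pulled out by name. The following sketch regenerates hypothetical data with an arbitrary seed, so its numbers will not match the output shown above. The significance level `alpha` is an assumption chosen to match `conf.level=0.95`.

```r
set.seed(4)  # arbitrary seed; results will differ from the output above
x_obs <- rnorm(10, mean = 10, sd = 2)
y_obs <- rnorm(10, mean = 12, sd = 2)
res <- t.test(x = x_obs, y = y_obs, alternative = "two.sided", mu = 0,
              paired = FALSE, var.equal = TRUE, conf.level = 0.95)

res$statistic  # observed t
res$parameter  # degrees of freedom
res$p.value    # p-value
res$conf.int   # confidence interval for mu_X - mu_Y
res$estimate   # the two sample means

alpha <- 0.05  # assumed significance level
reject_null <- res$p.value < alpha
reject_null
```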

Repeated measures t-test

The repeated measures t-test is used to compare the means of two random variables that are drawn from the same population (e.g., samples come from the same subjects).
Let \(X\) and \(Y\) be two random variables with means \(\mu_X\) and \(\mu_Y\), respectively.
A repeated measures t-test is used to test the following hypotheses:

\[ \begin{align} H_0: &\ \mu_X = \mu_Y \\ H_1: &\ \mu_X > \mu_Y \\ \end{align} \]

Since these are the same subjects, we can reformulate our hypotheses as follows:

\[ \begin{align} H_0: & \mu_D = 0 \\ H_1: & \mu_D > 0 \\ \end{align} \]

\(D\) is the random variable that generates the difference scores \(D_i = X_i - Y_i\), where \(i\) indexes the subjects.

What are difference scores?

# suppose you observe the following data
x_obs <- rnorm(10, mean=10, sd=2)
y_obs <- rnorm(10, mean=12, sd=2)

# difference scores are the difference between the two,
# computed for each subject
d_obs <- x_obs - y_obs

Once you have the difference scores, everything proceeds identically to a single-sample t-test.
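As a sketch of what that single-sample calculation might look like, the test statistic can be computed directly from the difference scores. The data are regenerated with an arbitrary seed, and `d_bar`, `se_d`, and `df_d` are illustrative names.

```r
set.seed(5)  # arbitrary seed for reproducibility
x_obs <- rnorm(10, mean = 10, sd = 2)
y_obs <- rnorm(10, mean = 12, sd = 2)
d_obs <- x_obs - y_obs  # one difference score per subject

n <- length(d_obs)
d_bar <- mean(d_obs)          # sample mean of the differences
se_d <- sd(d_obs) / sqrt(n)   # standard error of that mean

# single-sample t statistic for H0: mu_D = 0
t_obs <- (d_bar - 0) / se_d
df_d <- n - 1
c(t = t_obs, df = df_d)
```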

Repeated measures t-test in R

res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="two.sided",
              mu=0,
              paired=TRUE, ## must be TRUE for a repeated measures test
              var.equal=TRUE, ## ignored when paired=TRUE
              conf.level=0.95)

The output of the `t.test` function:

## 
##  Paired t-test
## 
## data:  x_obs and y_obs
## t = -1.0381, df = 9, p-value = 0.3263
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -2.728877  1.012162
## sample estimates:
## mean difference 
##      -0.8583576
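One way to convince yourself that the paired test is a single-sample test on the difference scores is to run both and compare. This sketch uses hypothetical data with an arbitrary seed, so the numbers will not match the output above.

```r
set.seed(7)  # arbitrary seed; results will differ from the output above
x_obs <- rnorm(10, mean = 10, sd = 2)
y_obs <- rnorm(10, mean = 12, sd = 2)
d_obs <- x_obs - y_obs

# paired test on the two samples
res_paired <- t.test(x_obs, y_obs, paired = TRUE)

# single-sample test on the difference scores
res_single <- t.test(d_obs, mu = 0)

c(paired = unname(res_paired$statistic), single = unname(res_single$statistic))
c(paired = res_paired$p.value, single = res_single$p.value)
```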

Question

Consider the following samples drawn from two distinct groups of subjects:

Please use t.test to test the hypothesis that the means of the two groups are equal. Please assume that the data is stored in the variables x_obs and y_obs.

Click here for the answer

res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="two.sided",
              mu=0,
              paired=FALSE,
              var.equal=TRUE,
              conf.level=0.95)

Question

Consider the following two probability distributions corresponding to the random variables \(X\) and \(Y\).

Assuming you obtain two samples x_obs and y_obs from these two RVs, please use t.test to test the following hypotheses:

\[ \begin{align} H_0: &\mu_X = \mu_Y \\ H_1: &\mu_X > \mu_Y \\ \end{align} \]

Click here for the answer

res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="greater",
              mu=0,
              paired=FALSE,
              var.equal=FALSE,
              conf.level=0.95)

Question

In the previous example, if you fail to reject the null, which of the following statements is true?

You have made a Type I error.
You have made a Type II error.
You have made the correct decision.

Click here for the answer

You have made a Type II error: you failed to reject a null hypothesis that is false.
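To get a feel for how often a Type II error can happen, here is a small simulation sketch. Everything in it is an assumption made for illustration: the true means (12 and 10), the standard deviation, the sample size, the significance level `alpha`, and the number of simulated experiments `n_sims`. It repeatedly samples from two populations where \(\mu_X > \mu_Y\) really holds and records how often the test fails to reject the null.

```r
set.seed(6)      # arbitrary seed for reproducibility
n_sims <- 2000   # assumed number of simulated experiments
alpha <- 0.05    # assumed significance level

fail_to_reject <- replicate(n_sims, {
  # hypothetical populations in which H1 (mu_X > mu_Y) is true
  x_obs <- rnorm(10, mean = 12, sd = 2)
  y_obs <- rnorm(10, mean = 10, sd = 2)
  res <- t.test(x_obs, y_obs, alternative = "greater", var.equal = FALSE)
  res$p.value >= alpha  # TRUE when we fail to reject the null
})

# proportion of experiments ending in a Type II error
type2_rate <- mean(fail_to_reject)
type2_rate
```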
