2025

Comparing two means

  • Thus far, we have considered only hypotheses of the form:

\[ \begin{align} H_0: &\mu_X = 10 \\ H_1: &\mu_X > 10 \\ \end{align} \]

  • In practice, we may wish to compare the means of two random variables, \(X\) and \(Y\).

\[ \begin{align} H_0: &\mu_X = \mu_Y \\ H_1: &\mu_X > \mu_Y \\ \end{align} \]

Independent samples t-test

  • The independent samples t-test is used to compare the means of two random variables that are independent of each other (i.e., the samples come from different populations, such as two distinct groups of subjects).

  • Let \(X\) and \(Y\) be two independent random variables with means \(\mu_X\) and \(\mu_Y\), respectively.

  • An independent samples t-test is used to test the following hypotheses:

\[ \begin{align} H_0: &\mu_X = \mu_Y \\ H_1: &\mu_X > \mu_Y \\ \end{align} \]

  • Our previously developed tools don’t work well with hypotheses of this form, so we restate them in terms of a single parameter, the difference between the means:

\[ \begin{align} H_0: &\mu_X - \mu_Y = 0 \\ H_1: &\mu_X - \mu_Y > 0 \\ \end{align} \]

  • The parameter of interest is the difference between the means of \(X\) and \(Y\).

\[ \begin{align} \widehat{\mu_X - \mu_Y} &= \bar{x} - \bar{y} \end{align} \]

Independent samples t-test: test statistic

  • The exact form of the test statistic depends on whether we assume equal variances between \(X\) and \(Y\) and whether or not the sample sizes are equal.

\[ \begin{align} t_{obs} &= \frac{(\bar{x} - \bar{y}) - (\mu_{X} - \mu_{Y})_{H_0}}{s_{\bar{X}-\bar{Y}}} \\ &= \frac{(\bar{x} - \bar{y}) - 0}{s_{\bar{X}-\bar{Y}}} \\ &= \frac{(\bar{x} - \bar{y}) - 0}{\sqrt{s^2_{\bar{X}-\bar{Y}}}} \\ &= \frac{(\bar{x} - \bar{y})}{\sqrt{s^2_{\bar{X}} + s^2_{\bar{Y}}}} \\ &= \frac{(\bar{x} - \bar{y})}{\sqrt{\frac{s_X^2}{n_x} + \frac{s_Y^2}{n_y}}} \end{align} \]

  • Welch’s t-test does not assume equal variance between \(X\) and \(Y\) and does not require equal sample sizes.

  • Welch’s t-test has a complex expression for the degrees of freedom.

\[ \begin{align} t_{obs} &= \frac{(\bar{x} - \bar{y})}{\sqrt{\frac{s_X^2}{n_x} + \frac{s_Y^2}{n_y}}} \sim t(df) \\ \\ df &= \frac{\left(\frac{s_X^2}{n_x} + \frac{s_Y^2}{n_y}\right)^2} {\frac{\left(\frac{s_X^2}{n_x}\right)^2}{n_x - 1} + \frac{\left(\frac{s_Y^2}{n_y}\right)^2}{n_y - 1}} \end{align} \]
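
The Welch formulas above can be checked by computing them by hand and comparing against `t.test`. This is a minimal sketch on simulated data; the seed, sample sizes, means, and standard deviations are arbitrary assumptions chosen for illustration, and names such as `v_x` and `df_welch` are not from the slides.

```r
# Simulated data; the seed, sample sizes, means, and SDs are arbitrary assumptions.
set.seed(1)
x_obs <- rnorm(12, mean = 10, sd = 2)
y_obs <- rnorm(15, mean = 12, sd = 4)

n_x <- length(x_obs)
n_y <- length(y_obs)

# Estimated variance of each sample mean: s^2 / n
v_x <- var(x_obs) / n_x
v_y <- var(y_obs) / n_y

# Welch test statistic and degrees of freedom, following the formulas above
t_obs <- (mean(x_obs) - mean(y_obs)) / sqrt(v_x + v_y)
df_welch <- (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))

# t.test with var.equal = FALSE runs Welch's test
res_welch <- t.test(x = x_obs, y = y_obs, var.equal = FALSE)

# Side-by-side comparison of the hand calculation and t.test
c(by_hand = t_obs, t_test = unname(res_welch$statistic))
c(by_hand = df_welch, t_test = unname(res_welch$parameter))
```

Note that the Welch degrees of freedom are generally not a whole number.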

Test statistic assuming equal variance

  • If we do assume equal variances, then we use a pooled variance estimate instead of keeping the individual samples separate.

\[ \begin{align} s_{pooled}^2 &= \frac{(n_x - 1)s_X^2 + (n_y - 1)s_Y^2}{n_x + n_y - 2} \\\\ t_{obs} &= \frac{(\bar{x} - \bar{y})}{\sqrt{s_{pooled}^2\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}} \\ &= \frac{(\bar{x} - \bar{y})}{s_{pooled}\sqrt{\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}} \\ \end{align} \]

  • When we assume equal variances, the degrees of freedom simplify to a function of the sample sizes alone.

\[ \begin{align} t_{obs} &= \frac{(\bar{x} - \bar{y})}{s_{pooled}\sqrt{\left(\frac{1}{n_x} + \frac{1}{n_y}\right)}} \sim t(df) \\\\ df &= n_x + n_y - 2 \\ \end{align} \]
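
The pooled-variance calculation can be traced step by step in the same way. The sketch below uses simulated data with assumed parameters and compares the hand calculation to `t.test` with `var.equal = TRUE`; the variable names are illustrative only.

```r
# Simulated data; the seed, sample sizes, means, and SDs are arbitrary assumptions.
set.seed(2)
x_obs <- rnorm(10, mean = 10, sd = 2)
y_obs <- rnorm(14, mean = 12, sd = 2)

n_x <- length(x_obs)
n_y <- length(y_obs)

# Pooled variance: a weighted average of the two sample variances
s2_pooled <- ((n_x - 1) * var(x_obs) + (n_y - 1) * var(y_obs)) / (n_x + n_y - 2)

# Test statistic and degrees of freedom under the equal-variance assumption
t_obs <- (mean(x_obs) - mean(y_obs)) / sqrt(s2_pooled * (1 / n_x + 1 / n_y))
df_pooled <- n_x + n_y - 2

# Two-sided p-value from the t distribution
p_val <- 2 * pt(-abs(t_obs), df = df_pooled)

# Compare against t.test with var.equal = TRUE
res_pooled <- t.test(x = x_obs, y = y_obs, var.equal = TRUE)
c(by_hand = t_obs, t_test = unname(res_pooled$statistic))
c(by_hand = p_val, t_test = res_pooled$p.value)
```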

Test statistic with equal sample sizes

  • When \(n_x = n_y\), we can simplify the previous equations by replacing both with \(n\). We won’t write out these simplified forms here, since the key ideas have already been explained and we’ll be using R to handle the actual \(t\)-test calculations most of the time.

Independent samples t-test in R

  • Test the hypothesis that the means of two random variables are equal.
x_obs <- rnorm(10, mean=10, sd=2)
y_obs <- rnorm(10, mean=12, sd=2)
res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="two.sided",
              mu=0,
              paired=FALSE, # important: FALSE for independent samples
              var.equal=TRUE, # TRUE uses the pooled-variance test; FALSE uses Welch's test
              conf.level=0.95)

The output of the t.test function:

## 
##  Two Sample t-test
## 
## data:  x_obs and y_obs
## t = -2.1847, df = 18, p-value = 0.04238
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.01352208 -0.07843591
## sample estimates:
## mean of x mean of y 
##  9.519335 11.565314
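
The object returned by `t.test` is a list, so the pieces of the printed output can be pulled out individually. A brief sketch, assuming simulated data like that above (the seed and the 0.05 significance level are assumptions made for illustration):

```r
# Simulated data; the seed is an arbitrary assumption for reproducibility.
set.seed(3)
x_obs <- rnorm(10, mean = 10, sd = 2)
y_obs <- rnorm(10, mean = 12, sd = 2)

res <- t.test(x = x_obs,
              y = y_obs,
              alternative = "two.sided",
              mu = 0,
              paired = FALSE,
              var.equal = TRUE,
              conf.level = 0.95)

res$statistic  # observed t value
res$parameter  # degrees of freedom
res$p.value    # p-value
res$conf.int   # confidence interval for mu_X - mu_Y
res$estimate   # the two sample means

# Decision at an assumed significance level of 0.05
alpha <- 0.05
reject_null <- res$p.value < alpha
reject_null
```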

Repeated measures t-test

  • The repeated measures t-test is used to compare the means of two random variables whose samples come from the same subjects, so the observations are paired rather than independent.

  • Let \(X\) and \(Y\) be two random variables with means \(\mu_X\) and \(\mu_Y\), respectively.

  • A repeated measures t-test is used to test the following hypotheses:

\[ \begin{align} H_0: &\ \mu_X = \mu_Y \\ H_1: &\ \mu_X > \mu_Y \\ \end{align} \]

  • Since both samples come from the same subjects, we can restate our hypotheses as follows:

\[ \begin{align} H_0: & \mu_D = 0 \\ H_1: & \mu_D > 0 \\ \end{align} \]

  • \(D\) is the random variable that generates the difference scores \(D_i = X_i - Y_i\), where \(i\) indexes the subjects.

What are difference scores?

# suppose you observe the following data
x_obs <- rnorm(10, mean=10, sd=2)
y_obs <- rnorm(10, mean=12, sd=2)

# difference scores are the difference between the two,
# computed for each subject
d_obs <- x_obs - y_obs
  • Once you have the difference scores, everything proceeds exactly as in a single-sample t-test on \(D\).
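
To make that last point concrete, here is a sketch of the single-sample calculation applied to the difference scores, using the standard one-sample statistic \(t_{obs} = \bar{d} / (s_D / \sqrt{n})\) with \(n - 1\) degrees of freedom. The seed and variable names are assumptions added for illustration.

```r
# Simulated data; the seed is an arbitrary assumption for reproducibility.
set.seed(4)
x_obs <- rnorm(10, mean = 10, sd = 2)
y_obs <- rnorm(10, mean = 12, sd = 2)

# Difference scores, one per subject
d_obs <- x_obs - y_obs
n <- length(d_obs)

# Single-sample t statistic for H_0: mu_D = 0
t_obs <- (mean(d_obs) - 0) / (sd(d_obs) / sqrt(n))
df <- n - 1

# Two-sided p-value
p_val <- 2 * pt(-abs(t_obs), df = df)

# The same test via t.test on the difference scores
res_d <- t.test(d_obs, mu = 0)
c(by_hand = t_obs, t_test = unname(res_d$statistic))
c(by_hand = p_val, t_test = res_d$p.value)
```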

Repeated measures t-test in R

res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="two.sided",
              mu=0,
              paired=TRUE, # important: TRUE for repeated measures
              var.equal=TRUE, # ignored when paired=TRUE
              conf.level=0.95)

The output of the t.test function:

## 
##  Paired t-test
## 
## data:  x_obs and y_obs
## t = -1.0381, df = 9, p-value = 0.3263
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -2.728877  1.012162
## sample estimates:
## mean difference 
##      -0.8583576
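
As a quick sanity check, calling `t.test` with `paired=TRUE` should give the same result as a single-sample test on the difference scores. A short sketch on simulated data (the seed is an assumption):

```r
# Simulated data; the seed is an arbitrary assumption for reproducibility.
set.seed(5)
x_obs <- rnorm(10, mean = 10, sd = 2)
y_obs <- rnorm(10, mean = 12, sd = 2)

# Paired test on the two samples
res_paired <- t.test(x = x_obs, y = y_obs, paired = TRUE)

# Single-sample test on the difference scores
res_diff <- t.test(x_obs - y_obs, mu = 0)

# The statistic, degrees of freedom, and p-value agree
c(paired = unname(res_paired$statistic), diff = unname(res_diff$statistic))
c(paired = unname(res_paired$parameter), diff = unname(res_diff$parameter))
c(paired = res_paired$p.value, diff = res_diff$p.value)
```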

Question

Consider the following samples drawn from two distinct groups of subjects:

Please use t.test to test the hypothesis that the means of the two groups are equal. Please assume that the data is stored in the variables x_obs and y_obs.

Answer:
res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="two.sided",
              mu=0,
              paired=FALSE,
              var.equal=TRUE,
              conf.level=0.95)

Question

Consider the following two probability distributions corresponding to the random variables \(X\) and \(Y\).

Assuming you obtain two samples x_obs and y_obs from these two random variables, please use t.test to test the following hypotheses:

\[ \begin{align} H_0: &\mu_X = \mu_Y \\ H_1: &\mu_X > \mu_Y \\ \end{align} \]

Answer:
res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="greater",
              mu=0,
              paired=FALSE,
              var.equal=FALSE,
              conf.level=0.95)

Question

In the previous example, if you fail to reject the null, which of the following statements is true?

  • You have made a Type I error.
  • You have made a Type II error.
  • You have made the correct decision.
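Answer: You have made a Type II error. A Type II error is a failure to reject a null hypothesis that is false, which is the case here if the two distributions have \(\mu_X > \mu_Y\).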
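
How often a Type II error occurs can be estimated by simulation. The sketch below repeatedly draws samples from two normal distributions for which \(H_1\) is true by construction, runs the one-sided Welch test, and counts how often it fails to reject \(H_0\). The population means, standard deviations, sample sizes, number of simulations, and significance level are all assumptions chosen for illustration, not values from the question.

```r
# All settings below are illustrative assumptions.
set.seed(6)
n_sims <- 2000
alpha <- 0.05

# H_1 is true by construction: mu_X = 12 > mu_Y = 10
p_vals <- replicate(n_sims, {
  x_obs <- rnorm(10, mean = 12, sd = 2)
  y_obs <- rnorm(10, mean = 10, sd = 2)
  t.test(x = x_obs,
         y = y_obs,
         alternative = "greater",
         paired = FALSE,
         var.equal = FALSE)$p.value
})

# Failing to reject H_0 when H_1 is true is a Type II error
type2_rate <- mean(p_vals >= alpha)

# Power is the complement of the Type II error rate
power <- 1 - type2_rate

c(type2_rate = type2_rate, power = power)
```

Increasing the sample size or the true difference between the means should lower the estimated Type II error rate.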