2024

Comparing two means

  • Thus far, we have considered only hypotheses of the form:

\[ \begin{align} H_0: &\mu_X = 10 \\ H_1: &\mu_X > 10 \\ \end{align} \]

  • In practice, we may wish to compare the means of two random variables, \(X\) and \(Y\).

\[ \begin{align} H_0: &\mu_X = \mu_Y \\ H_1: &\mu_X > \mu_Y \\ \end{align} \]

Independent samples t-test

  • The independent samples t-test is used to compare the means of two random variables that are independent of each other (i.e., the two samples come from different subjects).

  • Let \(X\) and \(Y\) be two independent random variables with means \(\mu_X\) and \(\mu_Y\), respectively.

  • An independent samples t-test is used to test the following hypotheses:

\[ \begin{align} H_0: &\mu_X = \mu_Y \\ H_1: &\mu_X > \mu_Y \\ \end{align} \]

  • Our previously developed tools don’t work well with hypotheses of this form, so we rewrite them in terms of the difference \(\mu_X - \mu_Y\):

\[ \begin{align} H_0: &\mu_X = \mu_Y \\ H_1: &\mu_X > \mu_Y \\ \end{align} \]

\[ \begin{align} H_0: &\mu_X - \mu_Y = 0 \\ H_1: &\mu_X - \mu_Y > 0 \\ \end{align} \]

  • Rewritten hypotheses \(\Rightarrow\) rewritten test statistic:

\[ \begin{align} \widehat{\mu_X - \mu_Y} &= \bar{X} - \bar{Y} \\ t_{obs} &= \frac{(\bar{x} - \bar{y}) - (\mu_{X_{H_0}} - \mu_{Y_{H_0}})}{\sqrt{s^2_{X-Y}}} \\ t_{obs} &= \frac{(\bar{x} - \bar{y})}{\sqrt{s^2_{X-Y}}} \end{align} \]

  • What is \(s^2_{X-Y}\)?

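  • This is the estimated variance of \(\bar{X} - \bar{Y}\), i.e., the squared standard error of the difference in sample means. Its exact form depends on the sample sizes and variances of \(X\) and \(Y\).

  • Equal variances: \(s^2_{X-Y}\) is built from the pooled variance \(s^2_{pooled}\).

  • Sample \(x_{obs}\) has \(n_X\) observations with a sample variance of \(s_X^2\).

  • Sample \(y_{obs}\) has \(n_Y\) observations with a sample variance of \(s_Y^2\).

\[ \begin{align} s_{pooled}^2 &= \frac{(n_X - 1)s_X^2 + (n_Y - 1)s_Y^2}{n_X + n_Y - 2} \\ s^2_{X - Y} &= s_{pooled}^2 \left(\frac{1}{n_X} + \frac{1}{n_Y}\right) \\ df &= n_X + n_Y - 2 \end{align} \]

  • When both samples have the same size \(n_X = n_Y = n\), the degrees of freedom reduce to \(df = 2n - 2\).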
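
  • As a rough sketch of the equal-variance calculation, the code below computes the test statistic by hand and compares it with t.test. It assumes the standard error of the difference is \(s_{pooled}\sqrt{1/n_X + 1/n_Y}\) and \(df = n_X + n_Y - 2\). The data are simulated purely for illustration, and names such as s2_pooled and t_manual are made up for this example.

```r
# Hypothetical data, simulated only for illustration
set.seed(1)
x_obs <- rnorm(10, mean = 10, sd = 2)
y_obs <- rnorm(12, mean = 12, sd = 2)

n_x <- length(x_obs)
n_y <- length(y_obs)

# Pooled variance: weighted average of the two sample variances
s2_pooled <- ((n_x - 1) * var(x_obs) + (n_y - 1) * var(y_obs)) / (n_x + n_y - 2)

# Standard error of the difference in sample means
se_diff <- sqrt(s2_pooled * (1 / n_x + 1 / n_y))

# Test statistic and degrees of freedom under H0: mu_X - mu_Y = 0
t_manual <- (mean(x_obs) - mean(y_obs)) / se_diff
df_manual <- n_x + n_y - 2

# Upper-tail p-value for H1: mu_X - mu_Y > 0
p_manual <- pt(t_manual, df = df_manual, lower.tail = FALSE)

# The same test using t.test
res <- t.test(x_obs, y_obs, alternative = "greater", var.equal = TRUE)

# Each comparison should return TRUE
all.equal(t_manual, unname(res$statistic))
all.equal(df_manual, unname(res$parameter))
all.equal(p_manual, res$p.value)
```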

  • Unequal variances: Welch’s t-test

\[ \begin{align} t_{obs} &= \frac{(\bar{x} - \bar{y})}{\sqrt{\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}}} \\ &\sim t(df) \\ df &= \frac{\left(\frac{s_X^2}{n_X} + \frac{s_Y^2}{n_Y}\right)^2}{\frac{\left(\frac{s_X^2}{n_X}\right)^2}{n_X - 1} + \frac{\left(\frac{s_Y^2}{n_Y}\right)^2}{n_Y - 1}} \end{align} \]
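
  • The Welch formulas can be checked the same way. The sketch below uses simulated data with deliberately unequal variances and sample sizes (an assumption made only for this example) and compares the hand-computed statistic and degrees of freedom with t.test using var.equal=FALSE. The names v_x, v_y, t_manual and df_manual are illustrative.

```r
# Hypothetical data with unequal variances and sample sizes
set.seed(2)
x_obs <- rnorm(10, mean = 10, sd = 1)
y_obs <- rnorm(15, mean = 12, sd = 3)

n_x <- length(x_obs)
n_y <- length(y_obs)

# Per-sample contributions to the variance of the difference in means
v_x <- var(x_obs) / n_x
v_y <- var(y_obs) / n_y

# Welch test statistic
t_manual <- (mean(x_obs) - mean(y_obs)) / sqrt(v_x + v_y)

# Welch-Satterthwaite degrees of freedom (generally not an integer)
df_manual <- (v_x + v_y)^2 / (v_x^2 / (n_x - 1) + v_y^2 / (n_y - 1))

# The same test using t.test
res <- t.test(x_obs, y_obs, alternative = "two.sided", var.equal = FALSE)

# Each comparison should return TRUE
all.equal(t_manual, unname(res$statistic))
all.equal(df_manual, unname(res$parameter))
```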

Independent samples t-test in R

  • Test the hypothesis that the means of two random variables are equal.
x_obs <- rnorm(10, mean=10, sd=2)
y_obs <- rnorm(10, mean=12, sd=2)
res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="two.sided",
              mu=0,
              paired=FALSE,   # independent samples
              var.equal=TRUE, # pooled-variance test; FALSE gives Welch's t-test
              conf.level=0.95)

  • The output of the t.test function:
## 
##  Two Sample t-test
## 
## data:  x_obs and y_obs
## t = -3.0474, df = 18, p-value = 0.006932
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -5.925155 -1.089295
## sample estimates:
## mean of x mean of y 
##  8.682736 12.189961
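
  • The printed values can also be read programmatically. The sketch below assumes a significance level of 0.05 (chosen here to match conf.level=0.95) and uses freshly simulated data, so its numbers will differ from the output shown above. The variable reject_null is a name invented for this example.

```r
# Hypothetical data, simulated only for illustration
set.seed(3)
x_obs <- rnorm(10, mean = 10, sd = 2)
y_obs <- rnorm(10, mean = 12, sd = 2)

res <- t.test(x = x_obs,
              y = y_obs,
              alternative = "two.sided",
              mu = 0,
              paired = FALSE,
              var.equal = TRUE,
              conf.level = 0.95)

# t.test returns a list (class "htest") whose components can be extracted
res$statistic  # observed t value
res$parameter  # degrees of freedom
res$p.value    # p-value
res$conf.int   # confidence interval for mu_X - mu_Y
res$estimate   # the two sample means

# Decision rule at the assumed significance level
alpha <- 0.05
reject_null <- res$p.value < alpha
reject_null
```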

Repeated measures t-test

  • The repeated measures t-test is used to compare the means of two random variables that are measured on the same subjects (i.e., each subject contributes one observation to each sample).

  • Let \(X\) and \(Y\) be two random variables with means \(\mu_X\) and \(\mu_Y\), respectively.

  • A repeated measures t-test is used to test the following hypotheses:

\[ \begin{align} H_0: &\ \mu_X = \mu_Y \\ H_1: &\ \mu_X > \mu_Y \\ \end{align} \]

  • Since these are the same subjects, we can rewrite our hypotheses as follows:

\[ \begin{align} H_0: & \mu_D = 0 \\ H_1: & \mu_D > 0 \\ \end{align} \]

  • \(D\) is the random variable that generates the difference scores \(D_i = X_i - Y_i\), where \(i\) indexes the subjects.

  • What are difference scores?
# suppose you observe the following data
x_obs <- rnorm(10, mean=10, sd=2)
y_obs <- rnorm(10, mean=12, sd=2)

# difference scores are the difference between the two,
# computed for each subject
d_obs <- x_obs - y_obs
  • Once you get the difference scores everything proceeds identically to a single-sample t-test.
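
  • A minimal sketch of that equivalence: the code below computes the single-sample t statistic on the difference scores by hand, then checks it against t.test applied to d_obs alone and against t.test with paired=TRUE. The data are simulated for illustration, and t_manual, res_one and res_paired are names made up for this example.

```r
# Hypothetical data, simulated only for illustration
set.seed(4)
x_obs <- rnorm(10, mean = 10, sd = 2)
y_obs <- rnorm(10, mean = 12, sd = 2)

# Difference scores, one per subject
d_obs <- x_obs - y_obs
n <- length(d_obs)

# Single-sample t statistic for H0: mu_D = 0
t_manual <- (mean(d_obs) - 0) / (sd(d_obs) / sqrt(n))
df_manual <- n - 1

# Single-sample t-test on the difference scores
res_one <- t.test(d_obs, mu = 0, alternative = "two.sided")

# Paired t-test on the original samples
res_paired <- t.test(x_obs, y_obs, paired = TRUE, alternative = "two.sided")

# Each comparison should return TRUE
all.equal(t_manual, unname(res_one$statistic))
all.equal(unname(res_one$statistic), unname(res_paired$statistic))
all.equal(res_one$p.value, res_paired$p.value)
```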

Repeated measures t-test in R

res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="two.sided",
              mu=0,
              paired=TRUE, ## this is the only difference
              var.equal=TRUE, ## ignored when paired=TRUE
              conf.level=0.95)

  • The output of the t.test function:
## 
##  Paired t-test
## 
## data:  x_obs and y_obs
## t = -2.3365, df = 9, p-value = 0.04427
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -3.75655723 -0.06076919
## sample estimates:
## mean difference 
##       -1.908663

Question

Consider the following samples drawn from two distinct groups of subjects:

Please use t.test to test the hypothesis that the means of the two groups are equal. Please assume that the data is stored in the variables x_obs and y_obs.

Answer:
res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="two.sided",
              mu=0,
              paired=FALSE,
              var.equal=TRUE,
              conf.level=0.95)

Question

Consider the following two probability distributions corresponding to the random variables \(X\) and \(Y\).

Assuming you obtain two samples x_obs and y_obs from these two random variables, please use t.test to test the following hypotheses:

\[ \begin{align} H_0: &\mu_X = \mu_Y \\ H_1: &\mu_X > \mu_Y \\ \end{align} \]

Answer:
res <- t.test(x=x_obs, 
              y=y_obs, 
              alternative="greater",
              mu=0,
              paired=FALSE,
              var.equal=FALSE,
              conf.level=0.95)

Question

In the previous example, if you fail to reject the null, which of the following statements is true?

  • You have made a Type I error.
  • You have made a Type II error.
  • You have made the correct decision.
Answer: You have made a Type II error. A Type II error is a failure to reject a null hypothesis that is false.