21 Independent samples t-test

Thus far we have learned how to test hypotheses that ask whether a sample of data was likely generated by a random variable with a mean of some given value. For example, given a sample \(x_1, \ldots, x_n\) from a random variable \(X\), do you believe that \(\mu_X > c\), where \(c\) is some given number? This situation will come up many times in your research career, so learning how to handle it is a wise investment. However, in many circumstances, we will want to test hypotheses that ask whether two samples were likely generated by different random variables or not. That is, we will want to compare two treatments. The good news is that the fundamental logic of NHST that we have already established still works perfectly in this scenario.

As an example, let's return to the criterion learning data we have discussed numerous times in lecture and homework.

library(data.table)
library(ggplot2)
rm(list = ls())

## load the criterion learning data
d <- fread(
  'https://crossley.github.io/book_stats/data/criterion_learning/crit_learn.csv')

## keep only the conditions of interest
d <- d[cnd %in% c('Delay', 'Short ITI', 'Long ITI')]

## compute each subject's mean trials to criterion (t2c) per condition
dd <- d[order(cnd, sub), mean(t2c), .(cnd, sub)]

## violin plot of the per-subject means in each condition
ggplot(dd, aes(cnd, V1)) +
  geom_violin() +
  ylab('Mean Trials to Criterion')

Consider two random variables \(X \sim \mathcal{N}(\mu_X, \sigma_X)\) and \(Y \sim \mathcal{N}(\mu_Y, \sigma_Y)\). Let \(X\) generate data for the Delay condition and \(Y\) generate data for the Long ITI condition. Furthermore, let's assume that \(X\) and \(Y\) are independent. This means that given a sample from \(X\), you don't change your beliefs about \(Y\) in any way, and vice versa.

Is it realistic to assume that \(X\) and \(Y\) are independent? Since different subjects are in each condition (i.e., this is a between-subjects design), we can assume independence. We’d need to rethink our assumptions if the same subjects were involved in sampling from both \(X\) and \(Y\).

This particular experiment was designed to answer a simple question: Does a feedback delay cause the mean trials to criterion to increase relative to a control condition in which feedback is given immediately and a delay is instead placed in the inter-trial interval?

1. Specify the null and alternative hypotheses (\(H_0\) and \(H_1\)) in terms of a distribution and population parameter. \[ H_0: \mu_X = \mu_Y \\ H_1: \mu_X > \mu_Y \]

2. Specify the type I error rate – denoted by the symbol \(\alpha\) – you are willing to tolerate.

\[\alpha = 0.05\]

3. Specify the sample statistic that you will use to estimate the population parameter in step 1 and state how it is distributed under the assumption that \(H_0\) is true.

Hmm. Here things seem to get a bit tricky. Our hypotheses involve two population parameters, \(\mu_X\) and \(\mu_Y\), so what single quantity should we estimate? The first step is to rewrite our hypotheses so that all parameters are on the same side of the equation.

\[ H_0: \mu_X - \mu_Y = 0 \\ H_1: \mu_X - \mu_Y > 0 \]

The next step is to define a third random variable \(W = X - Y\). With this definition, and using the independence of \(X\) and \(Y\), we know:

\[ W \sim \mathcal{N}(\mu_W, \sigma_W) \\ \mu_W = \mu_X - \mu_Y \\ \sigma^2_W = \sigma^2_X + \sigma^2_Y \]
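A quick simulation can make these facts concrete. The sketch below draws many independent samples from two hypothetical Normal distributions (the means, standard deviations, and sample size are arbitrary illustrative choices) and checks that the differences have the stated mean and variance:

## simulate independent X ~ N(10, sd = 2) and Y ~ N(4, sd = 3)
n <- 100000
x <- rnorm(n, mean=10, sd=2)
y <- rnorm(n, mean=4, sd=3)

w <- x - y

mean(w) ## should be close to 10 - 4 = 6
var(w)  ## should be close to 2^2 + 3^2 = 13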

Now we can rewrite our hypotheses as:

\[ H_0: \mu_W = 0 \\ H_1: \mu_W > 0 \]

Finally, we can get back to business as usual for step 3. For the moment, assume both samples have the same size \(n\) (we will relax this assumption below):

\[ \begin{align} \widehat{\mu_W} &= \overline{W} \sim \mathcal{N}(\mu_\overline{W}, \sigma_\overline{W}) \\\\ \mu_\overline{W} &= \mu_\overline{X} - \mu_\overline{Y} \\ &= \mu_X - \mu_Y \\\\ \sigma^2_\overline{W} &= \sigma^2_\overline{X} + \sigma^2_\overline{Y} \\ &= \frac{\sigma^2_X}{n} + \frac{\sigma^2_Y}{n} \end{align} \]

We know that \(\sigma^2_W = \sigma^2_X + \sigma^2_Y\), so if we somehow know \(\sigma^2_X\) and \(\sigma^2_Y\), then we can just plug those in and proceed with Normal distributions. Much more commonly, we will not know \(\sigma^2_X\) and \(\sigma^2_Y\), so we will have to estimate them, and because of this, our test statistic based on \(\widehat{\mu_W}\) will follow a \(t\) distribution, not a Normal distribution.

To figure out exactly how this t-distribution is configured, we need to figure out (1) how to estimate \(\sigma_\overline{W}\), and (2) the degrees of freedom of the resulting \(t\) distribution.


21.0.1 Equal sample size and equal variance

\[ \begin{align} s^2_\overline{W} &= s^2_\overline{X} + s^2_\overline{Y} \\ &= \frac{s^2_X}{n_X} + \frac{s^2_Y}{n_Y} \\ &= \frac{s^2_X}{n} + \frac{s^2_Y}{n} \\ &= \frac{s^2_X + s^2_Y}{n} \\\\ s_\overline{W} &= \sqrt{\frac{s^2_X + s^2_Y}{n}} \end{align} \]

Here, it is common to define a quantity called the pooled variance, denoted by \(s^2_p\) and defined by:

\[ s^2_p = \frac{s^2_X + s^2_Y}{2} \]

That is, in this case, the pooled variance is just the average of the two sample variances. In any case, we can define our test statistic and sampling distribution as follows:

\[ \begin{align} t_{obs} &= \frac{\overline{W} - \mu_\overline{W}}{s_\overline{W}} \\ &= \frac{\overline{X} - \overline{Y}}{\sqrt{\frac{s^2_X + s^2_Y}{n}}} \\\\ t_{obs} &\sim t(df) \\\\ df &= (n_X-1) + (n_Y-1) \\ &= 2n - 2 \end{align} \]

Alternatively, you can write this using the pooled variance as follows:

\[ \begin{align} t_{obs} &= \frac{\overline{W} - \mu_\overline{W}}{s_\overline{W}} \\ &= \frac{\overline{X} - \overline{Y}}{s_p \sqrt{\frac{2}{n}}} \\\\ t_{obs} &\sim t(df) \\\\ df &= (n_X-1) + (n_Y-1) \\ &= 2n - 2 \end{align} \]
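As a sanity check on this sampling distribution, the following sketch simulates many experiments under \(H_0\) (equal means, equal variances, equal \(n\); all parameter values are arbitrary illustrative choices) and compares the quantiles of the simulated t-values to the theoretical \(t(2n-2)\) quantiles:

## simulate the null: X and Y have the same mean and variance
n_exp <- 10000  ## number of simulated experiments
n <- 15         ## observations per group

t_sim <- replicate(n_exp, {
  x <- rnorm(n, 0, 1)
  y <- rnorm(n, 0, 1)
  (mean(x) - mean(y)) / sqrt((var(x) + var(y)) / n)
})

## simulated quantiles should be close to the theoretical t(2n - 2) quantiles
quantile(t_sim, c(0.025, 0.5, 0.975))
qt(c(0.025, 0.5, 0.975), df=2*n - 2)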

21.0.2 Unequal sample sizes and equal variances

With unequal sample sizes, the pooled variance is weighted more towards the sample with the larger size:

\[ s^2_p = \frac{(n_X-1)s^2_X + (n_Y-1)s^2_Y}{n_X + n_Y - 2} \]
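Note that when \(n_X = n_Y\) this weighted formula reduces to the simple average used above. A quick check with made-up values (an arbitrary common \(n\) and two arbitrary variances) illustrates this:

## made-up example values
n <- 20
var_x <- 4
var_y <- 9

## weighted pooled variance with n_X = n_Y = n ...
((n-1)*var_x + (n-1)*var_y) / (n + n - 2)

## ... equals the simple average of the two variances
(var_x + var_y) / 2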

The rest works out almost the same as above:

\[ \begin{align} t_{obs} &= \frac{\overline{W} - \mu_\overline{W}}{s_\overline{W}} \\ &= \frac{\overline{X} - \overline{Y}}{s_p \sqrt{\frac{1}{n_X} + \frac{1}{n_Y}}} \\\\ t_{obs} &\sim t(df) \\\\ df &= (n_X-1) + (n_Y-1) \\ &= n_X + n_Y - 2 \end{align} \]
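Wrapped up as code, the pooled-variance test statistic for two samples of possibly different sizes might look like the following sketch (the function name pooled_t is just an illustrative choice, not a built-in):

## illustrative helper: pooled-variance t statistic and df for two
## independent samples, assuming equal population variances
pooled_t <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  sp2 <- ((nx-1)*var(x) + (ny-1)*var(y)) / (nx + ny - 2)
  tobs <- (mean(x) - mean(y)) / sqrt(sp2 * (1/nx + 1/ny))
  c(tobs=tobs, df=nx + ny - 2)
}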

21.0.3 Unequal sample size and unequal variance

NOTE: This works for equal sample size as well.

It turns out that the expressions for the observed t-value and the estimated variance are quite simple, but the degrees of freedom for the test look a bit more complicated.

\[ \begin{align} t_{obs} &= \frac{\overline{X} - \overline{Y}}{\sqrt{\frac{s^2_X}{n_X} + \frac{s^2_Y}{n_Y}}}\\\\ df &= \frac{\left( \frac{s^2_X}{n_X} + \frac{s^2_Y}{n_Y} \right)^2} {\frac{\left(\frac{s^2_X}{n_X} \right)^2}{n_X-1} + \frac{\left( \frac{s^2_Y}{n_Y} \right)^2}{n_Y-1}} \end{align} \]

This expression for \(df\) is known as the Welch–Satterthwaite equation.
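The \(df\) expression is easier to read as code. Here is a small helper (the function name welch_df is just an illustrative choice, not a built-in) that computes the Welch–Satterthwaite degrees of freedom from two samples; this is the \(df\) that t.test() reports when var.equal=FALSE, which is its default:

## illustrative helper for the Welch-Satterthwaite degrees of freedom
welch_df <- function(x, y) {
  vx <- var(x) / length(x)
  vy <- var(y) / length(y)
  (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
}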

4. Obtain a random sample and use it to compute the sample statistic from step 3. Call this value \(\widehat{\theta}_{\text{obs}}\).

## per-subject mean trials to criterion for the two conditions being compared
x <- dd[cnd=='Delay', V1]
y <- dd[cnd=='Long ITI', V1]

nx <- length(x)
ny <- length(y)

xbar <- mean(x)
ybar <- mean(y)

varx <- var(x)
vary <- var(y)

## we have almost equal n (so treat n_X = n_Y = nx), and let's assume equal variance
s <- sqrt((varx + vary)/nx)

## observed t statistic and degrees of freedom
tobs <- (xbar - ybar) / s

df <- nx + ny - 2

5. If \(\widehat{\theta}_{\text{obs}}\) is very unlikely to occur under the assumption that \(H_0\) is true, then reject \(H_0\). Otherwise, do not reject \(H_0\).

pval <- pt(tobs, df, lower.tail=FALSE)

if(pval < 0.05) {
  print('reject the null')
  print(tobs)
  print(df)
  print(pval)
} else {
  print('fail to reject the null')
  print(tobs)
  print(df)
  print(pval)
}
## [1] "fail to reject the null"
## [1] 0.876303
## [1] 39
## [1] 0.1931156
## Check our work using built in t.test()
t.test(x=x,
       y=y,
       alternative='greater',
       mu=0,
       paired=FALSE,
       var.equal=TRUE,
       conf.level=0.95)
## 
##  Two Sample t-test
## 
## data:  x and y
## t = 0.88703, df = 39, p-value = 0.1902
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -30.40335       Inf
## sample estimates:
## mean of x mean of y 
##  140.9593  107.1572

Our results are very close to the results from t.test() but not exact. This is because we treated the two sample sizes as exactly equal when computing \(s_\overline{W}\), whereas they differ slightly; t.test() uses the pooled variance formula from section 21.0.2, which handles unequal sample sizes.

## Don't assume equal sample sizes, but still assume equal variance
## (pooled variance formula from section 21.0.2)
s <- sqrt( (((nx-1)*varx + (ny-1)*vary) / (nx + ny - 2)) * (1/nx + 1/ny) )
tobs <- (xbar - ybar) / s
df <- nx + ny - 2
pval <- pt(tobs, df, lower.tail=FALSE)

if(pval < 0.05) {
  print('reject the null')
  print(tobs)
  print(df)
  print(pval)
} else {
  print('fail to reject the null')
  print(tobs)
  print(df)
  print(pval)
}
## [1] "fail to reject the null"
## [1] 0.8870323
## [1] 39
## [1] 0.1902496
t.test(x=x,
       y=y,
       alternative='greater',
       mu=0,
       paired=FALSE,
       var.equal=TRUE,
       conf.level=0.95)
## 
##  Two Sample t-test
## 
## data:  x and y
## t = 0.88703, df = 39, p-value = 0.1902
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
##  -30.40335       Inf
## sample estimates:
## mean of x mean of y 
##  140.9593  107.1572
## Welch-Satterthwaite df for the unequal-variance case (see the sketch below)
# df <- (varx/nx + vary/ny)^2 / ( (varx/nx)^2/(nx-1) + (vary/ny)^2/(ny-1))

Hooray! Now everything matches exactly!
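For completeness, the commented-out line in the last code block is the Welch–Satterthwaite degrees of freedom from section 21.0.3. A sketch of the fully unequal-variance (Welch) version of this analysis, which t.test() performs when var.equal=FALSE, might look like this (it reuses x, y, and the sample statistics computed above):

## Welch version: don't assume equal variances or equal sample sizes
s <- sqrt(varx/nx + vary/ny)
tobs <- (xbar - ybar) / s
df <- (varx/nx + vary/ny)^2 / ((varx/nx)^2/(nx-1) + (vary/ny)^2/(ny-1))
pval <- pt(tobs, df, lower.tail=FALSE)

## compare against the built-in Welch test
t.test(x=x,
       y=y,
       alternative='greater',
       mu=0,
       paired=FALSE,
       var.equal=FALSE,
       conf.level=0.95)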