14 Central limit theorem
14.1 Learning objectives
- Understand the central limit theorem, the distribution of sample means, and how that distribution depends on sample size.
14.2 Central limit theorem
The central limit theorem tells us that the sum of a large number of independent and identically distributed random variables with finite variance is approximately Normally distributed.
Let
\(\boldsymbol{Y} = \boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n\)
and
\(\boldsymbol{X}_i \sim \boldsymbol{D}\)
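where \(\boldsymbol{D}\) can be any distribution with finite mean \(\mu_X\) and finite variance \(\sigma_X^2\), and the \(\boldsymbol{X}_i\) are independent. Then, for large \(n\), \(\boldsymbol{Y}\) is approximately Normally distributed:
\(\boldsymbol{Y} \sim \mathcal{N}(\mu_{Y}, \sigma_{Y}^2)\)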
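As a rough illustration, the following Python sketch sums draws from a strongly skewed distribution and checks how often the sum lands within 1.96 standard deviations of its mean, which a Normal distribution would do about 95% of the time. The Exponential(1) distribution, the value \(n = 50\), the number of trials, and the helper names `simulate_sums` and `fraction_within` are assumptions chosen for this example. It uses the fact, derived later in this section, that the sum has mean \(n \mu_X\) and variance \(n \sigma_X^2\).

```python
import random

def simulate_sums(n, trials, seed=0):
    """Return `trials` realizations of Y = X_1 + ... + X_n with X_i ~ Exponential(1)."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

def fraction_within(values, mean, sd, z=1.96):
    """Fraction of values lying within z standard deviations of the mean."""
    return sum(abs(v - mean) <= z * sd for v in values) / len(values)

n = 50
sums = simulate_sums(n, trials=20_000)

# Exponential(1) has mean 1 and variance 1, so Y has mean n and variance n.
coverage = fraction_within(sums, mean=n, sd=n ** 0.5)
print(f"fraction within 1.96 sd: {coverage:.3f}")  # a Normal distribution gives about 0.95
```

Repeating the run with a small value such as `n = 2` should give a visibly poorer match, since the approximation improves as \(n\) grows.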
14.2.1 Central limit theorem mean and variance
One way to see where the mean and variance parameters come from in the central limit theorem is to use the following rule:
Let
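\(Y = a X + b\)
Then:
\(\mathbb{E}\big[Y\big] = a \mathbb{E}\big[X\big] + b\)
\(\mathrm{Var}\big[Y\big] = a^2 \, \mathrm{Var}\big[X\big]\)
In addition, the expectation of a sum is the sum of the expectations, and for independent random variables the variance of a sum is the sum of the variances.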
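The scaling rule can be checked numerically. The sketch below applies a linear transformation to simulated data and compares both sides of each identity. The constants `a` and `b`, the Uniform(0, 1) source distribution, and the sample size are arbitrary choices made for this example.

```python
import random
import statistics

rng = random.Random(1)
a, b = 3.0, 5.0  # arbitrary illustrative constants

x = [rng.uniform(0, 1) for _ in range(10_000)]
y = [a * xi + b for xi in x]  # Y = aX + b

mean_x, var_x = statistics.fmean(x), statistics.pvariance(x)
mean_y, var_y = statistics.fmean(y), statistics.pvariance(y)

# The shift b moves the mean but leaves the variance unchanged;
# the scale a multiplies the mean by a and the variance by a squared.
print(f"E[Y]   = {mean_y:.4f}   a*E[X] + b   = {a * mean_x + b:.4f}")
print(f"Var[Y] = {var_y:.4f}   a^2 * Var[X] = {a ** 2 * var_x:.4f}")
```

The two columns agree up to floating-point rounding, because the identities hold exactly for the empirical mean and variance of any data set.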
14.2.2 Applied to the distribution of sample sums
\[\begin{align} \boldsymbol{Y} &= \boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n \\ \\ E(\boldsymbol{Y}) &= E(\boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n) \\ &= E(\boldsymbol{X}_1) + E(\boldsymbol{X}_2) + \dots + E(\boldsymbol{X}_n) \\ &= \mu_X + \mu_X + \dots + \mu_X \\ &= n \mu_X \\ \\ Var(\boldsymbol{Y}) &= Var(\boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n) \\ &= Var(\boldsymbol{X}_1) + Var(\boldsymbol{X}_2) + \dots + Var(\boldsymbol{X}_n) \\ &= \sigma_X^2 + \sigma_X^2 + \dots + \sigma_X^2 \\ &= n \sigma_X^2 \end{align}\]
\[\begin{align} \mu_{Y} &= n \mu_{X} \\ \sigma_Y^2 &= n \sigma_X^2 \\ \end{align}\]
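As a worked example (the fair six-sided die and \(n = 10\) are illustrative choices), a single roll has \(\mu_X = 3.5\) and \(\sigma_X^2 = 35/12\), so the sum of ten independent rolls has:
\[\begin{align} \mu_{Y} &= n \mu_{X} = 10 \times 3.5 = 35 \\ \sigma_Y^2 &= n \sigma_X^2 = 10 \times \frac{35}{12} \approx 29.17 \\ \sigma_Y &= \sqrt{\sigma_Y^2} \approx 5.40 \end{align}\]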
14.2.3 Applied to the distribution of sample means
\[\begin{align} \boldsymbol{Y} &= \frac{1}{n} (\boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n) \\ \\ E(\boldsymbol{Y}) &= E\left(\frac{1}{n} (\boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n)\right) \\ &= \frac{1}{n} (E(\boldsymbol{X}_1) + E(\boldsymbol{X}_2) + \dots + E(\boldsymbol{X}_n)) \\ &= \frac{1}{n} (\mu_X + \mu_X + \dots + \mu_X) \\ &= \frac{1}{n} \, n \mu_X \\ &= \mu_X \\ \\ Var(\boldsymbol{Y}) &= Var\left(\frac{1}{n} (\boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n)\right) \\ &= \frac{1}{n^2} Var(\boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n) \\ &= \frac{1}{n^2} (Var(\boldsymbol{X}_1) + Var(\boldsymbol{X}_2) + \dots + Var(\boldsymbol{X}_n)) \\ &= \frac{1}{n^2} ( \sigma_X^2 + \sigma_X^2 + \dots + \sigma_X^2 ) \\ &= \frac{1}{n^2} \, n \sigma_X^2 \\ &= \frac{1}{n} \sigma_X^2 \end{align}\]
\[\begin{align} \mu_{Y} &= \mu_{X} \\ \sigma_Y^2 &= \frac{1}{n} \sigma_X^2 \\ \end{align}\]
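To see how the spread of the sample mean depends on sample size, the sketch below simulates many sample means of fair die rolls for several values of \(n\) and compares the observed variance with \(\sigma_X^2 / n\). The die example, the chosen sample sizes, the number of trials, and the helper name `sample_mean_variance` are assumptions made for this illustration.

```python
import random
import statistics

SIGMA2_X = 35 / 12  # variance of a single fair six-sided die roll

def sample_mean_variance(n, trials=5_000, seed=0):
    """Simulate `trials` sample means of n die rolls; return (mean, variance) of those means."""
    rng = random.Random(seed)
    means = [sum(rng.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]
    return statistics.fmean(means), statistics.pvariance(means)

for n in (5, 20, 80):
    mean, var = sample_mean_variance(n)
    print(f"n={n:3d}  mean of means={mean:.3f}  "
          f"variance={var:.4f}  sigma_X^2/n={SIGMA2_X / n:.4f}")
```

The mean of the sample means should stay near 3.5 for every \(n\), while the variance should fall roughly fourfold each time \(n\) is quadrupled.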
14.2.4 Standard Error
When applying the central limit theorem to the distribution of sample means, we get:
\[\begin{align} \mu_{Y} &= \mu_{X} \\ \sigma_Y^2 &= \frac{1}{n} \sigma_X^2 \\ \end{align}\]
When expressed in terms of standard deviation:
\[\begin{align} \mu_{Y} &= \mu_{X} \\ \sigma_Y &= \frac{1}{\sqrt{n}} \sigma_X \end{align}\]
Here, \(\sigma_Y\) is called the standard error of the mean (SEM). It shrinks in proportion to \(1/\sqrt{n}\), so quadrupling the sample size halves it, and it is commonly used to draw error bars on plots.
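In practice \(\sigma_X\) is usually unknown, so a common approach is to estimate it with the sample standard deviation. The sketch below computes the SEM that way for a small data set. The estimation step, the helper name `standard_error`, and the made-up measurements are assumptions for this example.

```python
import math
import statistics

def standard_error(sample):
    """Estimate the SEM as s / sqrt(n), where s is the sample standard deviation."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

measurements = [4.1, 5.0, 4.7, 5.3, 4.4, 4.9]  # made-up illustrative data

mean = statistics.fmean(measurements)
sem = standard_error(measurements)
print(f"mean = {mean:.2f} +/- {sem:.2f} (SEM)")
```

The value `sem` is what would be drawn as the half-length of an error bar around `mean`.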