The Central Limit Theorem

2024

Introduction

The Central Limit Theorem (CLT) is a fundamental theorem in statistics that describes the distribution of sample means.
It explains why sample means (and sums) tend to be approximately normally distributed as the sample size increases, even when the individual observations are not.
The CLT is crucial for conducting hypothesis tests and constructing confidence intervals in many practical situations.

Understanding the CLT

Definition: The CLT states that, for a sufficiently large sample size, the sample mean of independent, identically distributed random variables with finite variance is approximately normally distributed, regardless of the shape of the original distribution.
Assumptions:
- The observations are independent.
- The observations are identically distributed, meaning each one is drawn from the same population distribution.
- The population has a finite mean and variance.
- The sample size is sufficiently large (a common rule of thumb is n > 30, although strongly skewed or heavy-tailed distributions may need more).

Demonstration of the CLT

[Figures: Demonstration of the CLT 1 and 2]
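A simple simulation illustrates the same idea. The sketch below is an illustration under assumptions of our own choosing: the population is Exponential with rate 1 (mean 1, variance 1, skewness 2, so strongly right-skewed), the seed and the 10,000 repetitions are arbitrary, and `sample_means` and `skewness` are hypothetical helper names. For each sample size n it records the means of many samples and reports their mean, variance, and skewness; theory predicts roughly 1, 1/n, and 2/√n, so the skewness should shrink toward 0 (the value for a normal distribution) as n grows.

```python
import random
import statistics

def sample_means(n, reps, rng):
    """Return `reps` sample means, each computed from n draws of an Exponential(rate=1) population."""
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

def skewness(xs):
    """Sample skewness: mean cubed standardized deviation (0 for a symmetric distribution)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs, m)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

rng = random.Random(42)  # fixed seed so the run is reproducible
results = {}
for n in (1, 5, 30):
    means = sample_means(n, reps=10_000, rng=rng)
    results[n] = (statistics.fmean(means), statistics.pvariance(means), skewness(means))
    mean, var, skew = results[n]
    print(f"n={n:2d}  mean={mean:.3f}  var={var:.4f}  skew={skew:.3f}")
```

Plotting a histogram of `means` for each n (for example with a plotting library of your choice) would reproduce the usual picture of a skewed shape becoming bell-shaped.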

Significance of the CLT

Importance: The CLT is important because it justifies treating the sampling distribution of the mean as approximately normal for inferential statistics, even when the population distribution is unknown.
Applications:
- Hypothesis testing: determining whether there is a significant difference between groups.
- Confidence intervals: estimating population parameters such as the mean.
- Approximation: simplifying complex probability problems into manageable calculations using the normal distribution.
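As an illustration of the confidence-interval application, the sketch below builds an approximate interval \(\bar{x} \pm z \cdot s/\sqrt{n}\), relying on the CLT for the normal approximation. It assumes a reasonably large sample (for small samples a t-based interval is the usual choice), the data are simulated from an Exponential distribution purely for illustration, and `normal_ci` is a hypothetical helper name.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

def normal_ci(data, confidence=0.95):
    """Approximate CI for the population mean via the CLT: xbar ± z * s / sqrt(n)."""
    n = len(data)
    xbar = fmean(data)
    sem = stdev(data) / math.sqrt(n)                 # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    return xbar - z * sem, xbar + z * sem

rng = random.Random(1)
data = [rng.expovariate(0.5) for _ in range(200)]   # hypothetical skewed data (true mean 2)
lo, hi = normal_ci(data)
print(f"95% CI for the mean: ({lo:.2f}, {hi:.2f})")
```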

The Math Behind the CLT

Mathematical Formulation: Let \(X_1, X_2, \dots, X_n\) be \(n\) independent, identically distributed (i.i.d.) random variables with mean \(\mu\) and finite variance \(\sigma^2\), and let \(\bar{X} = \frac{1}{n}\sum_{i=1}^{n}X_i\) be their sample mean. As \(n \to \infty\), the standardized mean \(\frac{\bar{X} - \mu}{\sigma / \sqrt{n}}\) converges in distribution to the standard normal distribution \(N(0, 1)\). Informally, for large \(n\), \(\bar{X}\) is approximately distributed as \(N(\mu, \frac{\sigma^2}{n})\).
Simplified Explanation: Whatever the shape of the original distribution (as long as its variance is finite), the distribution of the sample mean looks more and more like a normal distribution as the sample size \(n\) grows.
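The standardized form can be checked by simulation: if \(Z = (\bar{X} - \mu)/(\sigma/\sqrt{n})\) is approximately \(N(0, 1)\), then about 95% of simulated values should satisfy \(|Z| < 1.96\). This sketch assumes a Uniform(0, 1) population (\(\mu = 1/2\), \(\sigma^2 = 1/12\)), which is clearly not normal; the seed, repetition count, and the helper name `coverage_of_standardized_mean` are hypothetical choices.

```python
import math
import random
from statistics import NormalDist

def coverage_of_standardized_mean(n, reps, seed=0):
    """Fraction of standardized sample means Z = (xbar - mu) / (sigma / sqrt(n)) with |Z| < 1.96,
    for samples of size n from Uniform(0, 1)."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1 / 12)
    z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.random() for _ in range(n)) / n
        z = (xbar - mu) / (sigma / math.sqrt(n))
        if abs(z) < z_crit:
            hits += 1
    return hits / reps

coverage = {n: coverage_of_standardized_mean(n, reps=20_000) for n in (2, 10, 30)}
for n, frac in coverage.items():
    print(f"n={n:2d}  P(|Z| < 1.96) ≈ {frac:.3f}")
```

The uniform distribution is symmetric, so convergence is fast here; a skewed population would need larger n to reach the same agreement.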

Central Limit Theorem Mean and Variance

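One way to see where the mean and variance parameters in the central limit theorem come from is to use the following rules for a linear transformation of a random variable \(X\) by constants \(a\) and \(b\):

\[ \begin{align} \text{Let } Y &= a X + b \\ E(Y) &= a E(X) + b \\ Var(Y) &= a^2 Var(X) \end{align} \]

together with two facts about sums: expectations always add, \(E(X_1 + \dots + X_n) = E(X_1) + \dots + E(X_n)\), and variances add when the variables are independent, \(Var(X_1 + \dots + X_n) = Var(X_1) + \dots + Var(X_n)\).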
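The linear-transformation rule can be checked exactly on a small discrete distribution. The sketch below assumes a fair six-sided die for \(X\) and the arbitrary constants \(a = 3\), \(b = 2\); `mean_var` is a hypothetical helper that computes an exact mean and variance using Python's `fractions.Fraction`, so the comparisons are exact rather than approximate.

```python
from fractions import Fraction

def mean_var(dist):
    """Exact mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return mean, var

die = {x: Fraction(1, 6) for x in range(1, 7)}  # fair six-sided die (illustrative choice)
a, b = 3, 2                                     # arbitrary constants for Y = aX + b
y_dist = {a * x + b: p for x, p in die.items()}

ex, vx = mean_var(die)
ey, vy = mean_var(y_dist)
assert ey == a * ex + b   # E(Y) = a E(X) + b
assert vy == a ** 2 * vx  # Var(Y) = a^2 Var(X); the shift b has no effect on spread
print(ex, vx, ey, vy)     # 7/2 35/12 25/2 105/4
```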

Applied to the distribution of sample sums

\[ \begin{align} \boldsymbol{Y} &= \boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n \\ \\ E(\boldsymbol{Y}) &= E(\boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n) \\ &= E(\boldsymbol{X}_1) + E(\boldsymbol{X}_2) + \dots + E(\boldsymbol{X}_n) \\ &= \mu_X + \mu_X + \dots + \mu_X \\ &= n \mu_X \end{align} \]

\[ \begin{align} Var(\boldsymbol{Y}) &= Var(\boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n) \\ &= Var(\boldsymbol{X}_1) + Var(\boldsymbol{X}_2) + \dots + Var(\boldsymbol{X}_n) \quad \text{(by independence)} \\ &= \sigma_X^2 + \sigma_X^2 + \dots + \sigma_X^2 \\ &= n \sigma_X^2 \end{align} \]

\[ \begin{align} \mu_{Y} &= n \mu_{X} \\ \sigma_Y^2 &= n \sigma_X^2 \\ \end{align} \]
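The sum rules \(\mu_Y = n\mu_X\) and \(\sigma_Y^2 = n\sigma_X^2\) can likewise be verified exactly. Assuming again a fair die as the population, the sketch builds the exact distribution of \(X_1 + \dots + X_n\) by repeated convolution (multiplying probabilities is where independence is used) and compares its mean and variance with \(n\mu_X\) and \(n\sigma_X^2\). `sum_distribution` and `mean_var` are hypothetical helper names.

```python
from collections import defaultdict
from fractions import Fraction

def mean_var(dist):
    """Exact mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return mean, var

def sum_distribution(dist, n):
    """Exact distribution of X_1 + ... + X_n for i.i.d. X_i, by repeated convolution."""
    total = {0: Fraction(1)}
    for _ in range(n):
        nxt = defaultdict(Fraction)
        for s, ps in total.items():
            for x, px in dist.items():
                nxt[s + x] += ps * px  # independence: joint probability is the product
        total = dict(nxt)
    return total

die = {x: Fraction(1, 6) for x in range(1, 7)}
mu_x, var_x = mean_var(die)
for n in (1, 2, 5):
    mu_y, var_y = mean_var(sum_distribution(die, n))
    assert mu_y == n * mu_x and var_y == n * var_x
    print(n, mu_y, var_y)
```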

Applied to the distribution of sample means

\[ \begin{align} \boldsymbol{Y} &= \frac{1}{n} (\boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n) \\ \\ E(\boldsymbol{Y}) &= E\left(\frac{1}{n} (\boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n)\right) \\ &= \frac{1}{n} \left(E(\boldsymbol{X}_1) + E(\boldsymbol{X}_2) + \dots + E(\boldsymbol{X}_n)\right) \\ &= \frac{1}{n} (\mu_X + \mu_X + \dots + \mu_X) \\ &= \frac{1}{n} \cdot n \mu_X \\ &= \mu_X \end{align} \]

\[ \begin{align} Var(\boldsymbol{Y}) &= Var\left(\frac{1}{n} (\boldsymbol{X}_1 + \boldsymbol{X}_2 + \dots + \boldsymbol{X}_n)\right) \\ &= \frac{1}{n^2} \left(Var(\boldsymbol{X}_1) + Var(\boldsymbol{X}_2) + \dots + Var(\boldsymbol{X}_n)\right) \\ &= \frac{1}{n^2} ( \sigma_X^2 + \sigma_X^2 + \dots + \sigma_X^2 ) \\ &= \frac{1}{n^2} \cdot n \sigma_X^2 \\ &= \frac{1}{n} \sigma_X^2 \end{align} \]

\[ \begin{align} \mu_{Y} &= \mu_{X} \\ \sigma_Y^2 &= \frac{1}{n} \sigma_X^2 \\ \end{align} \]
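For the sample mean, the same kind of exact check applies. This sketch assumes a fair die as the population, enumerates every equally likely outcome of \(n\) rolls with `itertools.product` (practical only for small \(n\)), and confirms \(\mu_Y = \mu_X\) and \(\sigma_Y^2 = \sigma_X^2 / n\). The helper names `mean_distribution` and `mean_var` are hypothetical.

```python
from fractions import Fraction
from itertools import product

def mean_distribution(values, n):
    """Exact distribution of the mean of n i.i.d. draws, each uniform over `values`,
    found by enumerating every equally likely outcome (fine for small n)."""
    p = Fraction(1, len(values) ** n)
    dist = {}
    for outcome in product(values, repeat=n):
        m = Fraction(sum(outcome), n)
        dist[m] = dist.get(m, 0) + p
    return dist

def mean_var(dist):
    """Exact mean and variance of a discrete distribution given as {value: probability}."""
    mean = sum(p * x for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return mean, var

faces = range(1, 7)
mu_x, var_x = mean_var(mean_distribution(faces, 1))  # single roll: 7/2 and 35/12
for n in (1, 2, 4):
    mu_y, var_y = mean_var(mean_distribution(faces, n))
    assert mu_y == mu_x and var_y == var_x / n
    print(n, mu_y, var_y)
```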

Standard Error

When applying the central limit theorem to the distribution of sample means, we get:

\[ \begin{align} \mu_{Y} &= \mu_{X} \\ \sigma_Y^2 &= \frac{1}{n} \sigma_X^2 \\ \end{align} \]

When expressed in terms of standard deviation:

\[ \begin{align} \mu_{Y} &= \mu_{X} \\ \sigma_Y &= \frac{1}{\sqrt{n}} \sigma_X \\ \end{align} \]

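Here, \(\sigma_Y\) is called the standard error of the mean (SEM). Because \(\sigma_X\) is usually unknown, it is typically estimated by the sample standard deviation \(s\), giving an estimated SEM of \(s / \sqrt{n}\). The SEM is commonly used to draw error bars around means in plots.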
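A minimal sketch of computing an estimated SEM from data, using the sample standard deviation in place of \(\sigma_X\). The measurement values are made up for illustration, and `standard_error` is a hypothetical helper name.

```python
import math
from statistics import fmean, stdev

def standard_error(data):
    """Estimated standard error of the mean: sample standard deviation divided by sqrt(n)."""
    return stdev(data) / math.sqrt(len(data))

measurements = [4.1, 3.8, 5.0, 4.6, 4.3, 3.9, 4.8, 4.4]  # hypothetical values
xbar = fmean(measurements)
sem = standard_error(measurements)
print(f"mean = {xbar:.3f}, SEM = {sem:.3f}")  # an error bar would span xbar - sem to xbar + sem
```

Because the SEM scales as \(1/\sqrt{n}\), collecting four times as many measurements would roughly halve it, all else being equal.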

Question

What does the Central Limit Theorem (CLT) state about the distribution of sample means?

The distribution of sample means will approach a uniform distribution as the sample size increases.
The distribution of sample means will be skewed towards the left or right depending on the population distribution.
The distribution of sample means will approach a normal distribution as the sample size increases, regardless of the population distribution.
The distribution of sample means is irrelevant to the shape of the population distribution.

Answer

Correct Answer: The Central Limit Theorem states that, given a sufficiently large sample size, the distribution of sample means will be approximately normally distributed, regardless of the shape of the population distribution.