2024

Introduction to Confidence Intervals

  • Confidence intervals (CIs) give a range of values, computed from a sample, that is intended to capture the population parameter of interest.

  • They estimate the precision of a sample statistic (e.g., the sample mean) as an estimator of a population parameter (e.g., the population mean).

  • If we were to take many samples and build a confidence interval from each sample, a certain percentage (e.g., 95%) of these intervals would contain the true population parameter.
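The repeated-sampling idea can be checked with a short simulation. The sketch below is illustrative only: the population (normal with mean 50 and standard deviation 10), the sample size, the seed, and the number of repetitions are all assumed values, not taken from the material above. It uses the z-based interval for a mean.

```r
set.seed(42)            # assumed seed, for reproducibility only

true_mean <- 50         # assumed population mean
true_sd   <- 10         # assumed population standard deviation
n         <- 30         # assumed sample size
n_reps    <- 10000      # assumed number of repeated samples
z         <- qnorm(0.975)

# For each repetition: draw a sample, build a 95% CI, and record
# whether the interval contains the true mean.
covered <- replicate(n_reps, {
  x <- rnorm(n, mean = true_mean, sd = true_sd)
  margin <- z * sd(x) / sqrt(n)
  (mean(x) - margin <= true_mean) && (true_mean <= mean(x) + margin)
})

coverage <- mean(covered)  # proportion of intervals that captured the true mean
coverage
```

The proportion should land close to 0.95, typically a little below it, because a z-score is being used with a standard deviation estimated from each sample.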

How Confidence Intervals Are Calculated

  • For a population mean, the interval is \[\bar{x} \pm z \times \frac{s}{\sqrt{n}}\] where:
  • \(\bar{x}\) is the sample mean
  • \(z\) is the z-score from the standard normal distribution corresponding to the desired confidence level (e.g., approximately 1.96 for 95%)
  • \(s\) is the sample standard deviation
  • \(n\) is the sample size
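The formula can be wrapped in a small function that works from raw data. This is a minimal sketch: `ci_mean_z` and its `conf_level` argument are hypothetical names introduced here for illustration, and the data vector is made up.

```r
# Hypothetical helper: z-based confidence interval for a mean.
ci_mean_z <- function(x, conf_level = 0.95) {
  stopifnot(is.numeric(x), length(x) >= 2)
  n      <- length(x)                           # sample size
  x_bar  <- mean(x)                             # sample mean
  s      <- sd(x)                               # sample standard deviation
  z      <- qnorm(1 - (1 - conf_level) / 2)     # two-sided critical value
  margin <- z * s / sqrt(n)                     # margin of error
  c(lower = x_bar - margin, upper = x_bar + margin)
}

scores <- c(98, 102, 95, 110, 105)  # made-up data for illustration
ci_mean_z(scores)
ci_mean_z(scores, conf_level = 0.99)  # a higher confidence level gives a wider interval
```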

An Easier Way to Think of a Confidence Interval

  • In many cases, a confidence interval can be thought of as a shorthand description of the distribution of the test statistic (e.g., the distribution of sample means).

  • The observed value of the statistic (e.g., the sample mean) is the center of the interval, and the width of the interval is the distance between the two critical values of a two-tailed test, expressed in the units of the statistic (i.e., multiplied by the standard error).
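In symbols, the link between the critical values and the interval width can be written as follows. The notation \(z_{\alpha/2}\) for the upper critical value of a two-tailed test at significance level \(\alpha\) is an assumption made here, not notation used elsewhere in these slides.

```latex
\[
\text{width}
  = \bigl(z_{\alpha/2} - (-z_{\alpha/2})\bigr) \times \frac{s}{\sqrt{n}}
  = 2\, z_{\alpha/2} \times \frac{s}{\sqrt{n}}
\]
```

For a 95% interval the critical values are roughly \(\pm 1.96\), so the interval spans about 3.92 standard errors, centered on \(\bar{x}\).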

Example: Calculating a 95% CI for a Mean

  • Let’s calculate a 95% confidence interval for the mean of a sample with the following characteristics:

  • Sample mean (\(\bar{x}\)): 100

  • Sample standard deviation (\(s\)): 15

  • Sample size (\(n\)): 30

x_bar <- 100  # sample mean
s <- 15  # sample standard deviation
n <- 30  # sample size
z <- qnorm(0.975)  # z-score for 95% confidence

# Calculate the margin of error
margin_of_error <- z * (s / sqrt(n))

# Calculate the confidence interval
lower_bound <- x_bar - margin_of_error
upper_bound <- x_bar + margin_of_error

c(lower_bound, upper_bound)
## [1]  94.63242 105.36758
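Because the standard deviation here is estimated from a sample of 30, a common alternative is to replace the z-score with a critical value from the t distribution with n - 1 degrees of freedom. The sketch below is an addition for comparison, not part of the original example. It reuses the same summary values and base R's `qt()`.

```r
x_bar <- 100  # sample mean
s <- 15       # sample standard deviation
n <- 30       # sample size

z_crit <- qnorm(0.975)           # standard normal critical value
t_crit <- qt(0.975, df = n - 1)  # t critical value with 29 degrees of freedom

se <- s / sqrt(n)                # standard error of the mean

z_interval <- x_bar + c(-1, 1) * z_crit * se
t_interval <- x_bar + c(-1, 1) * t_crit * se

# Show both intervals side by side
rbind(z = z_interval, t = t_interval)
```

The t-based interval comes out slightly wider than the z-based one, and the gap shrinks as the sample size grows.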

Visualising Confidence Intervals

[Figures 1 and 2: images not available]

ChatGPT Wisdom

Confidence interval statement: If we were to take many samples and build a confidence interval from each sample, a certain percentage (e.g., 95%) of these intervals would contain the true population parameter.

But it isn’t true that given a CI there is a 95% chance that the true population parameter is contained within it.

No? Please explain.

When we say we have a 95% confidence interval (CI) for a population parameter (like the mean), this interval is based on the sample data we have collected. The CI gives us a range within which we are “95% confident” that the true population parameter lies, based on our sample. This confidence level is determined before we even collect our sample data, and it’s about the process used to estimate the interval, not the interval itself.

The correct interpretation of a 95% CI is this: If we were to take many samples from the population and calculate a 95% CI from each sample, we would expect 95% of those intervals to contain the true population parameter. It’s about the long-run frequency of these intervals capturing the true parameter if we repeated our sampling process many times under the same conditions.

However, for a specific interval we calculate from a single sample, it’s not accurate to say there’s a 95% probability that this particular interval contains the true population parameter. Once the interval is calculated, the true parameter is either within it or not; there’s no probability about it. This is because the true parameter is a fixed value, not a random variable, and the probability concept doesn’t apply in the way we might intuitively think.

The misconception often comes from interpreting the confidence interval in a Bayesian manner without intending to. In a Bayesian framework, it would make sense to talk about the probability of a parameter being within a certain range given the data (because Bayesian statistics allow for probability statements about parameters themselves). However, in the frequentist framework, which confidence intervals belong to, probabilities are related to the process of sampling and not to the parameters themselves.

In summary, the 95% confidence level describes the procedure: how often intervals constructed with the same method across many samples would contain the true parameter. It is not the probability that a single, specific interval captures the parameter once it has been computed.

One-Tailed Confidence Intervals