
Simple linear regression

Simple linear regression models attempt to predict the value of some observed outcome random variable \(\boldsymbol{Y}\) as a linear function of a predictor random variable \(\boldsymbol{X}\).

Regression model

For the \(i^{th}\) observation, we can write:

\[ y_{i} = \beta_{0} + \beta_{1} x_{i} + \epsilon_{i} \]

  • \(y_{i}\) is the \(i^{th}\) observed outcome

  • \(x_{i}\) is the \(i^{th}\) value of the predictor variable

  • \(\epsilon_{i}\) is called the residual and is the difference between the observed outcome and the predicted outcome.

\[ \epsilon_{i} \sim Normal(0, \sigma_{\epsilon}) \]

  • \(\beta_{0}\) and \(\beta_{1}\) are the parameters of the linear regression model (the intercept and the slope, respectively)

Now let's extend this to many observations:

\[ \begin{align} y_{1} &= \beta_{0} + \beta_{1} x_{1} + \epsilon_{1} \\ y_{2} &= \beta_{0} + \beta_{1} x_{2} + \epsilon_{2} \\ &\vdots \\ y_{n} &= \beta_{0} + \beta_{1} x_{n} + \epsilon_{n} \\ \end{align} \]

We can gather the independent observations up into vectors:

\[ \begin{align} \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix} &= \beta_{0} \begin{bmatrix} 1\\ 1\\ \vdots\\ 1 \end{bmatrix} + \beta_{1} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n\\ \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]

We can next gather the vectors up into a matrix:

\[ \begin{align} \begin{bmatrix} y_1\\ y_2\\ \vdots\\ y_n \end{bmatrix} &= \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]

We can finally write the model in compact matrix form:

\[ \begin{align} \boldsymbol{y} &= \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \end{align} \]

  • \(\boldsymbol{y}\) is a vector of observed outcomes.

  • \(\boldsymbol{X}\) is a matrix of predictor variables and is called the design matrix.

  • \(\boldsymbol{\beta}\) is a vector of \(\beta\) parameters.

  • \(\boldsymbol{\epsilon}\) is a vector of residuals.
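
For a concrete sense of what the design matrix looks like, here is a minimal sketch with three made-up predictor values (the values are arbitrary); in R, model.matrix() builds exactly this column-of-ones-plus-predictor matrix:

x <- c(0.5, -1.2, 2.0)   # hypothetical predictor values
model.matrix(~ x)        # column of 1s (intercept) alongside the x column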

How can we pick \(\boldsymbol{\beta}\) values that best fit our data?

  • let \(y_i\) denote observed values

  • let \(\widehat{y_{i}}\) denote predicted values:

\[ \widehat{y_{i}} = \beta_{0} + \beta_{1} x_{i} \]

  • The best fitting \(\boldsymbol{\beta}\) values are those that minimise the discrepancy between \(y_{i}\) and \(\widehat{y_{i}}\).

\[ \DeclareMathOperator*{\argmin}{\arg\!\min} \argmin_{\boldsymbol{\beta}} \sum_{i=1}^{n} (y_{i} - \widehat{y_{i}})^2 \]

\[ \DeclareMathOperator*{\argmin}{\arg\!\min} \argmin_{\boldsymbol{\beta}} \sum_{i=1}^{n} (y_{i} - (\beta_{0} + \beta_{1} x_{i}))^2 \]

  • The \(\boldsymbol{\beta}\) values that minimise error can be solved for analytically.

  • The method is to take the derivative with respect to \(\boldsymbol{\beta}\), and then find the \(\boldsymbol{\beta}\) values that make the resulting expression equal to zero.

  • I won’t do this here and won’t require you to do so either.

  • You should know, however, that this method of finding \(\beta\) values is called ordinary least squares; its closed-form solution is shown below for reference.
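
In the matrix notation introduced above, the ordinary least squares solution can be written compactly (stated here without derivation):

\[ \hat{\boldsymbol{\beta}} = (\boldsymbol{X}^{\top}\boldsymbol{X})^{-1}\boldsymbol{X}^{\top}\boldsymbol{y} \]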

Regression model terms

  • \(SS_{error} = SS_{residual} = \sum_{i=1}^{n} (y_{i} - \hat{y_{i}})^2\)

  • \(SS_{error}\) is what you get when you compare raw observations against the full model predictions.

  • \(SS_{total} = \sum_{i=1}^{n} (y_{i} - \bar{y})^2\)

  • \(SS_{total}\) is what you get when you compare raw observations against the grand mean.

  • \(SS_{error}\) comes from \(\sum_{i=1}^{n} (y_{i} - \hat{y_{i}})^2\) with \(\hat{y_{i}} = \beta_{0} + \beta_{1} x_{i}\),

  • \(SS_{total}\) comes from \(\sum_{i=1}^{n} (y_{i} - \hat{y_{i}})^2\) with \(\hat{y_{i}} = \bar{y}\)

  • \(SS_{model} = \sum_{i=1}^{n} (\bar{y} - \hat{y_i})^2\) tells you how much the added complexity of the full model reduces the overall variability (i.e., makes better predictions).

  • The proportion of variability accounted for over and above the simple mean model is given by:

\[R^2 = \frac{SS_{model}}{SS_{total}}\]

  • \(R^2\) is called the coefficient of determination and is just the square of the correlation coefficient between \(x\) and \(y\).

  • The \(F\) ratio tells us whether the more complex regression model provides a significantly better fit to the data than the simple mean model.

\[ F = \frac{MS_{model}}{MS_{error}} \]

  • \(MS_{model} = SS_{model} / df_{model}\) and \(MS_{error} = SS_{error} / df_{error}\), where \(df_{model}\) is the number of predictors (here 1) and \(df_{error} = n - 2\) for simple linear regression.

  • The regression \(F\)-ratio tells us how much the regression model has improved the prediction over the simple mean model, relative to the inaccuracy that remains in the regression model.

  • We can also ask questions about the best fitting \(\beta\) values (i.e., is either \(\beta\) significantly different from zero?).

  • The data you use in a regression comes from random variables and so the \(\beta\) values you estimate are also random.

  • It turns out the best fitting \(\beta\) values (i.e., \(\hat{\beta}\)) can be tested with a \(t\)-test.

Simple linear regression in R

  • Suppose we obtain data on two variables \(x\) and \(y\).
##                 x         y
##             <num>     <num>
##   1:  0.914186363 2.5710357
##   2: -0.191610690 1.4122583
##   3:  0.667938628 1.7740207
##   4:  1.057980473 3.1853587
##   5:  0.750844852 3.4891289
##   6: -0.389775827 1.9720604
##   7: -0.614894504 2.1335376
##   8: -0.148907099 1.1984930
##   9:  0.326261069 2.4123828
##  10: -1.458975014 0.8152275
##  11:  0.702920059 2.6628545
##  12:  0.970466002 2.5176534
##  13: -0.943710324 1.9354231
##  14: -0.641900172 1.4205616
##  15:  2.053199777 2.0558633
##  16:  0.803199108 2.8877389
##  17:  1.017441022 3.1679954
##  18:  0.746545724 3.3881426
##  19:  1.626192054 2.9516998
##  20: -0.200436490 1.3499400
##  21:  0.467695177 1.5874055
##  22: -0.519355788 2.4455923
##  23: -0.930288298 1.3959144
##  24:  1.293559425 2.9121413
##  25: -0.286899159 0.9476620
##  26: -0.279107115 1.9863391
##  27:  0.707897970 1.7719498
##  28:  0.334746318 2.5310623
##  29:  2.020931738 3.0361243
##  30: -0.503240982 2.0150786
##  31: -0.883711010 0.9777824
##  32:  1.598162368 2.8554240
##  33: -0.686483341 2.1493065
##  34:  1.111706349 1.9925258
##  35: -0.659655032 2.1844825
##  36:  0.355806326 3.1164497
##  37: -1.416421469 0.9176625
##  38:  1.089299697 2.8366766
##  39:  1.046278390 2.5003388
##  40: -0.712364755 2.4933058
##  41: -0.437944993 0.8058006
##  42: -0.279485122 1.8329278
##  43: -1.219988216 1.3105080
##  44: -2.568485137 1.8613566
##  45: -0.122033134 1.2526652
##  46: -0.131689278 2.5610838
##  47: -0.777620364 2.3732874
##  48: -0.516385751 0.9361903
##  49: -0.029042687 2.8928216
##  50: -2.287842766 0.6873935
##  51: -1.105518080 0.5191420
##  52: -2.342077893 0.2317854
##  53:  0.056956845 2.6369269
##  54:  2.225019955 2.1175029
##  55:  1.015350997 3.2284353
##  56: -0.105534296 1.0250612
##  57:  1.324037784 2.9939987
##  58: -0.242651161 1.7068616
##  59: -0.169495877 1.7461208
##  60:  1.324047634 2.0886769
##  61: -0.705204696 1.9650054
##  62:  0.060115964 2.0121209
##  63: -1.191044669 1.7597514
##  64:  0.269471551 1.5564801
##  65:  1.650474862 2.9288975
##  66:  0.172054713 1.3294465
##  67:  1.281079905 2.5324887
##  68: -0.610439344 1.6778129
##  69:  0.403155422 2.4708264
##  70: -0.329592606 1.9640561
##  71: -0.091675765 1.5250026
##  72:  0.862018885 3.6518703
##  73:  1.502706541 1.6834040
##  74: -0.455089484 2.3206510
##  75:  0.707292600 2.5387373
##  76: -0.008342843 1.6516972
##  77:  0.190088769 2.1183953
##  78: -1.908316927 1.4897671
##  79: -0.091287053 1.7832808
##  80: -0.509344230 2.0201003
##  81: -1.457420795 1.4149256
##  82: -1.482262543 1.1224994
##  83:  0.315629253 1.4926988
##  84: -1.039007994 1.8064505
##  85:  0.594275896 1.8231943
##  86: -1.440066878 2.1115185
##  87: -0.591969920 2.0987456
##  88:  1.337461222 2.7517628
##  89:  2.338961212 3.4148130
##  90: -0.414190827 2.4339902
##  91: -0.248339401 2.2056044
##  92: -0.127354187 2.2711397
##  93: -1.270728753 1.0834128
##  94:  0.176038888 2.3642855
##  95: -0.541558849 1.4844782
##  96: -1.401232330 1.6270596
##  97: -0.731449686 2.0990979
##  98: -0.146833628 1.8201237
##  99: -0.677546816 2.3360614
## 100: -1.459524671 0.7799083
##                 x         y
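
The code that created d is not shown above. Purely as an illustration, data with this structure could be simulated along the following lines (the seed, coefficients, and noise level here are assumptions, not the values actually used):

library(data.table)

set.seed(1)                            # hypothetical seed
n <- 100
x <- rnorm(n)                          # predictor values
y <- 2 + 0.5 * x + rnorm(n, 0, 0.55)   # assumed intercept, slope, and residual SD
d <- data.table(x = x, y = y)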

  • We can plot the data to see if there is a relationship between \(x\) and \(y\).
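
For example, a base-R scatter plot (a minimal sketch; any plotting approach would do):

plot(d$x, d$y, xlab = "x", ylab = "y")   # scatter plot of the raw data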

  • We can fit a simple linear regression model in R using the lm function.
fm <- lm(y ~ x, data=d)

summary(fm)
## 
## Call:
## lm(formula = y ~ x, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.04713 -0.47513  0.07949  0.33500  1.21665 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.03789    0.05509  36.994  < 2e-16 ***
## x            0.46093    0.05388   8.554 1.64e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5506 on 98 degrees of freedom
## Multiple R-squared:  0.4275, Adjusted R-squared:  0.4217 
## F-statistic: 73.18 on 1 and 98 DF,  p-value: 1.641e-13

  • The Estimate column provides the best fitting \(\beta\) values.

  • The Std. Error column provides the standard error associated with each \(\beta\) estimate reported in the Estimate column.

  • The t value column provides the \(t\)-statistic for the null hypothesis that the \(\beta\) value is zero.

  • The Pr(>|t|) column provides the \(p\)-value associated with the \(t\)-statistic.

  • The Residual standard error provides an estimate of the standard deviation of the residuals.

  • The Multiple R-squared provides the \(R^2\) value.

  • The Adjusted R-squared provides the \(R^2\) value adjusted for the number of predictors in the model.

  • The F-statistic line provides the \(F\)-ratio, and the p-value reported next to it is the \(p\)-value associated with that \(F\)-ratio.

  • The best-fitting \(\beta_0\) is 2.04 and the best fitting \(\beta_1\) is 0.46.

  • \(\beta_0\) is the intercept and \(\beta_1\) is the slope of the regression model. Both can be extracted directly from the fitted model object, and the sums of squares can be recomputed by hand, as sketched below.
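
A short sketch (using the fm and d objects from above) that pulls the summary quantities out of the fitted model and recomputes \(R^2\) and the \(F\)-ratio from the sums of squares defined earlier:

coef(fm)            # best-fitting beta values (intercept and slope)
coef(summary(fm))   # estimates, standard errors, t values, and p values

y_hat    <- fitted(fm)                   # model predictions
ss_error <- sum((d$y - y_hat)^2)         # residual sum of squares
ss_total <- sum((d$y - mean(d$y))^2)     # total sum of squares
ss_model <- sum((y_hat - mean(d$y))^2)   # model sum of squares

ss_model / ss_total                              # R-squared; should match Multiple R-squared
(ss_model / 1) / (ss_error / df.residual(fm))    # F-ratio; should match the F-statistic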

Assumptions of simple linear regression

  • The relationship between \(x\) and \(y\) is linear.

  • The residuals are independent.

  • The residuals are normally distributed.

  • The residuals are homoscedastic.

Check assumptions with plots

  • A common diagnostic tool is to plot the residuals against the predicted values.

  • If linearity holds then this plot will hover around zero from the beginning to the end of the observed range. The residuals should not show trends or patterns across the range of predicted values.

  • If homoscedasticity holds then this plot will show a constant spread of residuals across the range of predicted values.

  • If normality holds then the residuals will be normally distributed, and we can use a histogram or density plot of the residuals to check this as usual (see the sketch below).
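
A minimal base-R sketch of these diagnostic plots (using fm from above; other plotting tools would work equally well):

plot(fitted(fm), resid(fm), xlab = "Predicted values", ylab = "Residuals")  # look for trends or changing spread
abline(h = 0, lty = 2)                                                      # reference line at zero

hist(resid(fm), xlab = "Residual", main = "Histogram of residuals")         # check approximate normality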

More regression diagnostics

  • We are just scratching the surface of regression, but we will not go much further in this course.

  • If you’d like a taste, try plot(fm) to see a range of diagnostic plots. You can ask your favourite AI to explain them to you.

Regression vs correlation

  • Both are used to examine the relationship between variables.

  • In simple linear regression with a single predictor, the sign (positive or negative) of the slope coefficient always matches the sign of the correlation coefficient between the two variables.

  • In simple linear regression with one predictor, the square of the correlation coefficient \(r^2\) is equal to the coefficient of determination in regression analysis.

  • Correlation provides a single metric describing the linear relationship between two variables.

  • Regression is used to model the relationship and predict values, offering more detailed insight into how the variables interact.

  • In simple linear regression, \(R^2\) (a measure of how well the variation in the dependent variable is explained by the independent variable) is exactly the square of the correlation coefficient \(r\).

\[ R^2 = \frac{SS_{model}}{SS_{total}} = r^2 \]
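
This identity is easy to verify in R with the objects from above (a quick check, not part of the original analysis):

cor(d$x, d$y)^2        # squared correlation between x and y
summary(fm)$r.squared  # R-squared reported by the regression; the two values agree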