
Regression as model comparison

  • By estimating \(\beta_0\) and \(\beta_1\), we can predict \(Y\) from \(X\).

  • By using NHST to test the null hypothesis that \(\beta_1 = 0\), we can determine whether \(X\) is a significant predictor of \(Y\).

  • But regression can also be viewed through the lens of model comparison.

Regression as model comparison

  • The regression \(F\)-test compares two models (see the R sketch after this list):

    • A simple model that predicts \(Y\) using only its mean

    • A full model that predicts \(Y\) using \(X\)
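
  • A minimal sketch of this comparison in R, assuming a data frame d with columns x and y (as in the summary output later in these slides):

mean_model <- lm(y ~ 1, data = d)  # simple model: intercept only (the mean of y)
full_model <- lm(y ~ x, data = d)  # full model: predicts y from x
anova(mean_model, full_model)      # F-test comparing the two models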

Regression as model comparison

  • \(SS_{\text{error}}\) captures variability unexplained by the full model

\[ \begin{align*} SS_{\text{error}} &= \sum_{i=1}^{n} (y_{i} - \widehat{y_{i}})^2 \\ &= \sum_{i=1}^{n} (y_{i} - (\widehat{\beta}_0 + \widehat{\beta}_1 x_i))^2 \\ df_{\text{error}} &= n - 2 \end{align*} \]

  • Two parameters (\(\beta_0\), \(\beta_1\)) are estimated from the data, so 2 degrees of freedom are used up.
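
  • A sketch of this computation in R (assuming the same data frame d with columns x and y):

fit <- lm(y ~ x, data = d)         # full model
ss_error <- sum(residuals(fit)^2)  # sum of squared residuals
df_error <- df.residual(fit)       # n - 2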

Regression as model comparison

  • \(SS_{\text{total}}\) captures variability unexplained by the mean model

\[ \begin{align*} SS_{\text{total}} &= \sum_{i=1}^{n} (y_{i} - \bar{y})^2 \\ df_{\text{total}} &= n - 1 \end{align*} \]

  • One parameter (the mean \(\bar{y}\)) is estimated, so 1 degree of freedom is used.
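
  • The corresponding sketch in R (same assumed data frame d):

ss_total <- sum((d$y - mean(d$y))^2)  # squared deviations from the mean of y
df_total <- nrow(d) - 1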

Regression as model comparison

  • \(SS_{\text{model}}\) tells you how much the added complexity of the full model reduces the unexplained variability relative to the mean model (i.e., how much better its predictions are).

\[ \begin{align*} SS_{\text{model}} &= \sum_{i=1}^{n} (\bar{y} - \widehat{y_i})^2 \\ df_{\text{model}} &= 1 \end{align*} \]

  • \(df_{\text{model}} = 1\) because \(df_{\text{total}} = df_{\text{model}} + df_{\text{error}}\), i.e., \((n - 1) = 1 + (n - 2)\).
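
  • In R, this quantity can be sketched as (same assumed data frame d):

fit <- lm(y ~ x, data = d)
ss_model <- sum((fitted(fit) - mean(d$y))^2)  # improvement of the full model over the mean
# equivalently, SS_total - SS_error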

Regression as model comparison

\[ \begin{align*} F &= \frac{MS_{\text{model}}}{MS_{\text{error}}} \sim F(df_{\text{model}}, df_{\text{error}}) \\ &MS_{\text{model}} = \frac{SS_{\text{model}}}{df_{\text{model}}} \\ &MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}} \end{align*} \]

  • This tests how much better the model with \(X\) predicts \(Y\) than simply using the mean.
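
  • A sketch of the \(F\) computation in R (assuming the data frame d); anova() reports the same test:

fit <- lm(y ~ x, data = d)
ms_model <- sum((fitted(fit) - mean(d$y))^2) / 1      # SS_model / df_model
ms_error <- sum(residuals(fit)^2) / df.residual(fit)  # SS_error / df_error
f_stat <- ms_model / ms_error
pf(f_stat, df1 = 1, df2 = df.residual(fit), lower.tail = FALSE)  # p-value
anova(fit)  # same F-statistic and p-value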

Simple linear regression in R

## 
## Call:
## lm(formula = y ~ x, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.07367 -0.28707  0.03987  0.36102  0.85589 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.93051    0.04446   43.42   <2e-16 ***
## x            0.43384    0.04060   10.69   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4444 on 98 degrees of freedom
## Multiple R-squared:  0.5382, Adjusted R-squared:  0.5335 
## F-statistic: 114.2 on 1 and 98 DF,  p-value: < 2.2e-16
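
  • The output above comes from calling summary() on an lm() fit; a minimal sketch of the call, with the data frame d and variables x and y taken from the Call line:

fit <- lm(y ~ x, data = d)  # fit the regression of y on x
summary(fit)                # coefficients, R-squared, and the overall F-test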

Regression as model comparison

  • The proportion of variability accounted for by the full model is:

\[ R^2 = \frac{SS_{\text{model}}}{SS_{\text{total}}} \]

  • \(R^2\) is called the coefficient of determination and is just the square of the correlation coefficient between \(x\) and \(y\).
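
  • A quick check of this ratio in R (a sketch, assuming the data frame d):

fit <- lm(y ~ x, data = d)
ss_model <- sum((fitted(fit) - mean(d$y))^2)
ss_total <- sum((d$y - mean(d$y))^2)
ss_model / ss_total  # matches summary(fit)$r.squared and cor(d$x, d$y)^2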

Coefficient of determination vs correlation coefficient

  • Both are used to examine the relationship between variables.

  • In simple linear regression with one predictor, the square of the correlation coefficient \(r^2\) is equal to the coefficient of determination \(R^2\).

  • In simple linear regression, the sign of the slope coefficient \(\beta_1\) (positive or negative) always matches the sign of the correlation coefficient.

  • Correlation provides a single metric describing the linear relationship between two variables.

  • Regression models the relationship and quantifies how much the dependent variable changes with the independent variable.

Regression vs correlation

  • In simple linear regression, \(R^2\) — the proportion of variance explained — is exactly equal to the square of the correlation coefficient:

\[ R^2 = \frac{SS_{\text{model}}}{SS_{\text{total}}} = r^2 \]
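
  • A short derivation of this identity, writing \(S_{xx} = \sum_i (x_i - \bar{x})^2\), \(S_{yy} = \sum_i (y_i - \bar{y})^2\), and \(S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})\) (shorthand used only here):

\[ \begin{align*} \widehat{\beta}_1 &= \frac{S_{xy}}{S_{xx}}, \qquad \widehat{y_i} - \bar{y} = \widehat{\beta}_1 (x_i - \bar{x}) \\ R^2 &= \frac{SS_{\text{model}}}{SS_{\text{total}}} = \frac{\widehat{\beta}_1^{\,2}\, S_{xx}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx}\, S_{yy}} = r^2 \end{align*} \]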

Multiple \(R^2\) vs Adjusted \(R^2\)

  • Multiple \(R^2\) (aka plain \(R^2\) in simple regression) tells you the proportion of total variance in \(Y\) explained by the model.

  • Adjusted \(R^2\) corrects for model complexity by penalizing the addition of predictors that explain little, which is especially useful in multiple regression.
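
  • For reference, the usual adjustment with \(n\) observations and \(p\) predictors is:

\[ R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \]

  • Unlike \(R^2\), adjusted \(R^2\) can decrease when a newly added predictor explains too little additional variance (check against the output above: \(1 - (1 - 0.5382)\frac{99}{98} \approx 0.5335\)).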