
Regression as model comparison

  • By estimating \(\beta_0\) and \(\beta_1\), we can predict \(Y\) from \(X\).

  • By using NHST to test the null hypothesis that \(\beta_1 = 0\), we can determine whether \(X\) is a significant predictor of \(Y\).

  • But regression can also be viewed through the lens of model comparison.

Regression as model comparison

  • The regression \(F\)-test compares two models (see the R sketch after this list):

    • A simple model that predicts \(Y\) using only its mean

    • A full model that predicts \(Y\) using \(X\)
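
  • A minimal sketch of this comparison in R, assuming a data frame d with columns x and y (as in the summary output later in these slides):

mean_model <- lm(y ~ 1, data = d)  # simple model: intercept only (the mean of y)
full_model <- lm(y ~ x, data = d)  # full model: predicts y from x
anova(mean_model, full_model)      # F-test comparing the two models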

Regression as model comparison

  • \(SS_{\text{error}}\) captures variability unexplained by the full model

\[ \begin{align*} SS_{\text{error}} &= \sum_{i=1}^{n} (y_{i} - \widehat{y_{i}})^2 \\ &= \sum_{i=1}^{n} (y_{i} - (\widehat{\beta}_0 + \widehat{\beta}_1 x_i))^2 \\ df_{\text{error}} &= n - 2 \end{align*} \]

  • Two parameters (\(\beta_0\), \(\beta_1\)) are estimated from the data, so 2 degrees of freedom are used up.
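
  • A sketch of this computation in R (assuming the same data frame d with columns x and y):

fit <- lm(y ~ x, data = d)         # full model
ss_error <- sum(residuals(fit)^2)  # sum of squared residuals
df_error <- df.residual(fit)       # n - 2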

Regression as model comparison

  • \(SS_{\text{total}}\) captures variability unexplained by the mean model

\[ \begin{align*} SS_{\text{total}} &= \sum_{i=1}^{n} (y_{i} - \bar{y})^2 \\ df_{\text{total}} &= n - 1 \end{align*} \]

  • One parameter (the mean \(\bar{y}\)) is estimated, so 1 degree of freedom is used.
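
  • The corresponding sketch in R (same assumed data frame d):

ss_total <- sum((d$y - mean(d$y))^2)  # squared deviations from the mean of y
df_total <- nrow(d) - 1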

Regression as model comparison

  • \(SS_{\text{model}}\) tells you how much the added complexity of the full model reduces the unexplained variability relative to the mean model (i.e., how much better its predictions are).

\[ \begin{align*} SS_{\text{model}} &= \sum_{i=1}^{n} (\bar{y} - \widehat{y_i})^2 \\ df_{\text{model}} &= 1 \end{align*} \]

  • \(df_{\text{model}} = 1\) because \(df_{\text{total}} = df_{\text{model}} + df_{\text{error}}\), i.e., \((n - 1) = 1 + (n - 2)\).
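
  • In R, this quantity can be sketched as (same assumed data frame d):

fit <- lm(y ~ x, data = d)
ss_model <- sum((fitted(fit) - mean(d$y))^2)  # improvement of the full model over the mean
# equivalently, SS_total - SS_error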

Regression as model comparison

\[ \begin{align*} F &= \frac{MS_{\text{model}}}{MS_{\text{error}}} \sim F(df_{\text{model}}, df_{\text{error}}) \\ &MS_{\text{model}} = \frac{SS_{\text{model}}}{df_{\text{model}}} \\ &MS_{\text{error}} = \frac{SS_{\text{error}}}{df_{\text{error}}} \end{align*} \]

  • This tests how much better the model with \(X\) predicts \(Y\) than simply using the mean.
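
  • A sketch of the \(F\) computation in R (assuming the data frame d); anova() reports the same test:

fit <- lm(y ~ x, data = d)
ms_model <- sum((fitted(fit) - mean(d$y))^2) / 1      # SS_model / df_model
ms_error <- sum(residuals(fit)^2) / df.residual(fit)  # SS_error / df_error
f_stat <- ms_model / ms_error
pf(f_stat, df1 = 1, df2 = df.residual(fit), lower.tail = FALSE)  # p-value
anova(fit)  # same F-statistic and p-value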

Simple linear regression in R

## 
## Call:
## lm(formula = y ~ x, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.07367 -0.28707  0.03987  0.36102  0.85589 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.93051    0.04446   43.42   <2e-16 ***
## x            0.43384    0.04060   10.69   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4444 on 98 degrees of freedom
## Multiple R-squared:  0.5382, Adjusted R-squared:  0.5335 
## F-statistic: 114.2 on 1 and 98 DF,  p-value: < 2.2e-16
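
  • The output above comes from calling summary() on an lm() fit; a minimal sketch of the call, with the data frame d and variables x and y taken from the Call line:

fit <- lm(y ~ x, data = d)  # fit the regression of y on x
summary(fit)                # coefficients, R-squared, and the overall F-test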

Regression as model comparison

  • The proportion of variability accounted for by the full model is:

\[ R^2 = \frac{SS_{\text{model}}}{SS_{\text{total}}} \]

  • \(R^2\) is called the coefficient of determination and is just the square of the correlation coefficient between \(x\) and \(y\).
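
  • A quick check of this ratio in R (a sketch, assuming the data frame d):

fit <- lm(y ~ x, data = d)
ss_model <- sum((fitted(fit) - mean(d$y))^2)
ss_total <- sum((d$y - mean(d$y))^2)
ss_model / ss_total  # matches summary(fit)$r.squared and cor(d$x, d$y)^2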

Coefficient of determination vs correlation coefficient

  • Both are used to examine the relationship between variables.

  • In simple linear regression with one predictor, the square of the correlation coefficient \(r^2\) is equal to the coefficient of determination \(R^2\).

  • In simple linear regression, the sign of the slope coefficient \(\beta_1\) (positive or negative) always matches the sign of the correlation coefficient.

  • Correlation provides a single metric describing the linear relationship between two variables.

  • Regression models the relationship and quantifies how much the dependent variable changes with the independent variable.

Regression vs correlation

  • In simple linear regression, \(R^2\) — the proportion of variance explained — is exactly equal to the square of the correlation coefficient:

\[ R^2 = \frac{SS_{\text{model}}}{SS_{\text{total}}} = r^2 \]
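
  • A short derivation of this identity, writing \(S_{xx} = \sum_i (x_i - \bar{x})^2\), \(S_{yy} = \sum_i (y_i - \bar{y})^2\), and \(S_{xy} = \sum_i (x_i - \bar{x})(y_i - \bar{y})\) (shorthand used only here):

\[ \begin{align*} \widehat{\beta}_1 &= \frac{S_{xy}}{S_{xx}}, \qquad \widehat{y_i} - \bar{y} = \widehat{\beta}_1 (x_i - \bar{x}) \\ R^2 &= \frac{SS_{\text{model}}}{SS_{\text{total}}} = \frac{\widehat{\beta}_1^{\,2}\, S_{xx}}{S_{yy}} = \frac{S_{xy}^2}{S_{xx}\, S_{yy}} = r^2 \end{align*} \]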

Multiple \(R^2\) vs Adjusted \(R^2\)

  • Multiple \(R^2\) (aka plain \(R^2\) in simple regression) tells you the proportion of total variance in \(Y\) explained by the model.

  • Adjusted \(R^2\) corrects for model complexity by penalizing the addition of predictors that explain little, which is especially useful in multiple regression.
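
  • For reference, the usual adjustment with \(n\) observations and \(p\) predictors is:

\[ R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \]

  • Unlike \(R^2\), adjusted \(R^2\) can decrease when a newly added predictor explains too little additional variance (check against the output above: \(1 - (1 - 0.5382)\frac{99}{98} \approx 0.5335\)).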