
Multiple regression

  • Multiple regression works just like simple linear regression, but uses multiple predictor variables to predict the outcome variable.

  • For the \(i^{th}\) observation, we can write:

\[ \begin{align} y_{i} &= \beta_{0} + \beta_{1} x_{1_i} + \beta_{2} x_{2_i} + \dots + \beta_{k} x_{k_i} + \epsilon_{i} \\\\ \end{align} \]

Multiple regression

  • \(y_{i}\) is the \(i^{th}\) observed outcome.

  • \(x_{k_{i}}\) is the \(i^{th}\) value of the \(k^{th}\) predictor variable.

  • \(\beta_{0}\), \(\beta_{1}\), \(\dots\), \(\beta_{k}\) are the parameters of the linear regression model.

  • \(\epsilon_{i}\) is called the residual and is the difference between the observed outcome and the predicted outcome.

\[ \begin{align} \epsilon_{i} &= y_{i} - \widehat{y_{i}} \\ \epsilon_{i} &= y_{i} - (\beta_{0} + \beta_{1} x_{1_i} + \beta_{2} x_{2_i} + \dots + \beta_{k} x_{k_i}) \\ \epsilon_{i} &\sim \mathscr{N}(0, \sigma_{\epsilon}) \\ \end{align} \]

Multiple regression

  • Now let's extend the model to many observations:

\[ \begin{align} y_{1} &= \beta_{0} + \beta_{1} x_{1_1} + \beta_{2} x_{2_1} + \dots + \beta_{k} x_{k_1} + \epsilon_{1} \\ y_{2} &= \beta_{0} + \beta_{1} x_{1_2} + \beta_{2} x_{2_2} + \dots + \beta_{k} x_{k_2} + \epsilon_{2} \\ &\vdots \\ y_{n} &= \beta_{0} + \beta_{1} x_{1_n} + \beta_{2} x_{2_n} + \dots + \beta_{k} x_{k_n} + \epsilon_{n} \\ \end{align} \]

Multiple regression

  • We can gather the independent observations up into vectors:

\[ \begin{align} \boldsymbol{y} &= \beta_{0} \begin{bmatrix} 1\\ 1\\ \vdots\\ 1 \end{bmatrix} + \beta_{1} \begin{bmatrix} x_{1_1}\\ x_{1_2}\\ \vdots\\ x_{1_n}\\ \end{bmatrix} + \beta_{2} \begin{bmatrix} x_{2_1}\\ x_{2_2}\\ \vdots\\ x_{2_n}\\ \end{bmatrix} + \ldots + \beta_{k} \begin{bmatrix} x_{k_1}\\ x_{k_2}\\ \vdots\\ x_{k_n}\\ \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]

Multiple regression

  • We can next gather the vectors up into a matrix:

\[ \begin{align} \boldsymbol{y} &= \begin{bmatrix} 1 & x_{1_1} & x_{2_1} & \ldots & x_{k_1} \\ 1 & x_{1_2} & x_{2_2} & \ldots & x_{k_2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1_n} & x_{2_n} & \ldots & x_{k_n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{k} \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\ \end{align} \]

Multiple regression

  • We can finally write the model in compact matrix form:

\[ \begin{align} \boldsymbol{y} &= \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \\ \end{align} \]

  • \(\boldsymbol{y}\) is a vector of observed outcomes.

  • \(\boldsymbol{X}\) is a matrix of predictor variables and is called the design matrix (see the R sketch below).

  • \(\boldsymbol{\beta}\) is a vector of \(\beta\) parameters.

  • \(\boldsymbol{\epsilon}\) is a vector of residuals.
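  • As a concrete illustration, R's model.matrix() builds the design matrix \(\boldsymbol{X}\) from a model formula; a minimal sketch using a small made-up data frame:

# small made-up data frame with two predictor variables (illustration only)
d_small <- data.frame(x1 = c(1.2, -0.5, 0.3, 2.1),
                      x2 = c(0.4, 1.1, -0.7, 0.0))

# model.matrix() returns X: a column of 1s for the intercept plus one
# column per predictor
model.matrix(~ x1 + x2, data = d_small)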

How can we pick \(\boldsymbol{\beta}\) values that best fit our data?

  • let \(y_i\) denote observed values

  • let \(\widehat{y_{i}}\) denote predicted values:

\[ \widehat{y_{i}} = \beta_{0} + \beta_{1} x_{1_i} + \beta_{2} x_{2_i} + \ldots + \beta_{k} x_{k_i} \]

How can we pick \(\boldsymbol{\beta}\) values that best fit our data?

  • The best fitting \(\boldsymbol{\beta}\) values are those that minimise the discrepancy between \(y_{i}\) and \(\widehat{y_{i}}\).

\[ \DeclareMathOperator*{\argmin}{\arg\!\min} \argmin_{\boldsymbol{\beta}} \sum_{i=1}^{n} (y_{i} - \widehat{y_{i}})^2 \]

\[ \DeclareMathOperator*{\argmin}{\arg\!\min} \argmin_{\boldsymbol{\beta}} \sum_{i=1}^{n} (y_{i} - (\beta_{0} + \beta_{1} x_{1_i} + \beta_{2} x_{2_i} + \ldots + \beta_{k} x_{k_i}))^2 \]

How can we pick \(\boldsymbol{\beta}\) values that best fit our data?

  • The \(\boldsymbol{\beta}\) values that minimise error can be solved for analytically.

  • The method is to take the derivative of this expression with respect to \(\boldsymbol{\beta}\), set the result to zero, and solve for \(\boldsymbol{\beta}\).

  • I won’t do this here and won’t require you to do so either.

  • You should know however that this method of finding \(\beta\) values is called ordinary least squares.
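  • For the curious, a minimal sketch of the ordinary least squares solution in R, solving the normal equations \(\boldsymbol{X}^{\top}\boldsymbol{X}\,\widehat{\boldsymbol{\beta}} = \boldsymbol{X}^{\top}\boldsymbol{y}\) (here assuming a data frame d with outcome z and predictors x and y, as in the example later in these slides):

# design matrix and observed outcomes
X <- model.matrix(~ x + y, data = d)
y_obs <- d$z

# solve the normal equations (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y_obs))
beta_hat

# should match the coefficients reported by lm()
coef(lm(z ~ x + y, data = d))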

Regression model terms

  • \(SS_{error} = SS_{residual} = \sum_{i=1}^{n} (y_{i} - \hat{y_{i}})^2\)

    • \(SS_{error}\) is what you get when you compare raw observations against the full model predictions.
  • \(SS_{total} = \sum_{i=1}^{n} (y_{i} - \bar{y})^2\)

    • \(SS_{total}\) is what you get when you compare raw observations against the grand mean.
  • \(SS_{model} = \sum_{i=1}^{n} (\bar{y} - \hat{y_i})^2\)

    • \(SS_{model}\) tells you how much the added complexity of the full model reduces the overall variability (i.e., makes better predictions).
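  • A minimal sketch of these sums of squares in R, assuming a fitted model fm and its observed outcomes y_obs (e.g., fm <- lm(z ~ x + y, data=d) and y_obs <- d$z from the example later in these slides):

# residual (error) sum of squares
ss_error <- sum((y_obs - fitted(fm))^2)

# total sum of squares around the grand mean
ss_total <- sum((y_obs - mean(y_obs))^2)

# model sum of squares
ss_model <- sum((fitted(fm) - mean(y_obs))^2)

# the decomposition SS_total = SS_model + SS_error holds (up to floating
# point error)
c(ss_total, ss_model + ss_error)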

Regression model terms

  • The proportion of variability accounted for by the model, over and above the simple mean-only model, is given by:

\[R^2 = \frac{SS_{model}}{SS_{total}}\]

  • \(R^2\) is called the coefficient of determination and is the square of the correlation between the observed outcomes \(y_{i}\) and the predicted outcomes \(\widehat{y_{i}}\) (in simple regression with a single predictor this reduces to the squared correlation between \(x\) and \(y\)).
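  • Continuing the sketch above, both routes give the same number in R:

# R^2 as a ratio of sums of squares ...
ss_model / ss_total

# ... equals the squared correlation between observed and predicted outcomes
cor(y_obs, fitted(fm))^2

# and matches what summary() reports
summary(fm)$r.squared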

Regression model terms

\[ F = \frac{MS_{model}}{MS_{error}} \]

  • The \(F\) ratio tells us whether the more complex model provides a significantly better fit to the data than the simple mean-only model.
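  • Here the mean squares are just the sums of squares divided by their degrees of freedom; with \(k\) predictors and \(n\) observations:

\[ MS_{model} = \frac{SS_{model}}{k} \qquad MS_{error} = \frac{SS_{error}}{n - k - 1} \]

  • Under the null hypothesis that all predictor \(\beta\) values are zero, the \(F\) ratio follows an \(F\) distribution with \(k\) and \(n - k - 1\) degrees of freedom.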

Regression model terms

  • We can also ask questions about the best fitting \(\boldsymbol{\beta}\) values (i.e., is a given \(\beta\) significantly different from zero?).

  • The data you use in a regression comes from random variables and so the \(\boldsymbol{\beta}\) values you estimate are also random.

  • It turns out the best fitting \(\boldsymbol{\beta}\) values (i.e., \(\widehat{\boldsymbol{\beta}}\)) can be tested with a \(t\)-test. The reason for this follows the same path outlined in the simple linear regression lecture.
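  • Concretely, each estimated \(\beta\) is divided by its estimated standard error:

\[ t_{j} = \frac{\widehat{\beta_{j}}}{SE(\widehat{\beta_{j}})} \]

  • Under the null hypothesis that \(\beta_{j} = 0\), this statistic follows a \(t\) distribution with \(n - k - 1\) degrees of freedom.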

Multiple regression in R

  • Suppose we obtain data on three variables \(X\), \(Y\), and \(Z\).
##                x           y          z
##            <num>       <num>      <num>
##   1:  1.05273247 -0.40760752  2.0871266
##   2: -2.70864901 -3.62271979 -0.6463487
##   3: -0.49530418 -0.57730447  1.1032140
##   4: -0.08350089 -0.94330637  0.4433555
##   5:  0.69698588  0.57266050  3.1568996
##   6: -0.32178710 -0.21301725  3.1454739
##   7:  0.84080308  1.17101717  2.4587368
##   8: -2.05033552  1.01056552  0.7542561
##   9: -0.74697273  0.26599286  1.8597377
##  10: -2.16092495 -0.49899361 -0.1120337
##  11: -0.65584860  0.37272739  1.9978025
##  12: -2.46746239 -0.26877588 -0.1740785
##  13:  0.08292917  1.44448947  0.5424762
##  14: -0.16689453  0.26856014  2.2535540
##  15:  0.78940278 -0.65290438  3.2447419
##  16:  0.67505153  0.01925073  2.5974684
##  17:  0.39677689  0.94159931  1.6293645
##  18: -0.19465543  0.44336670  0.9069969
##  19:  2.56120838  0.93824925  3.5366742
##  20:  0.44823717  0.56303795  2.7402871
##  21: -1.57021050  0.12019349  2.3176361
##  22: -0.19166826  0.08121941  1.2517066
##  23: -2.17561642  0.03220057  0.4211395
##  24: -0.25521740 -1.05658914  1.4506306
##  25:  1.65422126 -0.61560441  4.1057860
##  26: -0.10737248  0.02798299  0.2350197
##  27: -1.48650969 -1.82705437 -0.1773977
##  28: -1.56929884 -0.12788409  1.1855877
##  29: -0.28287439  0.57451220  2.5843140
##  30:  0.70696577 -0.06280074  1.7543377
##  31: -1.61000066 -0.52402555  2.2541519
##  32:  0.70456472  0.99520473  2.9409207
##  33:  0.33555726 -0.36524680  1.6386712
##  34: -1.56707563 -0.95711389  0.9376914
##  35: -1.62866579 -2.40635090  0.7228510
##  36: -0.55447945 -0.07258479  0.9610830
##  37: -1.80381596  0.61146075  1.3958150
##  38: -0.89289383 -0.06700927  2.2660003
##  39: -0.68990884  0.94635561  1.7612556
##  40:  0.13979807 -0.96385989  1.2798611
##  41:  0.72596768 -1.15235954  1.7354279
##  42: -1.36229099 -1.57653595  0.5751742
##  43:  0.83286250  0.16547894  2.9837376
##  44: -0.60579113 -1.31925526  3.0377722
##  45:  1.37894913  0.15284728  3.8667904
##  46:  0.11980496  0.10679001  2.2555073
##  47:  1.04306867 -2.17661587  2.3400687
##  48: -0.81872782  0.88607907  2.6955318
##  49: -0.80367688 -1.00513646  1.1662335
##  50: -0.99538978  0.22001340  1.6392459
##  51: -0.55069098 -0.53753042  1.5396113
##  52: -0.23651819 -0.17745286  2.0346273
##  53: -0.91330200  0.36043696  2.0865795
##  54: -0.38542874 -0.32369051  2.3031955
##  55: -2.24037491 -0.32172136  0.7167883
##  56: -0.87685178  1.29838294  1.5478027
##  57:  1.92800168  0.22440293  3.4147686
##  58: -0.61380869 -0.80041992  1.9076662
##  59:  1.12986157 -1.30435383  2.1203087
##  60:  1.28150829  1.34033127  2.6581425
##  61: -1.28840107  0.56944855  1.1891958
##  62:  0.59390993  1.20684601  3.0574529
##  63:  0.42215500  1.25919993  1.6245667
##  64: -2.86705800  0.21457040  0.9400840
##  65: -0.38048165 -0.78894928  1.8809075
##  66: -0.97587415 -0.44149154  1.4267237
##  67: -1.00647163 -1.84488897  1.9107992
##  68:  0.21070299 -0.80620995  0.5388187
##  69:  0.71408074 -0.09678814  3.3492961
##  70: -0.54701344  0.86762832  0.8127532
##  71: -0.40319470  0.17208189  1.3518391
##  72:  1.29951261 -0.32416763  1.8618093
##  73: -0.69171325  0.40579658  1.0763769
##  74:  0.26695747 -1.74288046  1.4182412
##  75:  0.91802879 -2.13125264  1.7927999
##  76:  0.33308511 -0.53265448  3.0173187
##  77: -0.73316443 -0.96556961  0.7666081
##  78:  0.47964846  0.17669561  2.5510993
##  79: -0.15852995  1.08156976  2.5346934
##  80: -1.43325445  0.43660780  2.1470882
##  81:  1.64618902  0.43176563  2.4083875
##  82:  0.47511098  0.20249043  3.0664197
##  83:  2.41736230 -0.39556771  3.7376505
##  84:  0.45270377  0.91232725  1.9492778
##  85:  1.45088855  0.11239167  3.4130525
##  86:  2.64075096  0.03621135  3.2401549
##  87: -1.05182257 -1.29210472  0.4407382
##  88: -0.82056236  0.42145494  3.1103803
##  89: -0.34814126 -0.10946702  2.2305003
##  90: -0.95492381 -2.70703268  0.5889455
##  91:  0.27353003  0.20807259  1.9034689
##  92: -1.28114410 -2.37808082  1.4019169
##  93: -1.04789437 -0.57000692  0.8249906
##  94:  0.85991254 -0.93615913  0.6889198
##  95:  0.57083630 -0.21056936  2.4164917
##  96:  0.77938614 -0.02301742  3.2701807
##  97:  0.29945756  0.36076017  2.5028504
##  98: -0.66456471  0.44462590  2.2594943
##  99:  0.96470682  1.09669388  2.7201712
## 100: -0.81426502 -1.00887769  0.9584292
##                x           y          z

Multiple regression in R

  • We can plot the data to see if there is a relationship between \(x\), \(y\), and \(z\).
library(data.table)
library(ggplot2)
library(scatterplot3d)

# 3D scatter plot of the three columns of d
plot3d <- scatterplot3d(d$x,
                        d$y,
                        d$z,
                        angle=55,
                        scale.y=0.7,
                        pch=16,
                        color="red"
                        # main="Regression Plane"
)
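
  • Once the model below has been fit, the object returned by scatterplot3d() can overlay the fitted regression plane; a hedged sketch (fm is the lm() fit from a later slide):

# plane3d() accepts a fitted lm object and draws the regression plane
plot3d$plane3d(fm, lty = "dashed")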

Multiple regression in R

  • We can plot the data to see if there is a relationship between \(x\), \(y\), and \(z\).

Multiple regression in R

  • We can fit a multiple regression model in R using the lm function.
fm <- lm(z ~ x + y, data=d)
summary(fm)

Multiple regression in R

## 
## Call:
## lm(formula = z ~ x + y, data = d)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.84833 -0.46728  0.02928  0.47367  1.67506 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.00784    0.07357  27.292  < 2e-16 ***
## x            0.55723    0.06462   8.623 1.25e-13 ***
## y            0.23313    0.07626   3.057  0.00289 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7159 on 97 degrees of freedom
## Multiple R-squared:  0.5079, Adjusted R-squared:  0.4977 
## F-statistic: 50.05 on 2 and 97 DF,  p-value: 1.164e-15

Multiple regression in R

  • The Estimate column provides the best fitting \(\beta\) values.

  • The Std. Error column provides the SEM associated with the \(\beta\) values reported in the Estimate column.

  • The t value column provides the \(t\)-statistic for the null hypothesis that the \(\beta\) value is zero.

  • The Pr(>|t|) column provides the \(p\)-value associated with the \(t\)-statistic testing the null hypothesis that the true \(\beta\) value is zero.
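  • A short sketch of how these columns relate to one another, using the coefficient table stored in summary(fm):

coefs <- summary(fm)$coefficients

# t value is just Estimate divided by Std. Error
coefs[, "Estimate"] / coefs[, "Std. Error"]

# Pr(>|t|) is the two-tailed p-value from a t distribution with
# n - k - 1 = 97 degrees of freedom
2 * pt(abs(coefs[, "t value"]), df = df.residual(fm), lower.tail = FALSE)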

Multiple regression in R

  • The Residual standard error provides an estimate of the standard deviation of the residuals.

  • The Multiple R-squared provides the \(R^2\) value.

  • The Adjusted R-squared provides the \(R^2\) value adjusted for the number of predictors in the model.

  • The F-statistic provides the \(F\)-ratio and the p-value reported next to it provides the \(p\)-value associated with the \(F\)-ratio.
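  • For reference, the adjustment mentioned above penalises the model for its \(k\) predictors; with \(n\) observations:

\[ R^{2}_{adj} = 1 - (1 - R^{2}) \frac{n - 1}{n - k - 1} \]

  • Plugging in the values from the output above (\(R^2 \approx 0.508\), \(n = 100\), \(k = 2\)) gives approximately 0.498, matching the reported Adjusted R-squared.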

Multiple regression in R

  • The best-fitting \(\beta_0\) is 2.01, the best-fitting \(\beta_1\) is 0.56, and the best-fitting \(\beta_2\) is 0.23.

  • \(\beta_0\) is the intercept, \(\beta_1\) is the slope of the regression plane in the \(x\) direction, and \(\beta_2\) is the slope of the regression plane in the \(y\) direction.
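  • As a quick sanity check on this interpretation, holding \(y\) fixed and increasing \(x\) by one unit should change the prediction by \(\beta_1\); a minimal sketch using predict():

preds <- predict(fm, newdata = data.frame(x = c(0, 1), y = 0))

# the difference between the two predictions equals the estimated slope
# in the x direction (about 0.56)
diff(preds)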

Assumptions

  • The residuals should be normally distributed, have constant variance, and be independent.

  • The predictor variables should not be highly correlated with each other (i.e., no multicollinearity). This is because it is difficult to estimate the individual contributions of each predictor variable when they are highly correlated.
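
  • A hedged sketch of some common checks in R (cor() and plot() are base R; vif() assumes the car package is installed):

# residual diagnostics: residuals vs fitted values, normal QQ plot,
# scale-location, and leverage plots
plot(fm)

# how correlated are the predictors?
cor(d$x, d$y)

# variance inflation factors; values much larger than about 5-10 are
# often taken to indicate problematic multicollinearity
car::vif(fm)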