- Multiple regression works just like the simple linear regression we just covered, but with multiple predictor variables.
For the \(i^{th}\) observation, we can write:
\[ \begin{align} y_{i} &= \beta_{0} + \beta_{1} x_{1_i} + \beta_{2} x_{2_i} + \dots + \beta_{k} x_{k_i} + \epsilon_{i} \\\\ \end{align} \]
\(y_{i}\) is the \(i^{th}\) observed outcome.
\(x_{k_{i}}\) is the \(i^{th}\) value of the \(k^{th}\) predictor variable.
\(\epsilon_{i}\) is called the residual and is the difference between the observed outcome and the predicted outcome.
\[ \epsilon_{i} \sim Normal(0, \sigma_{\epsilon}) \]
Now let's extend this to many observations:
\[ \begin{align} y_{1} &= \beta_{0} + \beta_{1} x_{1_1} + \beta_{2} x_{2_1} + \dots + \beta_{k} x_{k_1} + \epsilon_{1} \\ y_{2} &= \beta_{0} + \beta_{1} x_{1_2} + \beta_{2} x_{2_2} + \dots + \beta_{k} x_{k_2} + \epsilon_{2} \\ &\vdots \\ y_{n} &= \beta_{0} + \beta_{1} x_{1_n} + \beta_{2} x_{2_n} + \dots + \beta_{k} x_{k_n} + \epsilon_{n} \\ \end{align} \]
We can gather the independent observations up into vectors:
\[ \begin{align} \boldsymbol{y} &= \beta_{0} \begin{bmatrix} 1\\ 1\\ \vdots\\ 1 \end{bmatrix} + \beta_{1} \begin{bmatrix} x_{1_1}\\ x_{1_2}\\ \vdots\\ x_{1_n}\\ \end{bmatrix} + \beta_{2} \begin{bmatrix} x_{2_1}\\ x_{2_2}\\ \vdots\\ x_{2_n}\\ \end{bmatrix} + \ldots + \beta_{k} \begin{bmatrix} x_{k_1}\\ x_{k_2}\\ \vdots\\ x_{k_n}\\ \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]
We can next gather the vectors up into a matrix:
\[ \begin{align} \boldsymbol{y} &= \begin{bmatrix} 1 & x_{1_1} & x_{2_1} & \ldots & x_{k_1} \\ 1 & x_{1_2} & x_{2_2} & \ldots & x_{k_2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1_n} & x_{2_n} & \ldots & x_{k_n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{k} \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\ \end{align} \]
We can finally write the model in compact matrix form:
\[ \begin{align} \boldsymbol{y} &= \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \\ \end{align} \]
\(\boldsymbol{y}\) is a vector of observed outcomes.
\(\boldsymbol{X}\) is a matrix of predictor variables and is called the design matrix.
\(\boldsymbol{\beta}\) is a vector of \(\beta\) parameters.
\(\boldsymbol{\epsilon}\) is a vector of residuals.
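As a concrete illustration, here is a minimal sketch (with made-up data) of the design matrix that R builds for a two-predictor model via `model.matrix()`: a leading column of 1s for the intercept, followed by one column per predictor.

```r
# Minimal sketch with made-up data: inspect the design matrix X that R
# constructs for a two-predictor regression.
set.seed(0)
d_toy <- data.frame(x1 = rnorm(5), x2 = rnorm(5))
X <- model.matrix(~ x1 + x2, data = d_toy)
X  # first column is all 1s (intercept), then one column per predictor
```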
Let \(y_i\) denote the observed values and \(\widehat{y_{i}}\) denote the predicted values:
\[ \widehat{y_{i}} = \beta_{0} + \beta_{1} x_{1_i} + \beta_{2} x_{2_i} + \ldots + \beta_{k} x_{k_i} \]
The best-fitting \(\boldsymbol{\beta}\) values are those that minimise the sum of squared differences between the observed and predicted outcomes:
\[ \DeclareMathOperator*{\argmin}{\arg\!\min} \argmin_{\boldsymbol{\beta}} \sum_{i=1}^{n} (y_{i} - \widehat{y_{i}})^2 \]
\[ \DeclareMathOperator*{\argmin}{\arg\!\min} \argmin_{\boldsymbol{\beta}} \sum_{i=1}^{n} (y_{i} - (\beta_{0} + \beta_{1} x_{1_i} + \beta_{2} x_{2_i} + \ldots + \beta_{k} x_{k_i}))^2 \]
The \(\boldsymbol{\beta}\) values that minimise error can be solved for analytically.
The method is to take the derivative with respect to \(\boldsymbol{\beta}\), and then find the \(\boldsymbol{\beta}\) values that make the resulting expression equal to zero.
I won’t do this here and won’t require you to do so either.
You should know, however, that this method of finding \(\beta\) values is called ordinary least squares.
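Although we will not work through the derivation, the resulting closed-form solution is \(\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^{\top}\boldsymbol{X})^{-1}\boldsymbol{X}^{\top}\boldsymbol{y}\). The following sketch (with made-up data) checks that this formula reproduces the coefficients reported by R's `lm()` function, which we will use later in this section.

```r
# Sketch with made-up data: the ordinary least squares solution
# beta_hat = (X'X)^{-1} X'y matches coef(lm(...)).
set.seed(1)
n <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 0.5 * x1 + 0.3 * x2 + rnorm(n)
X <- cbind(1, x1, x2)                          # design matrix with intercept column
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y   # normal equations solution
cbind(by_hand = beta_hat, from_lm = coef(lm(y ~ x1 + x2)))
```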
\(SS_{error} = SS_{residual} = \sum_{i=1}^{n} (y_{i} - \hat{y_{i}})^2\)
\(SS_{error}\) is what you get when you compare raw observations against the full model predictions.
\(SS_{total} = \sum_{i=1}^{n} (y_{i} - \bar{y})^2\)
\(SS_{total}\) is what you get when you compare raw observations against the grand mean.
\(SS_{error}\) comes from \(\sum_{i=1}^{n} (y_{i} - \hat{y_{i}})^2\) with \(\hat{y_{i}} = \beta_{0} + \beta_{1} x_{1_i} + \beta_{2} x_{2_i} + \ldots + \beta_{k} x_{k_i}\) (the prediction from the full model),
\(SS_{total}\) comes from \(\sum_{i=1}^{n} (y_{i} - \hat{y_{i}})^2\) with \(\hat{y_{i}} = \bar{y}\) (the prediction from the intercept-only model).
\(SS_{model} = \sum_{i=1}^{n} (\bar{y} - \hat{y_i})^2\) tells you how much the added complexity of the full model reduces the overall variability (i.e., makes better predictions).
\[R^2 = \frac{SS_{model}}{SS_{total}}\]
\(R^2\) is called the coefficient of determination and is just the square of the correlation coefficient between the observed outcomes \(y_i\) and the predicted outcomes \(\hat{y_i}\).
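As a quick check, here is a sketch (with made-up data) that computes the sums of squares by hand and verifies that \(SS_{model}/SS_{total}\) matches the \(R^2\) that R reports.

```r
# Sketch with made-up data: compute SS_total, SS_error, SS_model by hand
# and check that SS_model / SS_total equals the R-squared reported by summary().
set.seed(2)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.6 * x1 - 0.4 * x2 + rnorm(n)
fit <- lm(y ~ x1 + x2)
ss_total <- sum((y - mean(y))^2)            # observations vs. grand mean
ss_error <- sum((y - fitted(fit))^2)        # observations vs. full-model predictions
ss_model <- sum((fitted(fit) - mean(y))^2)  # full-model predictions vs. grand mean
c(by_hand = ss_model / ss_total, from_lm = summary(fit)$r.squared)
```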
The \(F\) ratio tells us whether or not the more complex model provides a significantly better fit to the data than the simplest (intercept-only) model.
\[ F = \frac{MS_{model}}{MS_{error}} \]
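Here the mean squares are the sums of squares divided by their degrees of freedom, where \(k\) is the number of predictors and \(n\) is the number of observations:
\[ MS_{model} = \frac{SS_{model}}{k}, \qquad MS_{error} = \frac{SS_{error}}{n - k - 1} \]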
We can also ask questions about the best fitting \(\beta\) values (i.e., is each \(\beta\) significantly different from zero?).
The data you use in a regression comes from random variables and so the \(\beta\) values you estimate are also random.
It turns out the best fitting \(\beta\) values (i.e., \(\hat{\beta}\)) can be tested with a \(t\)-test.
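As a sketch of what this looks like in practice (again with made-up data), each reported \(t\)-statistic is just the estimate divided by its standard error, and its \(p\)-value comes from a \(t\) distribution with \(n - k - 1\) degrees of freedom.

```r
# Sketch with made-up data: reproduce the t values and p-values that
# summary() reports for each beta estimate.
set.seed(3)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + rnorm(n)                 # x2 has no real effect here
fit <- lm(y ~ x1 + x2)
coefs <- summary(fit)$coefficients           # Estimate, Std. Error, t value, Pr(>|t|)
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]
p_by_hand <- 2 * pt(abs(t_by_hand), df = n - 2 - 1, lower.tail = FALSE)
cbind(t_by_hand, p_by_hand, coefs[, c("t value", "Pr(>|t|)")])
```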
To see how this works in practice, consider a data table `d` containing 100 observations of two predictors, `x` and `y`, and an outcome, `z` (only the first six rows and the last row are shown here):

```
##                  x            y         z
##              <num>        <num>     <num>
##   1: -0.1270006060 -0.934710687 2.0486143
##   2: -0.5870636454  1.063589007 1.1481576
##   3: -1.1818252735  0.724968223 1.5816542
##   4: -0.1741249442 -0.657355162 1.8778684
##   5: -0.8180039439 -1.666401819 0.1107309
##   6:  0.5305352204 -0.144770226 1.5961658
##  ---
## 100: -1.0612214539  0.066561716 1.0467034
```
We can fit this model using the `lm` function:

```r
fm <- lm(z ~ x + y, data = d)
summary(fm)
```
```
## 
## Call:
## lm(formula = z ~ x + y, data = d)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4422 -0.5214 -0.0799  0.4804  1.8042 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.91693    0.07039  27.234  < 2e-16 ***
## x            0.54163    0.07741   6.997 3.39e-10 ***
## y            0.28953    0.06964   4.158 6.94e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6944 on 97 degrees of freedom
## Multiple R-squared:  0.3873, Adjusted R-squared:  0.3747 
## F-statistic: 30.66 on 2 and 97 DF,  p-value: 4.798e-11
```
The `Estimate` column provides the best fitting \(\beta\) values.
The `Std. Error` column provides the SEM associated with the \(\beta\) values reported in the `Estimate` column.
The `t value` column provides the \(t\)-statistic for the null hypothesis that the corresponding \(\beta\) value is zero.
The `Pr(>|t|)` column provides the \(p\)-value associated with the \(t\)-statistic.
The `Residual standard error` provides an estimate of the standard deviation of the residuals.
The `Multiple R-squared` provides the \(R^2\) value.
The `Adjusted R-squared` provides the \(R^2\) value adjusted for the number of predictors in the model.
The `F-statistic` provides the \(F\)-ratio, and the `p-value` reported next to it provides the \(p\)-value associated with the \(F\)-ratio.
The best-fitting \(\beta_0\) is 1.92, the best-fitting \(\beta_1\) is 0.54, and the best-fitting \(\beta_2\) is 0.29.
\(\beta_0\) is the intercept, \(\beta_1\) is the slope of the regression plane in the \(x\) direction, and \(\beta_2\) is the slope of the regression plane in the \(y\) direction.
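To make the plane interpretation concrete, here is a small sketch using the fitted model `fm` from above: moving one unit along \(x\) while holding \(y\) fixed changes the predicted \(z\) by \(\beta_1\) (about 0.54), and likewise moving along \(y\) changes it by \(\beta_2\) (about 0.29).

```r
# Sketch: predictions from the fitted regression plane. The difference
# between the two predictions equals the estimated slope in the x direction.
new_points <- data.frame(x = c(0, 1), y = c(0, 0))
predict(fm, newdata = new_points)
diff(predict(fm, newdata = new_points))   # equals coef(fm)["x"], about 0.54
```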