2024

Multiple regression

  • Multiple regression works just like the simple linear regression we just covered, but with multiple predictor variables.

\[ \begin{align} y_{i} &= \beta_{0} + \beta_{1} x_{1_i} + \beta_{2} x_{2_i} + \dots + \beta_{k} x_{k_i} + \epsilon_{i} \\\\ \end{align} \]

\[ \begin{align} \boldsymbol{y} &= \beta_{0} \begin{bmatrix} 1\\ 1\\ \vdots\\ 1 \end{bmatrix} + \beta_{1} \begin{bmatrix} x_{1_1}\\ x_{1_2}\\ \vdots\\ x_{1_n}\\ \end{bmatrix} + \beta_{2} \begin{bmatrix} x_{2_1}\\ x_{2_2}\\ \vdots\\ x_{2_n}\\ \end{bmatrix} + \ldots + \beta_{k} \begin{bmatrix} x_{k_1}\\ x_{k_2}\\ \vdots\\ x_{k_n}\\ \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]

\[ \begin{align} \boldsymbol{y} &= \begin{bmatrix} 1 & x_{1_1} & x_{2_1} & \ldots & x_{k_1} \\ 1 & x_{1_2} & x_{2_2} & \ldots & x_{k_2} \\ \vdots & \vdots & \dots & \dots \\ 1 & x_{1_n} & x_{2_n} & \ldots & x_{k_n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{k} \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n\\ \end{bmatrix} \\\\ \end{align} \]

  • It’s powerful to write these equations in matrix form (the last line above) because it highlights how the two situations are essentially the same — at least mathematically.

\[ \boldsymbol{y} &= \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \]

Example

## 
## Call:
## lm(formula = y ~ x1 + x2, data = ddd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -371.16 -105.98    8.39  105.13  413.27 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -26.7827    15.1606  -1.767   0.0805 .  
## x1            4.0858     0.3919  10.426  < 2e-16 ***
## x2           -1.9990     0.3042  -6.572 2.58e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 150.7 on 96 degrees of freedom
## Multiple R-squared:  0.5314, Adjusted R-squared:  0.5217 
## F-statistic: 54.44 on 2 and 96 DF,  p-value: < 2.2e-16
## [1] 0.5314302