29 Multiple Regression
- Multiple regression works just like the simple linear regression we just covered, but with multiple predictor variables.
\[\begin{align} y_{i} &= \beta_{0} + \beta_{1} x_{1_i} + \beta_{2} x_{2_i} + \dots + \beta_{k} x_{k_i} + \epsilon_{i} \\\\ \boldsymbol{y} &= \beta_{0} \begin{bmatrix} 1\\ 1\\ \vdots\\ 1 \end{bmatrix} + \beta_{1} \begin{bmatrix} x_{1_1}\\ x_{1_2}\\ \vdots\\ x_{1_n} \end{bmatrix} + \beta_{2} \begin{bmatrix} x_{2_1}\\ x_{2_2}\\ \vdots\\ x_{2_n} \end{bmatrix} + \ldots + \beta_{k} \begin{bmatrix} x_{k_1}\\ x_{k_2}\\ \vdots\\ x_{k_n} \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n \end{bmatrix} \\\\ \boldsymbol{y} &= \begin{bmatrix} 1 & x_{1_1} & x_{2_1} & \ldots & x_{k_1} \\ 1 & x_{1_2} & x_{2_2} & \ldots & x_{k_2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_{1_n} & x_{2_n} & \ldots & x_{k_n} \end{bmatrix} \begin{bmatrix} \beta_{0} \\ \beta_{1} \\ \vdots \\ \beta_{k} \end{bmatrix} + \begin{bmatrix} \epsilon_1\\ \epsilon_2\\ \vdots\\ \epsilon_n \end{bmatrix} \\\\ \boldsymbol{y} &= \boldsymbol{X} \boldsymbol{\beta} + \boldsymbol{\epsilon} \end{align}\]
- It’s powerful to write these equations in matrix form (the last line above) because it highlights that simple and multiple regression are essentially the same model, at least mathematically.
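- The matrix form also gives a compact expression for the least-squares estimates. Minimising the sum of squared residuals $\boldsymbol{\epsilon}^{\top}\boldsymbol{\epsilon}$ with respect to $\boldsymbol{\beta}$ yields the normal equations, whose solution (when $\boldsymbol{X}^{\top}\boldsymbol{X}$ is invertible) is:

\[\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^{\top}\boldsymbol{X})^{-1}\boldsymbol{X}^{\top}\boldsymbol{y}\]

- This single formula covers both simple regression ($k=1$) and multiple regression with any number of predictors, which is exactly the sense in which the two situations are mathematically the same.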
29.0.0.1 Example:
library(data.table)
library(ggplot2)
## install.packages('scatterplot3d') ## If you need to
library(scatterplot3d)
rm(list=ls())
d <- fread('https://crossley.github.io/book_stats/data/eeg/epochs.txt')
d[, V1 := NULL]
## convert from wide to long format
dd <- melt(d, id.vars=c('time', 'condition', 'epoch'))
## pick out some columns randomly
y <- dd[variable=='MEG 001', mean(value), .(epoch)][, V1]
x1 <- dd[variable=='MEG 010', mean(value), .(epoch)][, V1]
x2 <- dd[variable=='MEG 015', mean(value), .(epoch)][, V1]
ddd <- data.table(y, x1, x2)
## visualise possible linear relationship
plot3d <- scatterplot3d(x1,
x2,
y,
angle=55,
scale.y=0.7,
pch=16,
color="red",
main="Regression Plane")
## fit a multiple linear regression model
fm <- lm(y ~ x1 + x2, data=ddd)
plot3d$plane3d(fm, lty.box = "solid")
summary(fm)
summary(fm)$r.squared
##
## Call:
## lm(formula = y ~ x1 + x2, data = ddd)
##
## Residuals:
## Min 1Q Median 3Q Max
## -371.16 -105.98 8.39 105.13 413.27
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -26.7827 15.1606 -1.767 0.0805 .
## x1 4.0858 0.3919 10.426 < 2e-16 ***
## x2 -1.9990 0.3042 -6.572 2.58e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 150.7 on 96 degrees of freedom
## Multiple R-squared: 0.5314, Adjusted R-squared: 0.5217
## F-statistic: 54.44 on 2 and 96 DF, p-value: < 2.2e-16
## [1] 0.5314302
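As a sanity check on the matrix formulation above, here is a minimal sketch that solves the normal equations by hand with base R (`crossprod()` and `solve()`) and compares the result against `lm()`. The simulated data and coefficient values here are illustrative assumptions, not the chapter's MEG data:

```r
## simulate data from a known two-predictor model
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 3 * x1 - 1.5 * x2 + rnorm(n)

## design matrix X: a column of 1s, then one column per predictor
X <- cbind(1, x1, x2)

## beta_hat = (X'X)^{-1} X'y, computed by solving (X'X) beta = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

## lm() should give numerically identical estimates
fm <- lm(y ~ x1 + x2)
cbind(by_hand = beta_hat, via_lm = coef(fm))
```

Solving the linear system with `solve(A, b)` rather than explicitly inverting `crossprod(X)` is the numerically preferred way to evaluate this formula.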