2025

Question 1

Write R code to generate data for a simple linear regression model in which the true slope is 0.5 and the true intercept is 2.

Answer
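library(data.table)
library(ggplot2)

# Simulate n = 100 observations from y = 2 + 0.5 * x + noise
x <- rnorm(100, 0, 1)                 # predictor
y <- 2 + 0.5 * x + rnorm(100, 0, 1)   # true intercept 2, true slope 0.5
d <- data.table(x, y)

# Fit the model; the estimates should be close to 2 and 0.5
fm <- lm(y ~ x, data = d)

# Plot the data with the fitted regression line and its confidence band
ggplot(d, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(title = "Simple Linear Regression Model",
       x = "x",
       y = "y") +
  theme_minimal()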

Question 2

What is a good guess for \(\beta_0\) and \(\beta_1\) in the following regression model?

Answer

The idea is to read the values directly off the plot. The intercept is the value of \(y\) when \(x = 0\), and the slope is the change in \(y\) for a one-unit increase in \(x\).

In this case, the true intercept is 100 and the true slope is 10.
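
The two definitions above can also be checked numerically. The following is a minimal sketch, not the data behind the plot: the points are hypothetical, simulated around the line \(y = 100 + 10x\) stated in the answer, and the sample size, range of \(x\), noise level, and seed are all assumptions chosen for illustration.

```r
# Hypothetical data: the original plot's data are not available, so we
# simulate points around y = 100 + 10 * x (values taken from the answer).
set.seed(1)                              # arbitrary seed, for reproducibility
x <- runif(50, 0, 10)
y <- 100 + 10 * x + rnorm(50, 0, 5)
fm <- lm(y ~ x)

# Intercept: the fitted value of y at x = 0
b0 <- unname(predict(fm, newdata = data.frame(x = 0)))

# Slope: the change in the fitted value of y for a one-unit increase in x
b1 <- unname(predict(fm, newdata = data.frame(x = 1))) - b0

print(c(b0 = b0, b1 = b1))
print(coef(fm))  # the same two numbers, read directly from the fitted model
```

Reading the intercept and slope off a plot is the visual version of these two calculations.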

Question 3

Consider the following samples from \(X\) and \(Y\):

x <- c(1.58, 1.23, 0.78, 0.45, 0.12, 0.98, 1.34, 1.56, 1.23, 0.89) 
y <- c(0.45, 0.78, 1.23, 1.56, 1.89, 1.34, 1.12, 0.89, 0.56, 0.23)

What is the percentage of variance explained in \(y\) by \(x\) given a simple linear regression model?

Answer
x <- c(1.58, 1.23, 0.78, 0.45, 0.12, 0.98, 1.34, 1.56, 1.23, 0.89) 
y <- c(0.45, 0.78, 1.23, 1.56, 1.89, 1.34, 1.12, 0.89, 0.56, 0.23)
d <- data.table(x, y)
fm <- lm(y ~ x, data = d)
r2 <- summary(fm)$r.squared # proportion of variance explained (R^2)
print(100 * r2)             # expressed as a percentage

Question 4

Suppose that Pearson’s correlation coefficient between \(X\) and \(Y\) is 0.8. What is the percentage of variance explained in \(Y\) by \(X\)?

Answer

In simple linear regression, the proportion of variance explained, \(R^2\), equals the square of Pearson's correlation coefficient. Therefore, the percentage of variance explained is \(0.8^2 = 0.64\), or 64%.
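
This identity can be confirmed on simulated data. The sketch below reuses the data-generating setup from Question 1; the seed is an arbitrary assumption, and any data set with a single predictor should give the same agreement.

```r
# Simulated data (same setup as Question 1; the seed is arbitrary)
set.seed(1)
x <- rnorm(100, 0, 1)
y <- 2 + 0.5 * x + rnorm(100, 0, 1)

r  <- cor(x, y)                      # Pearson's correlation coefficient
r2 <- summary(lm(y ~ x))$r.squared   # R^2 from the simple linear regression

# The squared correlation and R^2 agree up to floating-point error
print(c(r_squared = r^2, R2 = r2))
print(isTRUE(all.equal(r^2, r2)))
```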

Question 5

Suppose that the coefficient of determination \(R^2\) is 0.64. What is the correlation coefficient between \(X\) and \(Y\)?

Answer

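In simple linear regression, the magnitude of the correlation coefficient is the square root of the coefficient of determination, so \(|r| = \sqrt{0.64} = 0.8\). \(R^2\) alone does not determine the sign: \(r = 0.8\) if the fitted slope is positive and \(r = -0.8\) if it is negative.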
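
One caveat when going from \(R^2\) back to a correlation is that the square root only recovers the magnitude. The sketch below uses hypothetical data with a negative true slope (an assumption made purely for illustration) and takes the sign of the correlation from the sign of the fitted slope.

```r
# Hypothetical data with a negative true slope (seed is arbitrary)
set.seed(1)
x <- rnorm(100, 0, 1)
y <- 2 - 0.5 * x + rnorm(100, 0, 1)

fm <- lm(y ~ x)
r2 <- summary(fm)$r.squared

# sqrt(R^2) gives |r|; the sign of r matches the sign of the fitted slope
r_from_r2 <- sign(coef(fm)[["x"]]) * sqrt(r2)

print(c(sqrt_R2 = sqrt(r2), recovered_r = r_from_r2, cor_xy = cor(x, y)))
```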

Question 6

Create three different sets of x and y data with increasing coefficient of determination \(R^2\) values.

Answer
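# R^2 rises as the slope grows relative to the residual standard deviation

# Set 1: slope 0.5, noise sd 1 (lowest R^2)
x1 <- rnorm(100, 0, 1)
y1 <- 2 + 0.5 * x1 + rnorm(100, 0, 1)
d1 <- data.table(x1, y1)
fm1 <- lm(y1 ~ x1, data = d1)

# Set 2: slope 0.8, noise sd 0.5
x2 <- rnorm(100, 0, 1)
y2 <- 2 + 0.8 * x2 + rnorm(100, 0, 0.5)
d2 <- data.table(x2, y2)
fm2 <- lm(y2 ~ x2, data = d2)

# Set 3: slope 0.9, noise sd 0.1 (highest R^2)
x3 <- rnorm(100, 0, 1)
y3 <- 2 + 0.9 * x3 + rnorm(100, 0, 0.1)
d3 <- data.table(x3, y3)
fm3 <- lm(y3 ~ x3, data = d3)

# R^2 values, expected to increase from set 1 to set 3
print(summary(fm1)$r.squared)
print(summary(fm2)$r.squared)
print(summary(fm3)$r.squared)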