
Question

A researcher reports a one-way ANOVA with \(SS_{\text{between}} = 45\) and \(SS_{\text{total}} = 120\). Calculate \(\eta^2\) for this result. What does this value mean in terms of effect size interpretation?

Click here for the answer

\(\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}} = \frac{45}{120} = 0.375\). This means that 37.5% of the variance in the outcome variable is explained by group membership.
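
As a quick check, the same arithmetic can be run in R. The aov() lines are a minimal sketch assuming a data frame with hypothetical outcome and group columns.

```r
# Eta-squared from the sums of squares given in the question
ss_between <- 45
ss_total   <- 120
ss_between / ss_total    # 0.375

# Sketch: the same quantity from a fitted one-way ANOVA (hypothetical data frame)
# fit <- aov(outcome ~ group, data = your_data)
# ss  <- summary(fit)[[1]][["Sum Sq"]]
# ss[1] / sum(ss)         # SS_between / SS_total
```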

Question

Imagine two groups of students have means of 75 and 85 on a test, with a pooled standard deviation of 5. Compute Cohen's \(d\) for this independent-samples \(t\)-test, and explain what it tells you about the size of the group difference.

Click here for the answer

Cohen’s \(d = \frac{\bar{X}_1 - \bar{X}_2}{s_p} = \frac{75 - 85}{5} = -2.0\). The negative sign indicates direction, but in terms of effect size, we focus on the magnitude. A \(d\) of 2.0 is considered a very large effect size, meaning the groups differ by two standard deviations—a substantial difference.
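
For reference, a minimal R sketch of the same calculation:

```r
# Cohen's d from the summary statistics given in the question
m1 <- 75
m2 <- 85
s_pooled <- 5

d <- (m1 - m2) / s_pooled
d        # -2.0; the sign only reflects which group was listed first
abs(d)   # magnitude used when judging effect size
```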

Question

Which of the following best explains why adjusted \(R^2\) might decrease when adding an uninformative predictor to a regression model?

  1. Adjusted \(R^2\) always increases when adding predictors, even uninformative ones.

  2. Adjusted \(R^2\) decreases because adding uninformative predictors increases model complexity without improving explanatory power.

  3. Adjusted \(R^2\) measures only model fit, so it is unaffected by the number of predictors.

Click here for the answer
Answer: 2. Adjusted \(R^2\) decreases because adding uninformative predictors increases model complexity without improving explanatory power. Adjusted \(R^2\) penalizes model complexity, so it can decrease even if the ordinary \(R^2\) increases.
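
A minimal simulated sketch of this penalty, using made-up data (variable names are hypothetical):

```r
# Adding a pure-noise predictor: ordinary R^2 cannot decrease, adjusted R^2 can
set.seed(1)
n <- 30
x     <- rnorm(n)
noise <- rnorm(n)               # uninformative predictor
y     <- 2 * x + rnorm(n)

fit1 <- lm(y ~ x)
fit2 <- lm(y ~ x + noise)

summary(fit1)$adj.r.squared
summary(fit2)$adj.r.squared     # typically a little lower than fit1's value
summary(fit2)$r.squared         # ordinary R^2 is at least as high as fit1's
```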

Question

Suppose you fit five separate simple linear regressions, each predicting an outcome variable Y from one of 5 predictors, and each model yields an \(R^2\) of around 0.2. You then fit a single model with 20 predictors and obtain an \(R^2\) of 0.95 but an adjusted \(R^2\) of 0.2. Which statement below best describes what is likely happening?

  1. The model is correctly capturing all relevant predictors and performing well.

  2. The model is likely overfitting, capturing noise rather than true signal.

  3. The adjusted \(R^2\) suggests the model is underfitting.

Click here for the answer
Answer: 2. The model is likely overfitting, capturing noise rather than true signal. Although the \(R^2\) is high, adjusted \(R^2\) applies a penalty for adding too many predictors relative to the sample size. A large drop from \(R^2\) to adjusted \(R^2\) suggests that many predictors are not actually improving model fit beyond what would be expected by chance. This indicates that the model is too complex for the available data and is fitting idiosyncrasies in the training data rather than meaningful patterns.
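
A simulated sketch of the same pattern, assuming made-up data with one real predictor and 19 pure-noise predictors:

```r
# Many noise predictors inflate R^2 but not adjusted R^2
set.seed(1)
n <- 25
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
noise <- matrix(rnorm(n * 19), nrow = n)   # 19 uninformative predictors

fit_small <- lm(y ~ x)
fit_big   <- lm(y ~ x + noise)

c(r2_small  = summary(fit_small)$r.squared,
  r2_big    = summary(fit_big)$r.squared)
c(adj_small = summary(fit_small)$adj.r.squared,
  adj_big   = summary(fit_big)$adj.r.squared)
# R^2 for the big model climbs toward 1 while its adjusted R^2 stays low or drops
```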

Question

Imagine a study where each subject is measured multiple times during a learning experiment. You want to test the effect of time on performance. Which of the following analyses is most appropriate to account for correlated residuals?

  1. A simple linear model using lm().

  2. A GLS model using gls() with a compound symmetry correlation structure.

  3. An independent samples t-test.

Click here for the answer
Answer: 2. A GLS model using gls() with a compound symmetry correlation structure. This approach correctly accounts for the correlated residuals often present in repeated measures designs.
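
A minimal sketch of such a model, assuming a long-format data frame with hypothetical performance, time, and subject columns (one row per subject-by-time observation):

```r
# GLS with a compound symmetry correlation structure for repeated measures
library(nlme)

fit_gls <- gls(performance ~ time,
               correlation = corCompSymm(form = ~ 1 | subject),
               data = learning_data)   # learning_data is a hypothetical data frame
summary(fit_gls)
```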

Question

You have a dataset where each subject is measured on one outcome variable at only one time point (no repeated measures). You want to test whether X predicts Y. You fit two models:

  • Model A: lm(Y ~ X, data = your_data)

  • Model B: gls(Y ~ X, correlation = corCompSymm(form = ~ 1 | subject), data = your_data)

Which model is more appropriate for this scenario?

Click here for the answer

Model A: lm(Y ~ X, data = your_data) is more appropriate because there is no repeated measures structure—each subject contributes only one observation. The gls() model unnecessarily specifies a correlation structure when none is needed.
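
As a sketch, with only one observation per subject a plain GLS fit reduces to ordinary least squares, so there is nothing for a correlation structure to estimate (your_data here is the hypothetical data frame from the question):

```r
# With a single observation per subject, lm() and an unstructured gls() give the
# same coefficient estimates
library(nlme)

fit_lm  <- lm(Y ~ X, data = your_data)
fit_gls <- gls(Y ~ X, data = your_data)   # no correlation structure specified
coef(fit_lm)
coef(fit_gls)
```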

Question

Which of the following statements best describes the general linear model representation of one-way ANOVA?

  1. One-way ANOVA uses a separate linear regression for each group.

  2. One-way ANOVA is equivalent to fitting a linear model where the predictor variable is a set of dummy variables for group membership.

  3. One-way ANOVA cannot be represented as a linear model.

Click here for the answer
Answer: 2. One-way ANOVA is equivalent to fitting a linear model where the predictor variable is a set of dummy variables for group membership. The model compares the fit of group-specific means to the fit of a single grand mean.
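
A minimal simulated illustration of this equivalence (made-up data):

```r
# One-way ANOVA as a linear model with dummy-coded group membership
set.seed(1)
group   <- factor(rep(c("a", "b", "c"), each = 10))
outcome <- rnorm(30, mean = c(1, 2, 3)[group])

fit_lm <- lm(outcome ~ group)     # R expands 'group' into dummy variables
anova(fit_lm)                     # same F test as aov(outcome ~ group)
model.matrix(fit_lm)[1:3, ]       # the dummy coding used behind the scenes
```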

Question

In the linear model framework, what do the numerator and denominator of the F-ratio represent?

  1. Numerator: The residual variance left after fitting the full model; Denominator: The variance explained by adding the predictor(s) to the null model.

  2. Numerator: The additional variance explained by adding the predictor(s) to the null model (i.e., the improvement in fit); Denominator: The residual variance left after fitting the full model.

  3. Numerator: The total variance in the outcome variable; Denominator: The variance explained by the grand mean.

Click here for the answer

Answer: 2. Numerator: the additional variance explained by adding the predictor(s) to the null model (i.e., the improvement in model fit). Denominator: the residual variance left after fitting the full model, that is, the variance not explained by the predictors included in the full model.
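
A minimal simulated sketch of this model-comparison view of the F-ratio (made-up data):

```r
# The F-ratio as a comparison of the null (grand mean) model and the full model
set.seed(1)
group   <- factor(rep(c("a", "b", "c"), each = 10))
outcome <- rnorm(30, mean = c(1, 2, 3)[group])

null_model <- lm(outcome ~ 1)       # grand mean only
full_model <- lm(outcome ~ group)   # separate group means

anova(null_model, full_model)
# Numerator:   (RSS_null - RSS_full) / extra df   -> improvement in fit
# Denominator: RSS_full / residual df             -> variance left unexplained
```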