A researcher reports a one-way ANOVA with \(SS_{\text{between}} = 45\) and \(SS_{\text{total}} = 120\). Calculate \(\eta^2\) for this result. What does this value mean in terms of effect size interpretation?
\(\eta^2 = \frac{SS_{\text{between}}}{SS_{\text{total}}} = \frac{45}{120} = 0.375\). This means that 37.5% of the variance in the outcome variable is explained by group membership. By Cohen's conventional benchmarks for \(\eta^2\) (0.01 small, 0.06 medium, 0.14 large), this is a large effect.
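As a quick check, here is a minimal R sketch that reproduces the calculation (the variable names are illustrative, not from the original study):

ss_between <- 45   # reported between-groups sum of squares
ss_total   <- 120  # reported total sum of squares
eta_sq <- ss_between / ss_total  # proportion of variance explained by group membership
eta_sq  # 0.375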
Imagine two groups of students have means of 75 and 85 on a test, with a pooled standard deviation of 5. Compute Cohen’s \(d\) for this independent samples \(t\)-test, and explain how it would inform you about the size of the group difference.
Cohen’s \(d = \frac{\bar{X}_1 - \bar{X}_2}{s_p} = \frac{75 - 85}{5} = -2.0\). The negative sign indicates direction, but in terms of effect size, we focus on the magnitude. A \(d\) of 2.0 is considered a very large effect size, meaning the groups differ by two standard deviations—a substantial difference.
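A corresponding R sketch, using the values given in the question (names are illustrative):

mean_1 <- 75    # group 1 mean
mean_2 <- 85    # group 2 mean
sd_pooled <- 5  # pooled standard deviation
cohens_d <- (mean_1 - mean_2) / sd_pooled  # standardized mean difference
cohens_d  # -2.0; a magnitude of two standard deviations is a very large effect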
Which of the following best explains why adjusted \(R^2\) might decrease when adding an uninformative predictor to a regression model?
Adjusted \(R^2\) always increases when adding predictors, even uninformative ones.
Adjusted \(R^2\) decreases because adding uninformative predictors increases model complexity without improving explanatory power.
Adjusted \(R^2\) measures only model fit, so it is unaffected by the number of predictors.
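As an illustration of the question above, here is a small simulated R sketch (all names and data are invented): adding a pure-noise predictor can nudge \(R^2\) upward while adjusted \(R^2\) often drops, because the penalty for the extra degree of freedom outweighs the negligible gain in fit (this happens whenever the added predictor's |t| statistic is below 1).

set.seed(1)
n <- 50
x <- rnorm(n)
y <- 2 * x + rnorm(n)   # the outcome truly depends on x only
noise <- rnorm(n)       # uninformative predictor

fit_small <- lm(y ~ x)
fit_large <- lm(y ~ x + noise)

summary(fit_small)$adj.r.squared
summary(fit_large)$adj.r.squared  # usually slightly lower than fit_small's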
Suppose you fit several simple linear regressions predicting an outcome variable Y with each of 5 predictors individually. Each of these models yields an \(R^2\) around 0.2. Now you fit a model with 20 predictors and get an \(R^2\) of 0.95 and an adjusted \(R^2\) of 0.2. Which statement below best describes what is likely happening?
The model is correctly capturing all relevant predictors and performing well.
The model is likely overfitting, capturing noise rather than true signal.
The adjusted \(R^2\) suggests the model is underfitting.
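The scenario above can be simulated directly in R (a hedged sketch with invented data; exact numbers will vary with the seed): with 20 predictors and few observations, \(R^2\) climbs because the model fits noise, while adjusted \(R^2\) stays near the level of the single informative predictor or falls.

set.seed(2)
n <- 25
x_true <- rnorm(n)
y <- x_true + rnorm(n)                    # only one real predictor
noise <- matrix(rnorm(n * 19), nrow = n)  # 19 uninformative predictors

dat <- data.frame(y, x_true, noise)
fit <- lm(y ~ ., data = dat)              # 20 predictors in total

summary(fit)$r.squared      # inflated by fitting noise
summary(fit)$adj.r.squared  # penalized for the extra predictors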
Imagine a study where each subject is measured multiple times during a learning experiment. You want to test the effect of time on performance. Which of the following analyses is most appropriate to account for correlated residuals?
A simple linear model using lm().
A GLS model using gls() with a compound symmetry correlation structure.
An independent samples t-test.
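For reference, a minimal sketch of the gls() call from the nlme package with a compound symmetry structure, assuming a long-format data frame learning_data with columns performance, time, and subject (all names are hypothetical):

library(nlme)

# One row per subject per measurement occasion (long format)
fit_gls <- gls(performance ~ time,
               correlation = corCompSymm(form = ~ 1 | subject),
               data = learning_data)
summary(fit_gls)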
You have a dataset where each subject is measured on one outcome variable at only one time point (no repeated measures). You want to test whether X predicts Y. You fit two models:
Model A: lm(Y ~ X, data = your_data)
Model B: gls(Y ~ X, correlation = corCompSymm(form = ~ 1 | subject), data = your_data)
Which model is more appropriate for this scenario?
Model A, lm(Y ~ X, data = your_data), is more appropriate because there is no repeated-measures structure: each subject contributes only one observation. The gls() model unnecessarily specifies a correlation structure when none is needed.
Which of the following statements best describes the general linear model representation of one-way ANOVA?
One-way ANOVA uses a separate linear regression for each group.
One-way ANOVA is equivalent to fitting a linear model where the predictor variable is a set of dummy variables for group membership.
One-way ANOVA cannot be represented as a linear model.
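To make the equivalence concrete, here is a hedged R sketch (the data frame my_data and its columns score and group are assumptions): R expands the factor group into \(k - 1\) dummy variables, so aov() and lm() produce the same F test.

fit_aov <- aov(score ~ group, data = my_data)
fit_lm  <- lm(score ~ group, data = my_data)

summary(fit_aov)      # classic one-way ANOVA table
anova(fit_lm)         # the same F test from the linear model
model.matrix(fit_lm)  # intercept plus k - 1 dummy columns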
In the linear model framework, what do the numerator and denominator of the F-ratio represent?
a) Numerator: The residual variance left after fitting the full model; Denominator: The variance explained by adding the predictor(s) to the null model.
b) Numerator: The additional variance explained by adding the predictor(s) to the null model (i.e., the improvement in fit); Denominator: The residual variance left after fitting the full model.
c) Numerator: The total variance in the outcome variable; Denominator: The variance explained by the grand mean.
Answer: b) Numerator: The additional variance explained by adding the predictor(s) to the null model (i.e., the improvement in model fit). Denominator: The residual variance left after fitting the full model — that is, the variance not explained by the predictors included in the full model.
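Equivalently, \(F = \frac{(SS_{\text{error,null}} - SS_{\text{error,full}}) / (df_{\text{error,null}} - df_{\text{error,full}})}{SS_{\text{error,full}} / df_{\text{error,full}}}\). This model-comparison view can be seen directly in R by comparing the intercept-only (null) model with the full model; the data frame and column names below are illustrative.

fit_null <- lm(score ~ 1, data = my_data)      # grand-mean-only model
fit_full <- lm(score ~ group, data = my_data)  # model with the predictor(s)

# anova() reports F = (reduction in residual SS per df) / (full-model residual variance)
anova(fit_null, fit_full)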