Work through these practice exercises

  • It’s a good idea to work through these on your own, but if you get very stuck, solutions can be found here

1.

Suppose that researchers for a popular social media platform are interested in maximising the amount of time that users spend on their app, and that they face a tough decision about whether the app should use a continuous scroll feature. Since opinions on the research team are split on the issue, they decide to perform an experiment to clear things up. They implement the continuous scroll feature and push it to their users. The average time that a user currently spends on the app is 58 minutes per day. Thus they are interested in whether continuous scroll increases this number. Suppose they obtain the following sample from this experiment:

\[ x = \{ 63.97, 60.29, 85.60, 72.57, 54.65, 53.74, 69.13, 70.07, 67.19, 53.95 \} \]

Test the hypothesis that continuous scroll had any effect whatsoever on time on app (an increase or a decrease from 58 minutes per day), assuming that the population variance of \(X\) is \(\sigma_X^2 = 10\). Do not use t.test, binom.test or any other built-in full test function unless explicitly instructed to do so.

  • Store the value of your observed test statistic in a variable named ans_1_test_stat_obs.

  • Store the lower critical value of your test in a variable named ans_1_critical_value_lower.

  • Store the upper critical value of your test in a variable named ans_1_critical_value_upper.

  • Store the lower bound of a 95% CI for your test statistic in a variable named ans_1_CI_lower.

  • Store the upper bound of a 95% CI for your test statistic in a variable named ans_1_CI_upper.

  • Store the p-value of your test in a variable named ans_1_p_value.
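As a reminder of the mechanics, here is a minimal sketch of a two-sided test of a mean with known population variance, built only from qnorm and pnorm. The data, the null mean and the variance below are made up for illustration and are not the exercise values. The sketch also assumes that the sample mean is used as the test statistic and that the CI is centred on the observed sample mean.

```r
# Hypothetical sample and parameters (NOT the exercise data)
x      <- c(10.2, 9.1, 11.4, 10.8, 9.9, 10.5)
mu_0   <- 10    # mean under the null hypothesis
sigma2 <- 4     # population variance, assumed known
alpha  <- 0.05

n     <- length(x)
se    <- sqrt(sigma2 / n)   # standard deviation of the sample mean
x_bar <- mean(x)            # observed test statistic

# Critical values on the scale of the sample mean (two-sided test)
crit_lower <- qnorm(alpha / 2, mean = mu_0, sd = se)
crit_upper <- qnorm(1 - alpha / 2, mean = mu_0, sd = se)

# 95% CI centred on the observed sample mean
ci_lower <- x_bar - qnorm(1 - alpha / 2) * se
ci_upper <- x_bar + qnorm(1 - alpha / 2) * se

# Two-sided p-value: probability under H0 of a result at least this extreme
p_value <- 2 * pnorm(abs(x_bar - mu_0) / se, lower.tail = FALSE)
```

The null hypothesis is rejected when x_bar falls outside the interval from crit_lower to crit_upper, which is equivalent to p_value being smaller than alpha.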

2.

Test the hypothesis that \(\theta\) is different than 0.5, where \(\theta\) is a parameter of some random variable. The sampling distribution for \(\hat{\theta}\) is shown in the following figure and the observed test statistic \(\hat{\theta}_{obs}\) is shown in blue. Ensure that the type I error rate of your test is \(\alpha=0.05\).

[Figure: sampling distribution of \(\hat{\theta}\), with the observed value \(\hat{\theta}_{obs}\) marked in blue]

  • Store the lower critical value of your test in a variable named ans_2_critical_value_lower.

  • Store the upper critical value of your test in a variable named ans_2_critical_value_upper.

  • Store the lower bound of a 95% CI for \(\hat{\theta}\) in a variable named ans_2_CI_lower.

  • Store the upper bound of a 95% CI for \(\hat{\theta}\) in a variable named ans_2_CI_upper.

  • Store the p-value of this test in a variable named ans_2_p_value.

3.

Please respond by assigning "confidence", "alpha", "beta", or "power" to the variables requested below. Note that the dashed lines in the above figure are the critical values.

  • What quantity does region I in the above plot contribute to? Store your answer in a variable named ans_3a.

  • What quantity does region II in the above plot contribute to? Store your answer in a variable named ans_3b.

  • What quantity does region III in the above plot contribute to? Store your answer in a variable named ans_3c.

  • What quantity does region IV in the above plot contribute to? Store your answer in a variable named ans_3d.

  • What quantity does region V in the above plot contribute to? Store your answer in a variable named ans_3e.

  • What quantity does region VI in the above plot contribute to? Store your answer in a variable named ans_3f.
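The four quantities named in this question can also be computed numerically. The sketch below assumes, purely for illustration, that the test statistic is normally distributed with the same standard error under the null and under one specific alternative; the means and the standard error are hypothetical and are not read off the figure.

```r
# Hypothetical sampling distributions of a test statistic
mu_h0 <- 0     # mean under H0
mu_h1 <- 1     # mean under one specific alternative H1
se    <- 0.5   # standard error, assumed equal under H0 and H1
alpha <- 0.05

# Two-sided critical values are always defined by the H0 distribution
crit_lower <- qnorm(alpha / 2, mean = mu_h0, sd = se)
crit_upper <- qnorm(1 - alpha / 2, mean = mu_h0, sd = se)

# Areas under the H0 distribution
alpha_check <- pnorm(crit_lower, mu_h0, se) +
  pnorm(crit_upper, mu_h0, se, lower.tail = FALSE)  # rejection region
confidence  <- 1 - alpha_check                      # between the critical values

# Areas under the H1 distribution
power <- pnorm(crit_lower, mu_h1, se) +
  pnorm(crit_upper, mu_h1, se, lower.tail = FALSE)  # rejection region
beta  <- 1 - power                                  # between the critical values
```

Areas outside the critical values contribute to alpha when measured under the null distribution and to power when measured under the alternative. Areas between the critical values contribute to confidence and beta, respectively.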

Preamble

  • These exercises require you to use the stringr package, so please install it using whatever method you like.

  • After you get stringr installed, load it using library(stringr).

These exercises rely on magnetoencephalography (MEG) data collected from a single participant while they performed a category learning experiment. On each trial of the category learning experiment, the participant viewed a circular sine wave grating, and had to push a button to indicate whether they believed the stimulus belonged to category A or category B. We have seen and worked with this type of category learning experiment many times throughout this course, and it is further described by the following figure.

[Figure: the category learning task]

MEG is used to record the time-series of magnetic and electric potentials at the scalp, which are generated by the activity of neurons. There are many sensors, each configured to pick up signal from a different position on the scalp. This is shown in the following figure (the text labels indicate the channel name and are placed approximately where the MEG sensor is located on a real head).

[Figure: MEG sensor layout, with channel names placed at their approximate scalp positions]

The data file that we will be working with is arranged into epochs aligned to stimulus presentation. This means that every time a stimulus is presented we say that an epoch has occurred. We assign a time of \(t=0\) to the exact moment the stimulus appeared, and we then typically look at the neural time series from just before the stimulus appeared to a little while after it appeared. For this data, each epoch starts 0.1 seconds before stimulus onset and ends 0.3 seconds after stimulus onset. The following figure shows the MEG signal at every sensor location across the entire scalp for 5 time points within this \([-0.1s, 0.3s]\) interval.

[Figure: MEG signal across the scalp at 5 time points within the \([-0.1s, 0.3s]\) interval]

  • The data can be read into a data.table and the columns renamed to eliminate spaces by using the following code:
library(data.table)  # provides fread
library(stringr)
d <- fread('https://crossley.github.io/book_stats/data/eeg/epochs.txt')

# The column names that come from this file have spaces
# This line replaces those spaces with "." and drops any commas
# (depends on the `stringr` package)
names(d) <- str_replace_all(names(d), c(" " = "." , "," = "" ))
  • The time column contains times in seconds relative to stimulus onset. Stimulus onset always occurs at \(0\) seconds.

  • The condition column indicates which category the stimulus belonged to for the given epoch. We won’t make use of this column here, and we will remove it below.

  • The epoch column is the epoch number. You can think of this like we have usually thought of trial columns in examples throughout the course.

  • The many different MEG xyz columns contain the actual neural time series signals for each sensor. See the above figure for how these column names map onto scalp positions.
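Several of the questions below ask for one observation per epoch. The following sketch shows one way to get there in base R, using a small simulated data frame in place of the real file. The values are random, and the column name MEG.133 is an assumption about what the renamed columns look like.

```r
# Simulated stand-in for the real data (random values, hypothetical column name)
set.seed(1)
n_epochs <- 4
times    <- c(-0.1, 0, 0.1, 0.2, 0.3)  # seconds relative to stimulus onset
d_sim <- data.frame(
  epoch   = rep(seq_len(n_epochs), each = length(times)),
  time    = rep(times, times = n_epochs),
  MEG.133 = rnorm(n_epochs * length(times))
)

# Keep only post-stimulus samples (t > 0) ...
post <- d_sim[d_sim$time > 0, ]

# ... then average within each epoch, leaving one row per epoch
per_epoch <- aggregate(MEG.133 ~ epoch, data = post, FUN = mean)
```

With a data.table, the same step could plausibly be written as d[time > 0, .(x = mean(MEG.133)), by = epoch], again assuming that column name.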

1.

Consider two random variables \(X \sim \mathcal{N}(\mu_X, \sigma_X)\) and \(Y \sim \mathcal{N}(\mu_Y, \sigma_Y)\). Let \(X\) generate data for MEG channel 133 and \(Y\) generate data for MEG channel 135. Test the hypothesis that the mean MEG signal for \(t > 0\) differs between these two channels. When computing the mean MEG signal, keep epochs separate and average over everything else. You should be left with one observation per epoch. Assume that \(\sigma_X = \sigma_Y\) and also assume that \(X\) and \(Y\) are independent.

  • Store the observed \(t\) value of this test in a variable named ans_1_t_test_stat_obs.

  • Store the lower critical value in a variable named ans_1_critical_value_lower.

  • Store the upper critical value in a variable named ans_1_critical_value_upper.

  • Store the observed \(95\%\) CI lower value in a variable named ans_1_CI_lower.

  • Store the observed \(95\%\) CI upper value in a variable named ans_1_CI_upper.

  • Store the observed \(p\)-value in a variable named ans_1_p_value.
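For reference, a two-sample \(t\)-test with a pooled variance estimate can be assembled by hand roughly as follows. This is a sketch on simulated vectors, not on the MEG data, and it assumes the CI is for the difference in means \(\mu_X - \mu_Y\).

```r
# Simulated independent samples (NOT the MEG data)
set.seed(2)
x <- rnorm(12, mean = 0.0, sd = 1)
y <- rnorm(15, mean = 0.5, sd = 1)
alpha <- 0.05

n_x <- length(x)
n_y <- length(y)
dof <- n_x + n_y - 2   # degrees of freedom for the pooled test

# Pooled variance: weighted average of the two sample variances
s2_pooled <- ((n_x - 1) * var(x) + (n_y - 1) * var(y)) / dof
se        <- sqrt(s2_pooled * (1 / n_x + 1 / n_y))

diff_obs <- mean(x) - mean(y)
t_obs    <- diff_obs / se

# Two-sided critical values from the t distribution
crit_lower <- qt(alpha / 2, df = dof)
crit_upper <- qt(1 - alpha / 2, df = dof)

# 95% CI for the difference in means
ci_lower <- diff_obs + crit_lower * se
ci_upper <- diff_obs + crit_upper * se

# Two-sided p-value
p_value <- 2 * pt(abs(t_obs), df = dof, lower.tail = FALSE)
```

Where using a built-in test is permitted, t.test(x, y, var.equal = TRUE) should reproduce these numbers and is a convenient check.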

2.

Consider two random variables \(X \sim \mathcal{N}(\mu_X, \sigma_X)\) and \(Y \sim \mathcal{N}(\mu_Y, \sigma_Y)\). Let \(X\) generate data for MEG channel 039 during the first 30 epochs and \(Y\) generate data for MEG channel 039 during the remaining epochs. Test the hypothesis that the mean MEG signal for \(t > 0\) differs between these two signals. Assume \(X\) and \(Y\) are independent but do not assume that \(\sigma_X=\sigma_Y\).

  • Store the observed \(t\) value of this test in a variable named ans_2_t_test_stat_obs.

  • Store the lower critical value in a variable named ans_2_critical_value_lower.

  • Store the upper critical value in a variable named ans_2_critical_value_upper.

  • Store the observed \(95\%\) CI lower value in a variable named ans_2_CI_lower.

  • Store the observed \(95\%\) CI upper value in a variable named ans_2_CI_upper.

  • Store the observed \(p\)-value in a variable named ans_2_p_value.
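When equal variances are not assumed, the standard error and the degrees of freedom both change. The sketch below uses the Welch-Satterthwaite approximation for the degrees of freedom, which is the usual choice in this situation. The data are simulated, and the CI is again assumed to be for \(\mu_X - \mu_Y\).

```r
# Simulated independent samples with unequal variances (NOT the MEG data)
set.seed(3)
x <- rnorm(10, mean = 0.0, sd = 1)
y <- rnorm(20, mean = 0.5, sd = 2)
alpha <- 0.05

# Variance of each sample mean
v_x <- var(x) / length(x)
v_y <- var(y) / length(y)

# The standard error no longer pools the two variances
se <- sqrt(v_x + v_y)

# Welch-Satterthwaite approximation to the degrees of freedom
dof <- (v_x + v_y)^2 /
  (v_x^2 / (length(x) - 1) + v_y^2 / (length(y) - 1))

diff_obs <- mean(x) - mean(y)
t_obs    <- diff_obs / se

crit_lower <- qt(alpha / 2, df = dof)
crit_upper <- qt(1 - alpha / 2, df = dof)

ci_lower <- diff_obs + crit_lower * se
ci_upper <- diff_obs + crit_upper * se

p_value <- 2 * pt(abs(t_obs), df = dof, lower.tail = FALSE)
```

The degrees of freedom are generally not an integer here, which qt and pt handle without any special treatment.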

3.

Do you think two different MEG channels on the same person's head are likely to be independent? Explain your reasoning in a brief comment (no more than a sentence or two).

4.

Consider two random variables \(X \sim \mathcal{N}(\mu_X, \sigma_X)\) and \(Y \sim \mathcal{N}(\mu_Y, \sigma_Y)\). Let \(X\) generate data for MEG channel 039 and \(Y\) generate data for MEG channel 135. Test the hypothesis that the mean MEG signal per epoch for \(t > 0\) differs between these two channels. Do not assume \(X\) and \(Y\) are independent.

  • Store the observed \(t\) value of this test in a variable named ans_4_t_test_stat_obs.

  • Store the lower critical value in a variable named ans_4_critical_value_lower.

  • Store the upper critical value in a variable named ans_4_critical_value_upper.

  • Store the observed \(95\%\) CI lower value in a variable named ans_4_CI_lower.

  • Store the observed \(95\%\) CI upper value in a variable named ans_4_CI_upper.

  • Store the observed \(p\)-value in a variable named ans_4_p_value.
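When the two samples cannot be treated as independent but come in natural pairs (here, one value from each channel per epoch), one common approach is to reduce the problem to a one-sample \(t\)-test on the pairwise differences. A sketch on simulated paired data, which assumes the pairing is by epoch:

```r
# Simulated paired samples sharing a common component (NOT the MEG data)
set.seed(4)
n      <- 20
shared <- rnorm(n)                       # source of dependence between x and y
x <- shared + rnorm(n, sd = 0.3)
y <- shared + 0.2 + rnorm(n, sd = 0.3)
alpha <- 0.05

# Work with the pairwise differences; under H0 their mean is 0
diffs <- x - y
dof   <- n - 1
se    <- sd(diffs) / sqrt(n)

t_obs <- mean(diffs) / se

crit_lower <- qt(alpha / 2, df = dof)
crit_upper <- qt(1 - alpha / 2, df = dof)

# 95% CI for the mean difference
ci_lower <- mean(diffs) + crit_lower * se
ci_upper <- mean(diffs) + crit_upper * se

p_value <- 2 * pt(abs(t_obs), df = dof, lower.tail = FALSE)
```

Because the shared component cancels in the differences, the standard error here is typically much smaller than in the independent-samples versions, which is the main practical reason to respect the pairing.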