library(data.table)
library(ggplot2)
# Load the built-in iris data as a data.table
iris <- as.data.table(iris)
A). Create a histogram of petal length for the versicolor species only. Adjust the binwidth and add labels accordingly.
ggplot(data = iris[Species == "versicolor"], aes(x = Petal.Length)) +
  # Petal length is recorded to the nearest 0.1 cm, so narrower bins leave empty gaps
  geom_histogram(binwidth = 0.1) +
  labs(title = "Distribution of Petal Length for Versicolor",
       x = "Petal Length (cm)",
       y = "Frequency")
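A binwidth smaller than the resolution of the measurements produces empty bins between the observed values, which makes the histogram look spiky. One quick way to check the resolution before picking a binwidth is to look at the gaps between a variable's distinct values. This is a minimal sketch on the built-in iris data; `iris_dt`, `vals`, and `resolution` are illustrative names, not part of the exercise.

```r
library(data.table)

# Load the built-in iris data as a data.table
iris_dt <- as.data.table(iris)

# Distinct versicolor petal lengths, sorted in increasing order
vals <- iris_dt[Species == "versicolor", sort(unique(Petal.Length))]

# The smallest gap between adjacent distinct values is the measurement step
resolution <- min(diff(vals))
print(vals)
print(resolution)
```

A binwidth equal to (or a multiple of) this step gives every bin a chance to contain observations.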
B). Create a boxplot of sepal length for the setosa and virginica species only. Add labels accordingly.
ggplot(data = iris[Species %in% c("setosa", "virginica")], aes(x = Species, y = Sepal.Length)) +
  geom_boxplot() +
  # The plotted variable is sepal length, so the labels must say so
  labs(title = "Distribution of Sepal Length for Setosa and Virginica",
       x = "Species",
       y = "Sepal Length (cm)")
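It can help to compute the numbers a boxplot is built from, so the plot can be checked against them. The sketch below uses data.table grouping (`by = Species`) with R's default `quantile()`; `iris_dt` and `box_stats` are illustrative names. Keep in mind that ggplot2's whiskers stop at the most extreme points within 1.5 times the interquartile range of the box, so the `min` and `max` columns can lie beyond the whiskers and show up as outlier points instead.

```r
library(data.table)

iris_dt <- as.data.table(iris)

# Quartiles, median, and range of sepal length for each of the two species
box_stats <- iris_dt[Species %in% c("setosa", "virginica"),
                     .(min    = min(Sepal.Length),
                       q1     = unname(quantile(Sepal.Length, 0.25)),
                       median = median(Sepal.Length),
                       q3     = unname(quantile(Sepal.Length, 0.75)),
                       max    = max(Sepal.Length)),
                     by = Species]
print(box_stats)
```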
Complete the experiment located at the following address:
https://run.pavlovia.org/demos/simplertt/
Be sure to enter your name or some other ID that you will remember and can be easily searched for.
Download your data in a .csv file by taking the following steps:
1. Go here:
https://gitlab.pavlovia.org/demos/simplertt
2. Read about the experiment you just participated in by scrolling to the bottom and reading the README.md file.
3. Click the data folder to land at the following address:
https://gitlab.pavlovia.org/demos/simplertt/tree/master/data
4. Near the top right of the page, click the Find file button, and search for a file containing the unique ID that you entered at the beginning of the experiment.
5. Click on the .csv file that pops up.
6. Finally, click the download button to download your .csv file to your local machine.
Load the data.table and ggplot2 libraries, and use rm to be sure you are starting with a clean workspace.
library(data.table)
library(ggplot2)
rm(list = ls())
Q1. Load the data into a data.table using the fread function from the data.table library.
# You need to replace the path I use here with a path that
# points to wherever you have your data stored.
d <- fread('')
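If you want to see what fread returns before pointing it at your own file, it also accepts inline data through its `text` argument. The sketch below uses a made-up miniature CSV whose column names mirror the ones discussed in this section; every value in it is hypothetical, and `toy_csv` and `toy` are illustrative names. Blank fields in numeric columns come back as NA.

```r
library(data.table)

# Hypothetical miniature of the experiment's output (all values made up)
toy_csv <- "practiceTrials.thisN,mainTrials.thisN,isi,response.rt
0,,0.5,0.41
1,,1.0,0.38
,0,0.5,0.35
,1,1.0,0.44
,2,0.5,0.31"

# fread reads inline text via `text`; for a real file, pass its path instead
toy <- fread(text = toy_csv)
str(toy)
```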
When data.table objects have lots of columns, str can be a good summary function to use for basic inspection.
str(d)
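When the str output runs to many lines, searching the column names with a regular expression is another way to find what you need. This sketch uses a small hypothetical table (`toy`, with made-up values) standing in for the real data; the patterns `thisN$` and `\\.rt$` are only examples.

```r
library(data.table)

# Hypothetical stand-in for the real data (values made up)
toy <- data.table(participant = "abc123",
                  practiceTrials.thisN = c(0L, NA),
                  mainTrials.thisN = c(NA, 0L),
                  isi = c(0.5, 1.0),
                  response.rt = c(0.41, 0.35))

# All column names, then only those matching a pattern
print(names(toy))
thisN_cols <- grep("thisN$", names(toy), value = TRUE)   # loop counters
rt_cols <- grep("\\.rt$", names(toy), value = TRUE)      # response-time columns
print(thisN_cols)
print(rt_cols)
```

Anchoring the pattern (`$`, escaped `.`) matters: a bare `"rt"` would also match `participant`.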
It’s certainly difficult to know what all of these columns encode. This is something you will get used to as you build and run your own experiments (e.g., as you will in later COGS units). For now, I’ll just tell you. The data contains a row for every trial completed, including practice trials.
Q2a. Extract the columns named practiceTrials.thisN and mainTrials.thisN.
d[, .(practiceTrials.thisN, mainTrials.thisN)]
Q2b. What are the NAs telling us?
Ans = We can see that mainTrials.thisN is NA during practice and practiceTrials.thisN is NA during the main experiment.
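One way to confirm this pattern across the whole file is to count rows by which of the two loop counters is filled in. The sketch below runs on a hypothetical toy table (`d_toy`, made-up values); on the real data `d`, if every row belongs to exactly one phase, only the practice-only and main-only combinations should appear.

```r
library(data.table)

# Hypothetical toy trials (made-up values): two practice rows, three main rows
d_toy <- data.table(practiceTrials.thisN = c(0L, 1L, NA, NA, NA),
                    mainTrials.thisN     = c(NA, NA, 0L, 1L, 2L))

# Count rows for each combination of "practice counter present" / "main counter present"
pattern <- d_toy[, .N, by = .(is_practice = !is.na(practiceTrials.thisN),
                              is_main     = !is.na(mainTrials.thisN))]
print(pattern)
```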
Q2c. What’s the column name that stores RT?
Ans = We can also see that our reaction time per trial is stored in a column named response.rt.
Q2d. Which column is our independent variable? What is it telling us?
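Ans = isi (the inter-stimulus interval) is the independent variable. It records the interval used on each trial, which the experiment varies across trials.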
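A quick check on an independent variable is how many trials were run at each of its levels. The sketch below uses a hypothetical toy table (`d_toy`, made-up isi values); on the real data the same expression, with `d` in place of `d_toy`, would list the isi levels actually used in the main trials.

```r
library(data.table)

# Hypothetical main-trial rows (made-up values)
d_toy <- data.table(mainTrials.thisN = 0:5,
                    isi = c(0.5, 1.0, 1.5, 0.5, 1.0, 1.5))

# Number of main trials at each isi level, sorted by isi
isi_counts <- d_toy[!is.na(mainTrials.thisN), .N, by = isi][order(isi)]
print(isi_counts)
```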
Okay, we are now equipped to pull out just the rows and columns that we need for a simple exploration of our performance.
# We begin by looking at just the main trials
d_main <- d[!is.na(mainTrials.thisN), .(response.rt, isi)]
Q3a. Explain in words what the above code is doing.
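In data.table syntax `d[i, j]`, the `i` part filters rows and the `j` part selects or computes columns. Splitting the one-line version into those two steps on a small hypothetical table (`d_toy`, made-up values) makes each step visible: `!is.na(mainTrials.thisN)` keeps only rows where the main-trial counter is filled in, and `.(response.rt, isi)` keeps just those two columns.

```r
library(data.table)

# Hypothetical toy data (made-up values) with the same column names as the real file
d_toy <- data.table(practiceTrials.thisN = c(0L, 1L, NA, NA),
                    mainTrials.thisN     = c(NA, NA, 0L, 1L),
                    isi                  = c(0.5, 1.0, 0.5, 1.0),
                    response.rt          = c(0.41, 0.38, 0.35, 0.44))

# Step 1 (i): keep rows where mainTrials.thisN is not missing, i.e. the main trials
main_rows <- d_toy[!is.na(mainTrials.thisN)]

# Step 2 (j): keep only the response time and the independent variable
d_main_toy <- main_rows[, .(response.rt, isi)]

# The one-line form gives the same result
stopifnot(isTRUE(all.equal(d_main_toy,
                           d_toy[!is.na(mainTrials.thisN), .(response.rt, isi)])))
print(d_main_toy)
```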
Q3b. Create a simple histogram plotting response.rt. What do you notice?
ggplot(d_main, aes(x = response.rt)) +
  geom_histogram(bins = 30) +
  labs(x = "Response time (s)", y = "Count")
Q3c. Compute and report basic descriptive statistics (mean, median, and standard deviation) for response times in the main experiment. In data.table, apply mean(), median(), and sd() to the response.rt column inside the j argument of d_main[, ...] to get all three in one step.
# Finally, we report basic descriptive statistics as actual
# numbers for the main experiment
d_main[, .(rt_mean = mean(response.rt),
           rt_median = median(response.rt),
           rt_sd = sd(response.rt))]
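Because isi is the independent variable, a natural next step is to compute the same statistics separately for each isi level by adding `by = isi`. This sketch uses a hypothetical table (`d_main_toy`, made-up values, including one missing RT to illustrate `na.rm = TRUE`); on the real data you would use `d_main` instead, and `na.rm = TRUE` only matters if some trials have no recorded response.

```r
library(data.table)

# Hypothetical main-trial data (made-up values) with the same two columns as d_main
d_main_toy <- data.table(response.rt = c(0.35, 0.44, 0.31, 0.52, NA, 0.40),
                         isi         = c(0.5, 1.0, 0.5, 1.0, 0.5, 1.0))

# Mean, median, SD, and trial count of RT for each isi level;
# na.rm = TRUE drops trials whose RT is missing
rt_by_isi <- d_main_toy[, .(rt_mean   = mean(response.rt, na.rm = TRUE),
                            rt_median = median(response.rt, na.rm = TRUE),
                            rt_sd     = sd(response.rt, na.rm = TRUE),
                            n_trials  = sum(!is.na(response.rt))),
                        by = isi][order(isi)]
print(rt_by_isi)
```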