Get lots of hands-on practice with data.table
:)
You should pay very careful attention to learn about: *
data.table
chaining * Adding columns using :=
* Selecting rows with i
, operating on columns with
j
, and grouping with by
using
data.table
syntax [i, j, by]
# load data.table library
#install.packages("data.table") # only if you have not installed data.table before
library(data.table)
# clean session
rm(list=ls())
# Please include the following line of code as well
set.seed(0)
# create vectors from which to build d
V1 <- rep(month.abb[1:8], 2) #month.abb is a built in vector
V2 <- rep(letters[1:8], 2) #letters is a built in vector
V3 <- rnorm(length(V1))
# create a data.table named d
d <- data.table(month=V1, event=V2, obs=V3)
Reminder on data.table syntax dt[i,j,by] datatable[row, column, grouping]
A). Use the data.table
keyword .N
to report
how many rows are in d
and store the result in a variable
named ans_1a
.
B). Return a data.table
containing only the rows of
d
for which month=="Feb"
and
event=="c"
and store the result in a variable named
ans_1b
.
C). Using the %in%
operator, return a
data.table
containing only the rows of d
for
which month
is in the vector
c("Jan", "Mar", "May", "Jul")
, and store the result in a
variable named ans_1c
.
A). Return a vector
containing the month
column of d
and store the result in a variable named
ans_2a
.
B). Return a data.table
containing only the
month
and event
columns of d
and
store the result in a variable named ans_2b
.
C). Return the mean and standard deviation of the obs
column across all rows in d
and store the result in a
data.table
named ans_2c
. The resulting
data.table
should have 1
observation with
columns named obs_mean
and obs_sd
.
datatable[row, column, group] ## 3. Group by
A). Return the mean obs
grouped by event
and grouped by month
and return the result in a
data.table
named ans_3a
. Name the column of
ans_3a
that contains the mean of the obs
column obs_mean
.
A). Select the first 8 rows of d
and compute the median
value of the obs
column grouped by month
and
event
and return the result in a data.table
named ans_4a
. Name the column of ans_4a
that
contains the median of the obs
column
obs_median
.
D <- data.table(EmployeeID = 101:105,
Department = c("HR", "IT", "Finance", "Marketing", "Sales"))
A). Add a new column called Salary with values (50000, 60000, 55000, 52000, 58000)
B). Increase salary by 10%
C). Create a new column called Bonus such that: If Salary is greater than or equal to 55000, Bonus should be “High” Otherwise, Bonus should be “Low”
D). create a new column EmployeesPerDept that stores the number of employees in each Department.
products <- data.table(ProductID = 101:105,
ProductName = c("Laptop", "Phone", "Tablet", "Monitor", "Keyboard"))
sales <- data.table(ProductID = c(102, 103, 105, 106),
UnitsSold = c(50, 30, 20, 40))
A). Perform a left join to merge products with sales by ProductID, ensuring all rows from products are retained. Store the result in ans_6a.
B). Perform a right join to merge products with sales, ensuring all rows from sales are retained. Store the result in ans_6b.
D <- data.table(Student = c("A", "B", "C"), Math = c(85, 78, 92), Science = c(90, 82, 95), English = c(88, 80, 94))
A). Convert d from wide format to long format, where: Student remains as the identifier column (id.vars). The column names Math, Science, and English should be stored in a column called “Subject”. The values should be stored in a column named “Score”. Store the result in ans_7a.
DT <- data.table(
ID = 1:5,
Name = c("Alice", "Bob", "Charlie", "David", "Eva"),
Score = c(85, 92, 88, 94, 90)
)
A). Sort DT by Score in descending order. Select the top 3 rows. Compute the maximum score in these rows and store it in a new column/datatable “TopScore”. Store the result in ans_8a.
B). Filter DT to include only rows where Score >= 90. Select only the columns Name and Score. Sort the result in descending order of Score. Store the result in ans_8b
C). Compute the mean Score of all employees, individually. Create a new column AdjustedScore where each score is increased by 5% of the mean score. Store the result in ans_8c.
D). Group DT by Name. Compute the average score for each person. Sort the result in descending order of AverageScore. Store the result in ans_8d