This tutorial is designed to get everyone operational in R and to build comfort with core R objects and control flow.
If you already have both R and RStudio installed, you can skip this section.
A common confusion is thinking R and RStudio are the same thing. They are not.
R is the programming language (and the software that runs R code).
If you install R, you can run R code without RStudio.
RStudio is an editor and interface: an IDE, which stands for Integrated Development Environment.
It helps you write and organise R code, but it is not the language itself.
This unit teaches R. We use RStudio simply because many people find they enjoy using it. But you can use any editor or interface you like to write and run R code.
You can run R in at least two common “no RStudio” ways.
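To use R interactively, open a terminal (Mac: Terminal, Windows: PowerShell or Terminal, Linux: any terminal), type R, and press Enter. (On Windows, PowerShell treats R as an alias for its own Invoke-History command, so type R.exe instead; if that is not found, R's bin folder is not on your PATH.)
This opens the R console, where you can type commands like these (press Enter after each line to run it):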
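1 + 1    # basic arithmetic
x <- 10  # store 10 in a variable named x
x * 2    # use x in a calculation
q()      # quit R (R may ask whether to save the workspace)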
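To run a whole script instead, create a plain-text file named test_script.R containing:
x <- 10
y <- 5
z <- x + y
print(z)
Save test_script.R, then run it from a terminal opened in the same folder:
Rscript test_script.R
This prints [1] 15.
By default, RStudio has a 4-pane layout: the script editor (where you write and save .R files), the console, the environment pane, and a pane for files, plots, and help.
The idea is to write code in the script, run it in the console, and see results in the environment and plots panes.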
To run a line of code in the script, place your cursor on the line and press Ctrl + Enter (Windows/Linux) or Cmd + Enter (Mac). You can run multiple lines by selecting them and pressing the same shortcut.
You can restart your session and start from a clean slate (clear all objects in the environment) by opening the Session menu and clicking Restart R (or by using its keyboard shortcut, which is listed next to the menu item).
# A numeric variable
x <- 10
y <- 5
# Basic arithmetic
z <- x + y
z
## [1] 15
# A character variable (text)
s1 <- "10"
s2 <- "5"
# Convert text to numeric
as.numeric(s1) + as.numeric(s2)
## [1] 15
# A logical variable (TRUE/FALSE)
x == y
## [1] FALSE
x > y
## [1] TRUE
x < y
## [1] FALSE
# Inspect types
class(x)
## [1] "numeric"
class(s1)
## [1] "character"
class(x == y)
## [1] "logical"
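Conversion can fail. As a minimal sketch (the values here are illustrative), as.numeric() returns NA, with a warning, for text that does not look like a number, and the is.*() family tests a type directly:

```r
# Text that looks like a number converts cleanly
n <- as.numeric("3.5")
n                      # 3.5

# Text that does not look like a number becomes NA (R also emits a warning,
# which suppressWarnings() hides here)
bad <- suppressWarnings(as.numeric("ten"))
is.na(bad)             # TRUE

# is.numeric(), is.character() and is.logical() test a value's type directly
is.numeric(n)          # TRUE
is.character("10")     # TRUE
is.logical(n > 1)      # TRUE

# Whole numbers are still "numeric" (double) unless written with an L suffix
class(5)               # "numeric"
class(5L)              # "integer"
```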
A vector is a 1D container whose elements all share a single type; if you mix types, R coerces them to a common one (see below).
v_num <- c(2, 3, 5, 7, 11)
v_num
## [1]  2  3  5  7 11
v_chr <- c("a", "b", "c")
v_chr
## [1] "a" "b" "c"
v_log <- c(TRUE, FALSE, TRUE)
v_log
## [1]  TRUE FALSE  TRUE
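Most R operations work on whole vectors at once, element by element. A short sketch reusing v_num from above:

```r
v_num <- c(2, 3, 5, 7, 11)

length(v_num)             # 5 elements
v_num * 2                 # each element doubled: 4 6 10 14 22
v_num + c(1, 1, 1, 1, 1)  # element-wise addition: 3 4 6 8 12
v_num > 4                 # one TRUE/FALSE per element: FALSE FALSE TRUE TRUE TRUE
sum(v_num)                # 28
```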
A list can hold mixed types.
L <- list(1, 2, "a", "b", TRUE)
L
## [[1]]
## [1] 1
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] "a"
##
## [[4]]
## [1] "b"
##
## [[5]]
## [1] TRUE
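List elements can also be given names, which makes them easier to pull out with $ or [[ ]]. The list below (person, with made-up values) is purely illustrative:

```r
# A named list: each element keeps its own type
person <- list(name = "Ada", age = 36, student = FALSE)

person$name            # "Ada"
person[["age"]]        # 36
names(person)          # "name" "age" "student"
# class() of every element, one result per name
sapply(person, class)  # name: "character", age: "numeric", student: "logical"
```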
A data.frame is a table-like object where columns can have different types.
v1 <- c('I', 'I', 'I', 'I', 'II', 'II', 'II', 'II')
v2 <- c('a', 'a', 'b', 'b', 'c', 'c', 'd', 'd')
v3 <- c(-1.6297880,
        -1.0738506,
        0.0299236,
        -1.5435811,
        -0.5133278,
        -1.4716107,
        -1.1986316,
        -1.5548207)
d <- data.frame(v1=v1, v2=v2, v3=v3)
d
##   v1 v2         v3
## 1  I  a -1.6297880
## 2  I  a -1.0738506
## 3  I  b  0.0299236
## 4  I  b -1.5435811
## 5 II  c -0.5133278
## 6 II  c -1.4716107
## 7 II  d -1.1986316
## 8 II  d -1.5548207
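A few base R functions are handy for inspecting a data.frame. This sketch rebuilds d from the vectors above and assumes R 4.0 or later, where text columns stay character rather than becoming factors:

```r
v1 <- c('I', 'I', 'I', 'I', 'II', 'II', 'II', 'II')
v2 <- c('a', 'a', 'b', 'b', 'c', 'c', 'd', 'd')
v3 <- c(-1.6297880, -1.0738506, 0.0299236, -1.5435811,
        -0.5133278, -1.4716107, -1.1986316, -1.5548207)
d <- data.frame(v1 = v1, v2 = v2, v3 = v3)

nrow(d)           # 8 rows
ncol(d)           # 3 columns
names(d)          # "v1" "v2" "v3"
sapply(d, class)  # each column keeps its own type: character, character, numeric
str(d)            # compact summary of the structure
```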
tmp <- c(0.24, 0.015, 1.34, -1.00, -0.15)
tmp[1] # first element
## [1] 0.24
tmp[3] # third element
## [1] 1.34
tmp[c(1,3,5)] # multiple elements
## [1]  0.24  1.34 -0.15
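Indices can also be negative (to drop elements) or logical (to keep the elements that meet a condition). A short sketch on the same tmp vector:

```r
tmp <- c(0.24, 0.015, 1.34, -1.00, -0.15)

tmp[-1]            # drop the first element: 0.015 1.34 -1.00 -0.15
tmp[tmp > 0]       # keep only positive values: 0.24 0.015 1.34
tmp[length(tmp)]   # last element: -0.15
which(tmp < 0)     # positions of the negative values: 4 5
```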
Lists have two common indexing styles:
- [[ ]] pulls out the element itself
- [ ] keeps a list (a sub-list of the original)
tmp <- list(0.5, "W", FALSE)
tmp[2] # returns a list of length 1
## [[1]]
## [1] "W"
tmp[[2]] # returns the element itself
## [1] "W"
class(tmp[2])
## [1] "list"
class(tmp[[2]])
## [1] "character"
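The same rule holds when you ask for several positions or assign into a list. A minimal sketch:

```r
tmp <- list(0.5, "W", FALSE)

sub <- tmp[2:3]    # [ ] with several positions still returns a list
class(sub)         # "list"
length(sub)        # 2

tmp[[3]] <- TRUE   # [[ ]] also replaces a single element in place
tmp[[3]]           # TRUE
```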
Data frame columns can be selected with $, by name, or by position:
tmp <- data.frame(v1=c(1, 2, 3),
                  v2=c('A', 'B', 'C'),
                  v3=c(TRUE, TRUE, FALSE))
tmp$v2
## [1] "A" "B" "C"
tmp[, "v2"]
## [1] "A" "B" "C"
tmp[ , 2]
## [1] "A" "B" "C"
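The [row, column] form also selects rows, and $ can create new columns. A minimal sketch on the same tmp (the column name v4 is made up for illustration):

```r
tmp <- data.frame(v1 = c(1, 2, 3),
                  v2 = c('A', 'B', 'C'),
                  v3 = c(TRUE, TRUE, FALSE))

row2 <- tmp[2, ]        # second row, all columns (still a data.frame)
tmp[tmp$v3, "v2"]       # v2 for the rows where v3 is TRUE: "A" "B"
tmp[["v2"]]             # same as tmp$v2
tmp$v4 <- tmp$v1 * 10   # add a new column computed from v1
tmp$v4                  # 10 20 30
```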
R will sometimes coerce (convert) types quietly, without a warning.
element_1 <- "A"
element_2 <- 2
element_3 <- FALSE
L <- list(element_1, element_2, element_3)
V <- c(element_1, element_2, element_3)
L
## [[1]]
## [1] "A"
##
## [[2]]
## [1] 2
##
## [[3]]
## [1] FALSE
V
## [1] "A"     "2"     "FALSE"
class(L[[2]])
## [1] "numeric"
class(V[2])
## [1] "character"
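Key idea:
- Lists preserve the type of each element.
- Vectors coerce all elements to a single type when needed, moving up the order logical, then numeric, then character. That is why every element of V is character.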
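A few quick checks make the coercion rules concrete (the values are arbitrary):

```r
c(TRUE, 2)                 # logical + numeric -> numeric: 1 2
c(2, "A")                  # numeric + character -> character: "2" "A"
class(c(TRUE, 2, "A"))     # "character"

# Arithmetic also coerces logicals, treating TRUE as 1 and FALSE as 0,
# so sum() counts the TRUE values
sum(c(TRUE, TRUE, FALSE))  # 2
```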
x <- 5
for(i in c(1, 3, 5)) {
  x <- i + 2
}
x
## [1] 7
i
## [1] 5
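Notice that the loop above overwrites x on every pass, so only the last iteration's value survives. To accumulate a result instead, update a variable using its own previous value. A sketch (the names values and total are illustrative):

```r
values <- c(1, 3, 5)
total <- 0
# seq_along(values) gives the positions 1, 2, 3
for (i in seq_along(values)) {
  total <- total + values[i]  # add the current element to the running total
}
total  # 9
i      # 3 (the loop variable keeps its last value)
```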
x <- 10
y <- c(1, 2, 3)
while(x > 5) {
  y <- c(x, y)
  x <- x - 1
}
y
## [1]  6  7  8  9 10  1  2  3
length(y)
## [1] 8
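A while loop checks its condition before every pass, so it can run zero times, and it suits problems where the number of iterations is not known in advance. A sketch with illustrative values:

```r
# Count how many halvings it takes for x to drop below 1
x <- 10
steps <- 0
while (x >= 1) {
  x <- x / 2
  steps <- steps + 1
}
steps  # 4 (10 -> 5 -> 2.5 -> 1.25 -> 0.625)

# If the condition is FALSE at the start, the body never runs
n <- 0
while (n > 5) {
  n <- n + 1
}
n      # still 0
```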
x <- 10
y <- -10
if(x > y) {
  z <- 10
} else {
  z <- 2
}
z
## [1] 10
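if() expects a single TRUE or FALSE; since R 4.2.0, passing a longer logical vector is an error. To choose element by element, use the vectorized ifelse(). A sketch with illustrative vectors and labels:

```r
x <- c(3, 1, 4)
y <- c(2, 7, 4)

# ifelse() compares each pair and picks one of two values per element
z <- ifelse(x > y, "x larger", "x not larger")
z   # "x larger" "x not larger" "x not larger"

# For a single comparison, if/else can also return a value directly
z1 <- if (x[1] > y[1]) "x larger" else "x not larger"
z1  # "x larger"
```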
for(i in 1:10) {
  x <- i * 2
  if(x > 5) {
    break
  }
}
i
## [1] 3
x
## [1] 6
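break leaves the loop entirely; its companion next skips the rest of the current pass and moves on to the next value. A sketch (kept is an illustrative name):

```r
kept <- c()
for (i in 1:6) {
  if (i %% 2 == 0) {
    next              # even number: skip straight to the next i
  }
  kept <- c(kept, i)  # only reached for odd i
}
kept  # 1 3 5
```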
Functions are a way to encapsulate code and make it reusable. They can help avoid repetition and improve readability. The idea is to define a function once, and then call it whenever you need to perform that task.
f <- function(x) {
  y <- x^2
  return(y)
}
f(3)
## [1] 9
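Two conveniences are worth knowing: arguments can have default values, and a function returns the value of its last expression even without return(). A sketch (square_plus is a made-up name):

```r
square_plus <- function(x, add = 0) {
  x^2 + add  # last expression, so this is the return value
}
square_plus(3)           # 9 (add uses its default of 0)
square_plus(3, add = 1)  # 10
```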
Functions can call other functions.
g <- function(x) {
  y <- x - 2
  return(y)
}
f(g(4))
## [1] 4
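Functions are ordinary objects in R, so they can also be passed to other functions. A sketch (apply_twice is a hypothetical helper, not a built-in):

```r
f <- function(x) {
  y <- x^2
  return(y)
}

# Takes a function and applies it two times
apply_twice <- function(fun, x) {
  fun(fun(x))
}
apply_twice(f, 3)  # f(f(3)) = f(9) = 81

# sapply() calls f on each element and collects the results
sapply(1:3, f)     # 1 4 9
```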
Arguments can be matched by position or by name. Named arguments are allowed, but use them in a way that keeps calls readable.
h <- function(x, y, z) {
  res <- x + y - z
  return(res)
}
h(2, 1, -4)
## [1] 7
h(x=2, y=1, z=-4)
## [1] 7
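When every argument is named, their order no longer matters, because R matches them by name. A sketch with the same h:

```r
h <- function(x, y, z) {
  res <- x + y - z
  return(res)
}
h(z = -4, y = 1, x = 2)  # 7, the same as h(2, 1, -4)
```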
It’s a good idea to work through the exercises below on your own, but if you get very stuck, solutions can be found here.
First, it’s a good idea to restart your R session.
1a. Create a variable named ans_1a_x and set its value to 10, a variable named ans_1a_y and set its value to 5, and a variable named ans_1a_z and set its value to the sum of ans_1a_x and ans_1a_y. The line of code that you write to define ans_1a_z must include the + operator.
1b. Create a variable named ans_1b_x and set its value to "10", a variable named ans_1b_y and set its value to "5", and a variable named ans_1b_z and set its value to the sum of the numeric values of ans_1b_x and ans_1b_y. The line of code that you write to define ans_1b_z must include the + operator and the as.numeric() function.
1c. Create a variable named ans_1c and set its value to the logical result indicating whether ans_1a_x is equal to ans_1a_y. The line of code that you write to define ans_1c must include the == operator.
2a. Create a vector named ans_2a that contains the elements 2, 3, 5, 7, and 11.
2b. Create a list named ans_2b that contains the elements 1, 2, "a", "b", and TRUE.
2c. Create a data.frame named ans_2c that contains v1, v2, and v3, as defined below, as columns. Make sure the column names of the data.frame you create are equal to the variable names v1, v2, and v3.
v1 <- c('I', 'I', 'I', 'I', 'II', 'II', 'II', 'II')
v2 <- c('a', 'a', 'b', 'b', 'c', 'c', 'd', 'd')
v3 <- c(-1.6297880,
        -1.0738506,
        0.0299236,
        -1.5435811,
        -0.5133278,
        -1.4716107,
        -1.1986316,
        -1.5548207)
3a. … the vector named tmp defined in the following code chunk and store the result in a variable named ans_3a.
tmp <- c(0.24, 0.015, 1.34, -1.00, -0.15)
3b. … the list named tmp defined in the following code chunk and store the result in a variable named ans_3b. Be sure that the data type of ans_3b is character.
tmp <- list(0.5, "W", FALSE)
3c. … v2 from the data.frame named tmp defined in the following code chunk and store the result in a variable named ans_3c. Be sure that the line of code that you write includes the $ operator.
tmp <- data.frame(v1=c(1, 2, 3),
                  v2=c('A', 'B', 'C'),
                  v3=c(TRUE, TRUE, FALSE))
4a. … tmp? Store your answer in a variable named ans_4a by copying, pasting, and uncommenting one of the commented-out lines of code below.
element_1 <- "A"
element_2 <- 2
element_3 <- FALSE
tmp <- list(element_1, element_2, element_3)
# ans_4a <- "character"
# ans_4a <- "numeric"
# ans_4a <- "logical"
4b. … tmp? Store your answer in a variable named ans_4b by copying, pasting, and uncommenting one of the commented-out lines of code below.
element_1 <- "A"
element_2 <- 2
element_3 <- FALSE
tmp <- c(element_1, element_2, element_3)
# ans_4b <- "character"
# ans_4b <- "numeric"
# ans_4b <- "logical"
x <- 5
for(i in c(1, 3, 5)) {
  x <- i + 2
}
What is the value of x after the loop has finished executing?
What is the value of i after the loop has finished executing?
Modify the code chunk such that the final value of x is 10 and i is 20.
x <- 10
y <- c(1, 2, 3)
while(x > 5) {
  y <- c(x, y)
  x <- x - 1
}
How many elements does y contain after the loop has finished?
What is the value of the final element in y, and can you figure this out without executing the code?
if(x > y) {
  z <- 10
} else {
  z <- 2
}
What is the value of z if x <- 10 and y <- -10?
What is the value of z if x <- -5 and y <- 5?
What is the value of z if x <- 0 and y <- 0?
for(i in 1:10) {
  x <- i * 2
  if(x > 5) {
    break
  }
}
How many times will this loop run?
What are the values of i and x when the loop stops?
if(x == y) {
  z <- 1
} else if(x > y) {
  z <- 2
} else if(x < y) {
  z <- 3
}
What is the value of z if x <- 10 and y <- -10?
What is the value of z if x <- -5 and y <- 5?
What is the value of z if x <- 0 and y <- 0?
f <- function(x) {
  y <- x^2
  return(y)
}
z <- f(3)
What are the values of x, y, and z?
f <- function(x) {
  y <- x^2
  return(y)
}
g <- function(x) {
  y <- x - 2
  return(y)
}
z <- f(g(4))
What are the values of x, y, and z?
f <- function(x, y, z) {
  res <- x + y - z
  return(res)
}
Is f(2, 1, -4) a valid way to call the function f?
Is f(y=2, 1, -4) a valid way to call the function f?
Is f(y=2, 1, -4) a wise way to call the function f?
… f?