Overview

This tutorial is designed to get everyone up and running in R and to build comfort with core R objects and control flow.


Install R and RStudio (before class, if possible)

If you already have both installed, you can skip this section.


R vs RStudio (important!)

A common confusion is thinking R and RStudio are the same thing. They are not.

R (the language)

  • R is the programming language (and the software that runs R code).

  • If you install R, you can run R code without RStudio.

RStudio (the IDE)

  • RStudio is an editor + interface (an IDE, which stands for Integrated Development Environment).

  • It helps you write and organise R code, but it is not the language itself.

This unit teaches R. We use RStudio simply because many people enjoy using it, but you can use any editor or interface you like to write and run R code.


Running R without RStudio (base R)

You can run R in at least two common “no RStudio” ways.

Option A: Run R interactively in a Terminal

  1. Open a terminal (Mac: Terminal, Windows: PowerShell or Terminal, Linux: any terminal).

  2. Type R and press Enter.

  3. You will enter the R console, where you can type things like the following (press Enter after each line to execute it):

1 + 1
x <- 10
x * 2
q()

Option B: Run an R script from a Terminal

  1. Create a plain text file named test_script.R containing:
x <- 10
y <- 5
z <- x + y
print(z)
  2. In a terminal, navigate to the directory where you saved test_script.R and run Rscript test_script.R

RStudio orientation (Optional)

By default, RStudio has a 4-pane layout.

The idea is to write code in the script pane, run it in the console, and see the results in the Environment and Plots panes.

To run a line of code in the script, place your cursor on the line and press Ctrl + Enter (Windows) or Cmd + Enter (Mac). You can run multiple lines by selecting them and pressing the same shortcut.

You can restart your session and start from a clean slate (clearing all objects in the environment) by opening the Session menu and clicking Restart R (or by using the keyboard shortcut).


Part 1 — Variables and basic operations

# A numeric variable
x <- 10
y <- 5

# Basic arithmetic
z <- x + y
z
## [1] 15

# A character variable (text)
s1 <- "10"
s2 <- "5"

# Convert text to numeric
as.numeric(s1) + as.numeric(s2)
## [1] 15

# A logical variable (TRUE/FALSE)
x == y
## [1] FALSE
x > y
## [1] TRUE
x < y
## [1] FALSE

Inspect types

class(x)
## [1] "numeric"
class(s1)
## [1] "character"
class(x == y)
## [1] "logical"
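
The conversion step above matters because arithmetic on text fails. The following is a minimal sketch, using only base R, of what happens without as.numeric() and what happens when the text is not a number at all. The names res and bad are illustrative and not part of the tutorial.

```r
s1 <- "10"
s2 <- "5"

# "+" is not defined for character values, so s1 + s2 raises an error.
# tryCatch() captures the error message instead of stopping the script.
res <- tryCatch(s1 + s2, error = function(e) conditionMessage(e))
class(res)
## [1] "character"

# Text that cannot be read as a number becomes NA (R also emits a warning,
# silenced here with suppressWarnings()).
bad <- suppressWarnings(as.numeric("ten"))
is.na(bad)
## [1] TRUE
```

Checking for NA after a conversion is a cheap way to catch text that did not contain a number.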

Part 2 — Vectors, lists, and data.frames

Vectors

A vector is a 1D container whose elements all have the same type.

v_num <- c(2, 3, 5, 7, 11)
v_num
## [1]  2  3  5  7 11

v_chr <- c("a", "b", "c")
v_chr
## [1] "a" "b" "c"

v_log <- c(TRUE, FALSE, TRUE)
v_log
## [1]  TRUE FALSE  TRUE
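
Because every element of a vector has the same type, arithmetic and comparisons can be applied to all elements at once. A short sketch with the v_num vector from above, using only base R functions:

```r
v_num <- c(2, 3, 5, 7, 11)

length(v_num)   # number of elements
## [1] 5
v_num * 2       # arithmetic is applied element by element
## [1]  4  6 10 14 22
v_num > 4       # so are comparisons
## [1] FALSE FALSE  TRUE  TRUE  TRUE
sum(v_num)      # add up all elements
## [1] 28
```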

Lists

A list can hold mixed types.

L <- list(1, 2, "a", "b", TRUE)
L
## [[1]]
## [1] 1
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] "a"
## 
## [[4]]
## [1] "b"
## 
## [[5]]
## [1] TRUE
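
One way to confirm that each element keeps its own type is to ask for the class of every element. The sketch below assumes the same list L as above. The named list person is a hypothetical example added for illustration.

```r
L <- list(1, 2, "a", "b", TRUE)

# sapply() applies class() to each element and collects the results.
types <- sapply(L, class)
types
## [1] "numeric"   "numeric"   "character" "character" "logical"

# Elements can also be named when the list is created.
person <- list(name = "Ada", age = 36, student = FALSE)
person$age
## [1] 36
```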

data.frames

A data.frame is a table-like object where columns can have different types.

v1 <- c('I', 'I', 'I', 'I', 'II', 'II', 'II', 'II')
v2 <- c('a', 'a', 'b', 'b', 'c', 'c', 'd', 'd')
v3 <- c(-1.6297880,
        -1.0738506,
         0.0299236,
        -1.5435811,
        -0.5133278,
        -1.4716107,
        -1.1986316,
        -1.5548207)

d <- data.frame(v1=v1, v2=v2, v3=v3)
d
##   v1 v2         v3
## 1  I  a -1.6297880
## 2  I  a -1.0738506
## 3  I  b  0.0299236
## 4  I  b -1.5435811
## 5 II  c -0.5133278
## 6 II  c -1.4716107
## 7 II  d -1.1986316
## 8 II  d -1.5548207
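
A few base R functions help check the shape and column types of a data.frame. The sketch below uses a cut-down, three-row table with made-up values rather than the full d above. stringsAsFactors = FALSE is written out explicitly so that the result does not depend on the R version.

```r
d_small <- data.frame(v1 = c("I", "I", "II"),
                      v2 = c("a", "b", "c"),
                      v3 = c(-1.6, 0.03, -0.5),
                      stringsAsFactors = FALSE)  # the default since R 4.0.0

nrow(d_small)           # number of rows
## [1] 3
ncol(d_small)           # number of columns
## [1] 3
sapply(d_small, class)  # type of each column
##          v1          v2          v3 
## "character" "character"   "numeric" 
```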

Part 3 — Indexing and accessing elements

Vector indexing

tmp <- c(0.24, 0.015, 1.34, -1.00, -0.15)

tmp[1]   # first element
## [1] 0.24
tmp[3]   # third element
## [1] 1.34
tmp[c(1,3,5)]  # multiple elements
## [1]  0.24  1.34 -0.15
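
Square brackets accept more than positive positions. The following sketch, which reuses the same tmp vector, shows a few other base R indexing forms that go slightly beyond the examples above:

```r
tmp <- c(0.24, 0.015, 1.34, -1.00, -0.15)

tmp[-1]        # negative index: everything except the first element
tmp[2:4]       # a range of positions
tmp[tmp > 0]   # logical index: keep elements where the condition is TRUE
tmp[10]        # a position past the end gives NA rather than an error
## [1] NA
```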

List indexing

Lists have two common indexing styles:

  • [[ ]] pulls out the element itself
  • [ ] keeps a list
tmp <- list(0.5, "W", FALSE)

tmp[2]     # returns a list of length 1
## [[1]]
## [1] "W"
tmp[[2]]   # returns the element itself
## [1] "W"

class(tmp[2])
## [1] "list"
class(tmp[[2]])
## [1] "character"
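
The difference between the two styles matters as soon as you compute with the result. A minimal sketch, assuming the same tmp list as above (the variable ok is illustrative):

```r
tmp <- list(0.5, "W", FALSE)

tmp[[1]] + 1   # works: tmp[[1]] is the number 0.5
## [1] 1.5

# tmp[1] is still a list, and arithmetic on a list is an error.
# tryCatch() turns that error into FALSE so the script keeps running.
ok <- tryCatch({ tmp[1] + 1; TRUE }, error = function(e) FALSE)
ok
## [1] FALSE
```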

data.frame column access

tmp <- data.frame(v1=c(1, 2, 3),
                  v2=c('A', 'B', 'C'),
                  v3=c(TRUE, TRUE, FALSE))

tmp$v2
## [1] "A" "B" "C"
tmp[, "v2"]
## [1] "A" "B" "C"
tmp[ , 2]
## [1] "A" "B" "C"
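
The same bracket notation also selects rows and single cells. The sketch below reuses the tmp data.frame from above and shows a few base R forms not covered by the examples. stringsAsFactors = FALSE is an explicit assumption so that v2 is character in any R version.

```r
tmp <- data.frame(v1 = c(1, 2, 3),
                  v2 = c("A", "B", "C"),
                  v3 = c(TRUE, TRUE, FALSE),
                  stringsAsFactors = FALSE)

tmp[2, ]         # second row, all columns (still a data.frame)
tmp[2, "v2"]     # a single cell: row 2 of column v2
## [1] "B"
tmp[tmp$v3, ]    # only the rows where v3 is TRUE

# drop = FALSE keeps a one-column result as a data.frame.
class(tmp[, "v2", drop = FALSE])
## [1] "data.frame"
```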

Part 4 — Type coercion (important!)

R will sometimes coerce values to a common type quietly, without a warning or an error.

element_1 <- "A"
element_2 <- 2
element_3 <- FALSE

L <- list(element_1, element_2, element_3)
V <- c(element_1, element_2, element_3)

L
## [[1]]
## [1] "A"
## 
## [[2]]
## [1] 2
## 
## [[3]]
## [1] FALSE
V
## [1] "A"     "2"     "FALSE"

class(L[[2]])
## [1] "numeric"
class(V[2])
## [1] "character"

Key idea:

  • Lists preserve types per element
  • Vectors coerce to a single type when needed
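
Coercion inside c() follows a fixed order: logical values are promoted to numeric, and numeric values to character. A short sketch of that order, using only base R:

```r
class(c(TRUE, 2))      # logical is promoted to numeric
## [1] "numeric"
class(c(2, "A"))       # numeric is promoted to character
## [1] "character"
c(TRUE, 2)             # TRUE becomes 1
## [1] 1 2

# A practical consequence: TRUE counts as 1 and FALSE as 0 in arithmetic.
sum(c(TRUE, FALSE, TRUE))
## [1] 2
```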


Part 5 — Control flow: for, while, if

for loops

x <- 5
for(i in c(1, 3, 5)) {
  x <- i + 2
}
x
## [1] 7
i
## [1] 5
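
In the loop above, x is overwritten on every pass, so only the result of the last iteration is visible afterwards. The sketch below shows two common alternatives: accumulating a running total and storing one result per iteration. The names vals, total, out and k are illustrative.

```r
vals <- c(1, 3, 5)

total <- 0
out <- numeric(length(vals))    # pre-allocate space for the results
for (k in seq_along(vals)) {    # k takes the positions 1, 2, 3
  total <- total + vals[k]      # running sum of the values
  out[k] <- vals[k] + 2         # one stored result per iteration
}
total
## [1] 9
out
## [1] 3 5 7
```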

while loops

x <- 10
y <- c(1, 2, 3)
while(x > 5) {
  y <- c(x, y)
  x <- x - 1
}
y
## [1]  6  7  8  9 10  1  2  3
length(y)
## [1] 8
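
The order of the arguments to c() decides where each new value lands. As a variation on the loop above, this sketch appends x at the end of y instead of the front. It also counts the iterations with a hypothetical counter n_iter.

```r
x <- 10
y <- c(1, 2, 3)
n_iter <- 0
while (x > 5) {
  y <- c(y, x)            # append x at the end instead of the front
  x <- x - 1
  n_iter <- n_iter + 1    # count how many times the body runs
}
y
## [1]  1  2  3 10  9  8  7  6
n_iter
## [1] 5
x
## [1] 5
```

The loop stops as soon as x > 5 is FALSE, which is why x ends at 5 and the value 5 is never added to y.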

if / else

x <- 10
y <- -10

if(x > y) {
  z <- 10
} else {
  z <- 2
}
z
## [1] 10
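
if() expects a single TRUE or FALSE. Two related base R tools are sketched below: ifelse(), which tests every element of a vector, and else if, which chains several conditions for one value. The names vals, v and sign_label are illustrative.

```r
vals <- c(10, -5, 0)

# ifelse() tests every element and returns a vector of the same length.
ifelse(vals > 0, "positive", "not positive")
## [1] "positive"     "not positive" "not positive"

# else if chains several conditions for a single value.
v <- 0
if (v > 0) {
  sign_label <- "positive"
} else if (v < 0) {
  sign_label <- "negative"
} else {
  sign_label <- "zero"
}
sign_label
## [1] "zero"
```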

break

for(i in 1:10) {
  x <- i * 2
  if(x > 5) {
    break
  }
}
i
## [1] 3
x
## [1] 6
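
Where break leaves the loop entirely, next skips only the rest of the current iteration. A minimal sketch of next, which is not used elsewhere in this tutorial (the name odd_sum is illustrative):

```r
odd_sum <- 0
for (i in 1:10) {
  if (i %% 2 == 0) {
    next               # skip even numbers and move on to the next i
  }
  odd_sum <- odd_sum + i
}
odd_sum
## [1] 25
i
## [1] 10
```

Because the loop is never cut short, i ends at its last value, 10.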

Part 6 — Writing and calling functions

Functions are a way to encapsulate code and make it reusable. They can help avoid repetition and improve readability. The idea is to define a function once, and then call it whenever you need to perform that task.

f <- function(x) {
  y <- x^2
  return(y)
}

f(3)
## [1] 9
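
Variables created inside a function are local to that call. The sketch below reuses f from above and adds a global variable y, introduced only for illustration, to show that calling f() does not change it.

```r
f <- function(x) {
  y <- x^2     # this y exists only while f() is running
  return(y)
}

y <- 100       # a separate y in the global environment
z <- f(3)
z
## [1] 9
y              # unchanged by the call to f()
## [1] 100
```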

Functions can call other functions.

g <- function(x) {
  y <- x - 2
  return(y)
}

f(g(4))
## [1] 4

Arguments can be passed by position or by name. Naming them often makes a call easier to read.

h <- function(x, y, z) {
  res <- x + y - z
  return(res)
}

h(2, 1, -4)
## [1] 7
h(x=2, y=1, z=-4)
## [1] 7
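
Two further properties of named arguments are sketched below. Note the assumption: h is redefined here with a default value z = 0, which the original h above does not have.

```r
h <- function(x, y, z = 0) {   # z = 0 is an illustrative default value
  res <- x + y - z
  return(res)
}

h(z = -4, x = 2, y = 1)   # named arguments can be given in any order
## [1] 7
h(2, 1)                   # z falls back to its default of 0
## [1] 3
```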

Work through these practice exercises

1.

    1. Define a variable named ans_1a_x and set its value to 10, a variable named ans_1a_y and set its value to 5, and a variable named ans_1a_z and set its value to the sum of ans_1a_x and ans_1a_y. The line of code that you write to define ans_1a_z must include the + operator.
    2. Define a variable named ans_1b_x and set its value to "10", a variable named ans_1b_y and set its value to "5", and a variable named ans_1b_z and set its value to the sum of the numeric values of ans_1b_x and ans_1b_y. The line of code that you write to define ans_1b_z must include the + operator and the as.numeric() function.
    3. Define a variable named ans_1c and set its value to the logical result indicating if ans_1a_x is equal to ans_1a_y. The line of code that you write to define ans_1c must include the == operator.

2.

    1. Create a vector named ans_2a that contains the elements 2, 3, 5, 7, and 11.
    2. Create a list named ans_2b that contains the elements 1, 2, "a", "b" and TRUE.
    3. Create a data.frame named ans_2c that contains v1, v2, and v3 as defined below as columns. Make sure the column names of the data.frame you create are equal to the variable names v1, v2, and v3.
v1 <- c('I', 'I', 'I', 'I', 'II', 'II', 'II', 'II')
v2 <- c('a', 'a', 'b', 'b', 'c', 'c', 'd', 'd')
v3 <- c(-1.6297880,
        -1.0738506,
         0.0299236,
        -1.5435811,
        -0.5133278,
        -1.4716107,
        -1.1986316,
        -1.5548207)

3.

    1. Write a line of code that returns the third element from the vector named tmp defined in the following code chunk and store the result in a variable named ans_3a.
tmp <- c(0.24, 0.015, 1.34, -1.00, -0.15)
    2. Write a line of code that returns the second element from the list named tmp defined in the following code chunk and store the result in a variable named ans_3b. Be sure that the data type of ans_3b is character.
tmp <- list(0.5, "W", FALSE)
    3. Write a line of code that returns the column named v2 from the data.frame named tmp defined in the following code chunk and store the result in a variable named ans_3c. Be sure that the line of code that you write includes the $ operator.
tmp <- data.frame(v1=c(1, 2, 3),
                  v2=c('A', 'B', 'C'),
                  v3=c(TRUE, TRUE, FALSE))

4.

    1. For the following code chunk, what is the data type of the second element of tmp? Store your answer in a variable named ans_4a by copying, pasting, and uncommenting one of the commented out lines of code below.
element_1 <- "A"
element_2 <- 2
element_3 <- FALSE
tmp <- list(element_1, element_2, element_3)

# ans_4a <- "character"
# ans_4a <- "numeric"
# ans_4a <- "logical"
    2. For the following code chunk, what is the data type of the second element of tmp? Store your answer in a variable named ans_4b by copying, pasting, and uncommenting one of the commented out lines of code below.
element_1 <- "A"
element_2 <- 2
element_3 <- FALSE
tmp <- c(element_1, element_2, element_3)

# ans_4b <- "character"
# ans_4b <- "numeric"
# ans_4b <- "logical"

5.

    1. Consider the following code chunk:
x <- 5
for(i in c(1, 3, 5)) {
  x <- i + 2
}
  • What is the value of x after the loop has finished executing?

  • What is the value of i after the loop has finished executing?

  • Modify the code chunk such that the final value of x is 10 and i is 20.

    2. Consider the following code chunk:
x <- 10
y <- c(1, 2, 3)
while(x > 5) {
  y <- c(x, y)
  x <- x - 1
}
  • How many elements does y contain after the loop has finished?

  • What is the value of the final element in y, and can you figure this out without executing the code?

    3. Consider the following code chunk:
if(x > y) {
  z <- 10
} else {
  z <- 2
}
  • What is the value of z if x <- 10 and y <- -10?

  • What is the value of z if x <- -5 and y <- 5?

  • What is the value of z if x <- 0 and y <- 0?

    4. Consider the following code chunk:
for(i in 1:10) {
  x <- i * 2
  if(x > 5) {
    break
  }
}
  • How many times will this loop run?

  • What are the values of i and x when the loop stops?

    5. Consider the following code chunk:
if(x == y) {
  z <- 1
} else if(x > y) {
  z <- 2
} else if(x < y) {
  z <- 3
}
  • What is the value of z if x <- 10 and y <- -10?
  • What is the value of z if x <- -5 and y <- 5?
  • What is the value of z if x <- 0 and y <- 0?

6.

    1. Consider the following code chunk:
f <- function(x) {
  y <- x^2
  return(y)
}
z <- f(3)
  • What are the values of x, y and z?

    2. Consider the following code chunk:
f <- function(x) {
  y <- x^2
  return(y)
}

g <- function(x) {
  y <- x - 2
  return(y)
}

z <- f(g(4))
  • What are the values of x, y and z?

    3. Consider the following code chunk:
f <- function(x, y, z) {
  res <- x + y - z
  return(res)
}
  • Is f(2, 1, -4) a valid way to call the function f?
  • Is f(y=2, 1, -4) a valid way to call the function f?
  • Is f(y=2, 1, -4) a wise way to call the function f?
  • What is the wisest way to call the function f?