Introduction to R

In this activity we’ll start to learn how to work with R and explore some of the syntax and functions that we encountered in the video.

Basic R operations

We enter commands at the prompt > in the R Console. R then

  1. interprets the code we have entered,
  2. executes the interpreted code,
  3. displays the output of the code if it should be printed

For example, in the code block below we multiply two numbers:

8 * 3
[1] 24

R prints the result because we did not assign the output to an object.

We can assign the output to an object using the <- assignment operator. In the block below, we repeat the multiplication but this time we assign the output to an object named answer

answer <- 8 * 3

Now R doesn’t print anything. To retrieve the answer to our multiplication we must type the name of the object and hit Enter

answer
[1] 24
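As an aside, here is a common R idiom that the video did not cover: wrapping an assignment in parentheses both creates the object and prints its value, so you don't need to type the name on a separate line.

```r
# Parentheses around an assignment make R print the assigned value
(answer <- 8 * 3)

# The object still exists afterwards, exactly as with a plain assignment
answer * 2
```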
Question

Look at the code below; what answer will this produce and does anything get printed to the console when you run it?

2 * 3
2 * 3
[1] 6

The answer is 6. Because we did not assign the result to an object, R prints it at the console.

Question

Look at the code below; what answer will this produce?

my_log <- log(2 + 3)
my_log <- log(2 + 3)
my_log
[1] 1.609438

The answer is approximately 1.609, the natural logarithm of 5 (log() uses base e unless we give it a different base).

Vectors

At the most basic level, we create vectors by combining elements together using the c() function. For example, to create a vector of five numbers we use

v <- c(2, 10, 3, 8, 12)
Question

What kind of vector is v?

class(v)
[1] "numeric"
typeof(v)
[1] "double"
is.integer(v)
[1] FALSE

This was a bit of a trick question! If you said "numeric", you were right. It doesn’t matter that we typed whole numbers when creating the vector: R stores them as doubles (floating-point numbers), which can have decimal places.

If you want an integer vector, add the suffix L to each number:

i <- c(2L, 10L, 3L, 8L, 12L)
is.integer(i)
[1] TRUE

We can also coerce a numeric vector to an integer one using as.integer()

vi <- as.integer(v)
is.integer(vi)
[1] TRUE

But beware what happens if you coerce numbers that are not integers this way

as.integer(1.4)
[1] 1
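as.integer() does not round: it simply drops the decimal part (truncating towards zero). If you want the nearest whole number instead, a small sketch is to call round() before converting:

```r
x <- c(1.4, 1.9, -1.9)

# Truncation: the fractional part is discarded, so 1.9 becomes 1
as.integer(x)

# Round first, then convert, to get the nearest integer
as.integer(round(x))
```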
Question

Create a vector named primes that contains the first 5 prime numbers.

primes <- c(2, 3, 5, 7, 11)

If you included 1 in primes, unfortunately it is not prime: a prime number must have exactly two distinct positive divisors, 1 and itself, and 1 has only one.

Question

Create a vector named fib that contains the first 10 Fibonacci numbers.

fib <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
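To check an answer like this, we can use the defining property of the Fibonacci sequence: from the third element on, each number is the sum of the two before it. A sketch that compares shifted pieces of the vector (using subsetting, which we cover later in this activity) tests every element at once:

```r
fib <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

# Elements 3..10 should equal elements 2..9 plus elements 1..8
fib[3:10] == fib[2:9] + fib[1:8]

# all() collapses the logical vector into a single TRUE or FALSE
all(fib[3:10] == fib[2:9] + fib[1:8])
```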

We can create other types of vector; we discussed

  • logical vectors, and

  • character vectors

in the video.

Logical vectors

We create a logical vector by combining the elements TRUE and FALSE:

l <- c(TRUE, FALSE, TRUE, TRUE)

More conveniently, we can create a logical vector by applying a comparison operator such as >, < or ==. For example, if we run

fib > 3
 [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE

what we get back is a logical vector where the elements are TRUE if the Fibonacci number is greater than 3, and FALSE otherwise.

Question

Create a vector named primes_10 that contains the first 10 prime numbers. Then create a logical vector that indicates whether each of the primes is less than 17.

primes_10 <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
primes_10 < 17
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
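In arithmetic, R treats TRUE as 1 and FALSE as 0, so summing a logical vector counts how many elements are TRUE, and taking its mean gives the proportion that are TRUE. A short sketch using the same primes:

```r
primes_10 <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)

# sum() of a logical vector counts the TRUE values
sum(primes_10 < 17)

# mean() of a logical vector gives the proportion of TRUE values
mean(primes_10 < 17)
```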

Character vectors

Character vectors are created by wrapping each element in single or double quotes, as in 'a' or "b". Double quotes are preferred as they are slightly easier to read in code.

We create a character vector like any other using c()

txt <- c("Alice", "Bob", "Claire", "David", "Else")

We always enter quotes in pairs; the RStudio IDE will help you in this regard as it will automatically add the closing quote for you when you type the first one.

Tip

If you ever get stuck because you forgot to close a quote, click in the R Console window and hit the Esc key to get back to your prompt

Question

Create a character vector named my_group that contains the first name of each member of your group, including your own.

my_group <- c("Mona", "Gavin")

We’re a small group!

Using rep() and seq()

We can create sequences of numbers using the seq() function and the : operator. 1:10 is shorthand for seq(from = 1, to = 10, by = 1L).

To create a sequence of numbers from 5 to 14 we could use

5:14
 [1]  5  6  7  8  9 10 11 12 13 14

or

seq(5, 14)
 [1]  5  6  7  8  9 10 11 12 13 14
Question

Create a vector named my_seq containing the integers from -4 to 9.

my_seq <- seq(-4, 9)

or

my_seq <- -4:9
my_seq
 [1] -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9
Question

Create a vector named countdown containing the integers 10 to 0.

countdown <- seq(10, 0)
countdown
 [1] 10  9  8  7  6  5  4  3  2  1  0

or

countdown <- 10:0
countdown
 [1] 10  9  8  7  6  5  4  3  2  1  0
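A third option is to give seq() a negative step through its by argument, and rev(), which reverses a vector, gives the same numbers again. A minimal sketch:

```r
# Counting down by giving seq() a negative step
countdown <- seq(10, 0, by = -1)
countdown

# rev() reverses a vector, so this produces the same values
rev(0:10)
```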

We can make more complex sequences using other arguments to seq(): by and length.out

Tip

We usually shorten the length.out argument name to just length; R allows this because it partially matches argument names.

To create a vector of the first 10 positive odd numbers, we could use

seq(1, by = 2, length = 10)
 [1]  1  3  5  7  9 11 13 15 17 19
Tip

Note that we don’t have to use both from and to when we call seq().

Question

Create a vector named x that contains 15 evenly-spaced numbers between -4 and 20.

x <- seq(-4, 20, length.out = 15)
x
 [1] -4.0000000 -2.2857143 -0.5714286  1.1428571  2.8571429  4.5714286
 [7]  6.2857143  8.0000000  9.7142857 11.4285714 13.1428571 14.8571429
[13] 16.5714286 18.2857143 20.0000000
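When we give length.out, seq() works out the step for us: 15 numbers means 14 gaps, so the step is (20 - (-4)) / 14, about 1.714. We can check this with diff(), which returns the differences between consecutive elements:

```r
x <- seq(-4, 20, length.out = 15)

# The step seq() chose: the span divided by the number of gaps
step <- (20 - (-4)) / (15 - 1)
step

# Every consecutive difference equals that step (up to rounding error)
all.equal(diff(x), rep(step, 14))
```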
Question

Create a vector named tens that contains every multiple of ten from 10 to 140.

tens <- seq(10, 140, by = 10)
tens
 [1]  10  20  30  40  50  60  70  80  90 100 110 120 130 140

The rep() function allows us to create patterned vectors by repeating the elements of one vector to build a longer vector. The key arguments are each and times, and there is also a length.out argument.

Let’s start with a vector base with the following elements

base <- c(1, 4, 8)

If we want to repeat this vector 3 times, we would use

rep(base, times = 3)
[1] 1 4 8 1 4 8 1 4 8

whereas, if we wanted to repeat each element of base 3 times, we would use

rep(base, each = 3)
[1] 1 1 1 4 4 4 8 8 8

To repeat a vector until it has the desired length, we can use the length.out argument

rep(base, length = 10)
 [1] 1 4 8 1 4 8 1 4 8 1

Notice how we get an incomplete final copy of base, because 10 is not a multiple of 3 (the length of base).

Question

How would you repeat each element of the Fibonacci number vector fib twice?

rep(fib, each = 2)
 [1]  0  0  1  1  1  1  2  2  3  3  5  5  8  8 13 13 21 21 34 34
Question

Create a vector that repeats the first 10 primes 4 times.

rep(primes_10, times = 4)
 [1]  2  3  5  7 11 13 17 19 23 29  2  3  5  7 11 13 17 19 23 29  2  3  5  7 11
[26] 13 17 19 23 29  2  3  5  7 11 13 17 19 23 29
Question

What do you think will be the output of the code block below?

rep(3:1, times = 1:3)

Challenge yourself: think about what the code would return before running it.

rep(3:1, times = 1:3)
[1] 3 2 2 1 1 1
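Here times is a vector with one entry per element of 3:1, so 3 is repeated once, 2 twice and 1 three times. Writing the repeats out by hand gives the same values:

```r
# times = 1:3 pairs each element of 3:1 with its own repeat count
a <- rep(3:1, times = 1:3)
b <- c(rep(3, 1), rep(2, 2), rep(1, 3))
a
b

# a is integer and b is double, so compare values with == rather than identical()
all(a == b)
```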

Subsetting vectors

We can subset vectors in one of several ways. The most common way is to specify the indices of the elements you want to return using a numeric vector. For example, to select the third prime number from our vector of the first 10 primes we would use

primes_10[3]
[1] 5
Question

What is the 5th Fibonacci number in fib?

fib[5]
[1] 3
Question

What is the 9th Fibonacci number in fib?

fib[9]
[1] 21

We can use negative numbers to exclude elements, for example

primes_10[c(-2, -4, -6)]
[1]  2  5 11 17 19 23 29
Question

Subset the vector of Fibonacci numbers fib to return only the elements in even positions (the 2nd, 4th, and so on)

fib[c(2, 4, 6, 8, 10)]
[1]  1  2  5 13 34
Question

Subset the vector of Fibonacci numbers fib to exclude the elements in odd positions

fib[-c(1, 3, 5, 7, 9)]
[1]  1  2  5 13 34
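Typing the positions by hand gets tedious for longer vectors. A sketch that lets seq() generate the positions instead, using length() so it works whatever the length of fib:

```r
fib <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

# Positions 2, 4, 6, ... up to the length of fib
even_pos <- seq(2, length(fib), by = 2)
fib[even_pos]

# Excluding positions 1, 3, 5, ... returns the same elements
fib[-seq(1, length(fib), by = 2)]
```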

Logical vectors are handy ways to subset vectors. For example, if we have a vector 1:10, we can select the even elements using

v <- 1:10
take <- v %% 2 == 0
take
 [1] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
v[take]
[1]  2  4  6  8 10
Tip

The %% operator is new to us. It is the modulo operator: given two numbers a and b, it returns the remainder after dividing a by b. We often write this as “a mod b”. An even number is one that leaves zero remainder when divided by 2, so we compute the remainder after dividing by 2 and compare it (using ==) with 0.
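A few worked examples of %%, with the division written out in the comments:

```r
7 %% 2    # 7 = 3 * 2 + 1, so the remainder is 1
10 %% 3   # 10 = 3 * 3 + 1, so the remainder is 1
12 %% 4   # 12 divides evenly by 4, so the remainder is 0

# Applied to a vector, %% works element by element
(1:6) %% 2
```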

Question

How would you create a vector containing all the odd numbers between 42 and 121?

num <- 42:121
take <- num %% 2 == 1L
odds <- num[take]
odds
 [1]  43  45  47  49  51  53  55  57  59  61  63  65  67  69  71  73  75  77  79
[20]  81  83  85  87  89  91  93  95  97  99 101 103 105 107 109 111 113 115 117
[39] 119 121

We don’t have to create all the intermediate objects shown in the solution. We could just as easily do

odds <- 42:121
odds <- odds[odds %% 2 == 1L]
odds
 [1]  43  45  47  49  51  53  55  57  59  61  63  65  67  69  71  73  75  77  79
[20]  81  83  85  87  89  91  93  95  97  99 101 103 105 107 109 111 113 115 117
[39] 119 121

at the expense of being a little harder to read and understand if you are new to R coding.
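Because odd numbers are evenly spaced, another option skips the filtering altogether: ask seq() for every second number, starting from 43, the first odd number after 42. A minimal sketch (odds_seq is just an illustrative name):

```r
# 43 is the first odd number above 42; step by 2 up to 121
odds_seq <- seq(43, 121, by = 2)
odds_seq
```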

Question

Say you have a vector containing the first n Fibonacci numbers, but you can’t remember how many Fibonacci numbers are in it. You do know that the vector is named fib and that its elements are in increasing order. How would you find the largest Fibonacci number in your vector?

fib[length(fib)]
[1] 34
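Since the elements are in increasing order, the last element is also the largest, so max() gives the same answer without relying on the ordering. tail(), from the utils package that R loads by default, is another way to get the last element:

```r
fib <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

max(fib)        # largest value, whatever the order
tail(fib, 1)    # last element, like fib[length(fib)]
```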

Functions

A function is a collection of R statements (lines of code) bundled together to complete a specific operation. In the video we encountered several functions.

We used the length() function to count the elements of a vector, and sum() to add up a vector of numbers. R contains many more simple functions like these:

  • max()
  • min()
  • log()
  • exp()
  • sqrt()
  • mean()
  • median()
  • range()
  • IQR() (inter-quartile range)
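A quick sketch applying some of these functions to the fib vector from earlier; the comments give the values these ten numbers produce:

```r
fib <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

length(fib)    # 10 elements
sum(fib)       # 88
max(fib)       # 34
min(fib)       # 0
mean(fib)      # 8.8
median(fib)    # 4, halfway between the 5th and 6th values (3 and 5)
range(fib)     # 0 34, i.e. c(min(fib), max(fib))
IQR(fib)       # 10.5, the upper quartile (11.75) minus the lower quartile (1.25)
sqrt(fib[10])  # square root of 34, about 5.83
```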
Question

What is the sum of the first 10 prime numbers?

sum(primes_10)
[1] 129
Question

What is the product of the first 10 Fibonacci numbers?

prod(fib)
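[1] 0

The product is 0 because the first element of fib is 0, and multiplying anything by zero gives zero.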
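If we drop the leading 0 with a negative index, prod() multiplies the remaining nine numbers:

```r
fib <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

# fib[-1] drops the first element (the 0)
prod(fib[-1])   # 1 * 1 * 2 * 3 * 5 * 8 * 13 * 21 * 34 = 2227680
```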

One of the functions we used was runif(), to generate random numbers. By default the numbers lie between 0 and 1. If we run the function multiple times, we get different sets of random numbers:

runif(10)
 [1] 0.1774133 0.6150985 0.1783983 0.2827965 0.4564735 0.4252319 0.4459273
 [8] 0.4803467 0.4139307 0.2241565
runif(10)
 [1] 0.04730761 0.80663297 0.38028540 0.05635062 0.14624077 0.55911343
 [7] 0.04494480 0.62993577 0.79132926 0.66436722

We can use the set.seed() function to make the values returned by R’s random number generator repeatable

set.seed(42)
runif(10)
 [1] 0.9148060 0.9370754 0.2861395 0.8304476 0.6417455 0.5190959 0.7365883
 [8] 0.1346666 0.6569923 0.7050648
set.seed(42)
runif(10)
 [1] 0.9148060 0.9370754 0.2861395 0.8304476 0.6417455 0.5190959 0.7365883
 [8] 0.1346666 0.6569923 0.7050648
Question

Using the seed 65, what is the largest (in value) number in the vector returned by

runif(25)
set.seed(65)
runif(25) |>
  max()
[1] 0.9363114
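The |> symbol in the solution is R’s native pipe (available from R 4.1.0): it passes the value on its left as the first argument to the function on its right. The piped version is therefore the same as nesting the calls, which we can check by resetting the seed before each one:

```r
set.seed(65)
piped <- runif(25) |> max()

set.seed(65)
nested <- max(runif(25))

# TRUE: both versions draw the same 25 numbers and take their maximum
identical(piped, nested)
```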

A related function is sample(), which, at its simplest, randomly reorders (shuffles) the vector that you pass it.

sample(letters[1:10])
 [1] "a" "i" "h" "b" "e" "d" "j" "g" "c" "f"

Again, we need to set a seed if we want to make the randomization repeatable

set.seed(24)
sample(letters[1:10])
 [1] "g" "c" "h" "j" "b" "i" "a" "f" "e" "d"
set.seed(24)
sample(letters[1:10])
 [1] "g" "c" "h" "j" "b" "i" "a" "f" "e" "d"

sample() can do more than shuffle a vector; it can also return a random sample of some of its values. Say we want a random sample of size 10 from the letters of the English alphabet (R’s built-in letters vector). We would use

set.seed(12)
sample(letters, 10)
 [1] "b" "p" "w" "n" "e" "v" "z" "h" "r" "f"
Tip

You can use any whole number as your seed, as long as it fits within R’s integer range (no larger than 2147483647 in absolute value). Just remember to set one if you need to make your code reproducible

set.seed(334587134)
Question

Create a vector of the first 1000 positive integers. Using the seed 84, take a random sample of 50 of these numbers. Answer the following questions

  • What is the largest integer in your sample?
  • What is the smallest integer in your sample?
  • What is the median value?
  • What is the arithmetic mean of the values?
  • What is the sum of the values?
  • What is the 10th value in your sample?
  • What is the 49th value in your sample?
  • How many odd integers are in your sample?
  • How many even integers are in your sample?
  • What is the interquartile range of your sample?

Think about your answers to all of these questions and write the necessary code, before looking at the solution below.

integers <- 1:1000

set.seed(84)
s <- sample(integers, 50)

# What is the largest integer in your sample?
max(s)
[1] 943
# What is the smallest integer in your sample?
min(s)
[1] 11
# What is the median value?
median(s)
[1] 474.5
# What is the arithmetic mean of the values?
mean(s)
[1] 491.36
# What is the sum of the values?
sum(s)
[1] 24568
# What is the 10th value in your sample?
s[10]
[1] 391
# What is the 49th value in your sample?
s[49]
[1] 797
# How many odd integers are in your sample?
s[s %% 2 == 1] |>
  length()
[1] 32
# How many even integers are in your sample?
s[s %% 2 == 0] |>
  length()
[1] 18
# What is the interquartile range of your sample?
IQR(s)
[1] 513.75
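summary() reports several of these statistics in one call (minimum, lower quartile, median, mean, upper quartile and maximum), which is a handy way to check your answers:

```r
integers <- 1:1000
set.seed(84)
s <- sample(integers, 50)

# Min., 1st Qu., Median, Mean, 3rd Qu. and Max. in one call
summary(s)
```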

How many did you get right?

Additional activities

Work your way through the following questions to test your growing knowledge of R.

Question

Determine the value of $\frac{1.35^2 + 4.2^3}{4}$.

Note that we use the ^ operator to raise a number to a power, e.g. 2^3 for 2 cubed.

(1.35^2 + 4.2^3) / 4
[1] 18.97763
Question

You want to divide the number 4 by 2. What R code would you use to achieve this?

4 / 2
[1] 2
Question

What will be the result of the following code?

a <- 9
b <- 3
A / b
a <- 9
b <- 3
A / b
Error: object 'A' not found

How would you fix this code to give the correct result?
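R is case-sensitive, so A and a are different names, and only a has been created. Using the lower-case name fixes the code:

```r
a <- 9
b <- 3

# a (lower case) is the object we created, so this divides 9 by 3
a / b
```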

Question

Run the code below

set.seed(10)
x <- runif(1)

Round x to 3 decimal places.

set.seed(10)
x <- runif(1)

round(x, 3)
[1] 0.507
Question

What is the average of the values c(34, 12, 5, -10)?

mean(c(34, 12, 5, -10))
[1] 10.25
Question

What is the median of the values c(1, 8, 100, 10000, 342, -125, 5, -10)?

median(c(1, 8, 100, 10000, 342, -125, 5, -10))
[1] 6.5
Question

What does the function rnorm() do?

It generates random numbers from a normal distribution whose mean and standard deviation are set by the mean and sd arguments (0 and 1 by default).

To find this out you could have done

?rnorm
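A short sketch of rnorm() in use: the first argument, n, is how many values to draw, and mean and sd take their defaults of 0 and 1 when we leave them out. The seed value 1 is an arbitrary choice:

```r
set.seed(1)
z <- rnorm(5)                        # 5 draws from a standard normal
y <- rnorm(1000, mean = 10, sd = 2)  # 1000 draws centred on 10

length(z)
mean(y)   # close to 10, but not exactly 10
sd(y)     # close to 2
```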
Question

How many arguments does the rt() function have?

The rt() function has three arguments:

args(rt)
function (n, df, ncp) 
NULL

To find this out you could have done

?rt
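We can also count the arguments in code: formals() returns a function’s arguments as a list, so its length is the number of arguments. This works for functions written in R, such as rt(); primitive functions like sum() have no formals, so the approach does not apply to them.

```r
# formals() lists the arguments of a function written in R
names(formals(rt))
length(formals(rt))   # 3
```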