In this activity we’ll start to learn how to work with R and explore some of the syntax and functions that we encountered in the video.
We enter commands at the prompt > in the R Console. R then evaluates each command we enter.
For example, in the code block below we multiply two numbers
8 * 3
[1] 24
R prints the result because we did not assign the output to an object.
We can assign the output to an object using the <- assignment operator. In the block below, we repeat the multiplication but this time we assign the output to an object named answer
answer <- 8 * 3
Now R doesn’t print anything. To retrieve the answer to our multiplication we must type the name of the object and hit Enter
answer
[1] 24
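As a small aside, here is a sketch of a shortcut that is not part of the original activity: wrapping an assignment in parentheses makes R store the value and print it in one step. The object name answer2 is only illustrative.

```r
# Plain assignment: the value is stored and nothing is printed
answer <- 8 * 3

# Wrapping the assignment in parentheses stores the value AND prints it
(answer2 <- 8 * 3)

# Both objects hold the same value
identical(answer, answer2)
```

This can save a line when we want to check a result straight away, though the two-step version is often easier to read.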
Look at the code below; what answer will this produce and does anything get printed to the console when you run it?
2 * 3
2 * 3
[1] 6
The answer is 6, and R prints it to the console because we did not assign the result to an object.
Look at the code below; what answer will this produce?
my_log <- log(2 + 3)
my_log <- log(2 + 3)
my_log
[1] 1.609438
The answer is 1.609.
At the most basic level, we create vectors by combining elements together using the c()
function. For example, to create a vector of five numbers we use
v <- c(2, 10, 3, 8, 12)
What kind of vector is v?
class(v)
[1] "numeric"
typeof(v)
[1] "double"
is.integer(v)
[1] FALSE
This was a bit of a trick question! If you said "numeric" then you were right. It doesn’t matter that we combined whole numbers when creating the vector; R treats them as potentially having decimal places.
If you want to insert integers, you need to add L
after each number:
i <- c(2L, 10L, 3L, 8L, 12L)
is.integer(i)
[1] TRUE
We can also coerce a numeric vector to an integer one using as.integer()
vi <- as.integer(v)
is.integer(vi)
[1] TRUE
But beware what happens if you coerce numbers that are not whole numbers this way: the fractional part is simply discarded
as.integer(1.4)
[1] 1
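To make the warning concrete, the sketch below shows that as.integer() drops the fractional part (it truncates toward zero) rather than rounding to the nearest whole number. The object name rounded is only illustrative.

```r
# as.integer() discards the fractional part (truncation toward zero)
as.integer(1.9)    # 1, not 2
as.integer(-1.9)   # -1, not -2

# To round to the nearest whole number, round first and then coerce
rounded <- as.integer(round(1.9))
rounded            # 2
```

If rounding is what we want, calling round() before as.integer() is one straightforward option.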
Create a vector named primes
that contains the first 5 prime numbers.
primes <- c(2, 3, 5, 7, 11)
If you included 1 in primes, unfortunately it is not prime; for a number to be prime it must have exactly two distinct divisors, 1 and itself.
Create a vector named fib
that contains the first 10 Fibonacci numbers.
fib <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)
We can create other types of vector; we discussed
logical vectors, and
character vectors
in the video.
We create a logical vector by combining the elements TRUE and FALSE:
l <- c(TRUE, FALSE, TRUE, TRUE)
More conveniently, we can create a logical vector by applying a comparison operator. For example if we run
fib > 3
[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
what we get back is a logical vector where the elements are TRUE
if the Fibonacci number is greater than 3, and FALSE
otherwise.
Create a vector named primes_10 that contains the first 10 prime numbers. Then create a logical vector that indicates if each of the primes is less than 17.
primes_10 <- c(2, 3, 5, 7, 11, 13, 17, 19, 23, 29)
primes_10 < 17
[1] TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE FALSE
Character vectors are created by wrapping the elements in single or double quotes 'a'
, or "b"
. Double quotes "
are preferred as they are slightly easier to read in code.
We create a character vector like any other using c()
txt <- c("Alice", "Bob", "Claire", "David", "Else")
We always enter quotes in pairs; the RStudio IDE will help you in this regard as it will automatically add the closing quote for you when you type the first one.
If you ever get stuck because you forgot to close a quote, click in the R Console window and hit the Esc key to get back to your prompt
Create a character vector named my_group
that contains the first name of each member of your group, including your own.
my_group <- c("Mona", "Gavin")
We’re a small group!
rep()
and seq()
We can create sequences of numbers using the seq()
function and the :
operator. 1:10
is shorthand for seq(from = 1, to = 10, by = 1L)
.
To create a sequence of numbers from 5 to 14 we could use
5:14
[1] 5 6 7 8 9 10 11 12 13 14
or
seq(5, 14)
[1] 5 6 7 8 9 10 11 12 13 14
Create a vector named my_seq containing the integers from -4 to 9.
my_seq <- seq(-4, 9)
or
my_seq <- -4:9
my_seq
[1] -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9
Create a vector named countdown
containing the integers 10 to 0.
countdown <- seq(10, 0)
countdown
[1] 10 9 8 7 6 5 4 3 2 1 0
or
countdown <- 10:0
countdown
[1] 10 9 8 7 6 5 4 3 2 1 0
We can make more complex sequences using other arguments to seq(): by and length.out. We usually shorten the length.out argument name to just length, which works because R matches partial argument names.
To create a vector of the first 10 positive odd numbers, we could use
seq(1, by = 2, length = 10)
[1] 1 3 5 7 9 11 13 15 17 19
Note that we don’t have to use both from
and to
when we call seq()
.
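As a further sketch of how these arguments combine, the block below builds the first 10 positive even numbers with from, by and length.out, and then rebuilds the odd numbers using to instead of length.out. The object names evens and odds_to are only illustrative.

```r
# First 10 positive even numbers: start at 2, step by 2, stop after 10 values
evens <- seq(from = 2, by = 2, length.out = 10)
evens

# The first 10 positive odd numbers, this time fixing the end point with `to`
odds_to <- seq(from = 1, to = 19, by = 2)
odds_to

# Both vectors have the same number of elements
length(evens) == length(odds_to)
```

Writing the argument names out in full, as here, makes it clearer which pieces of the sequence we are fixing.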
Create a vector named x
that contains 15 evenly-spaced numbers between -4 and 20.
x <- seq(-4, 20, length.out = 15)
x
[1] -4.0000000 -2.2857143 -0.5714286 1.1428571 2.8571429 4.5714286
[7] 6.2857143 8.0000000 9.7142857 11.4285714 13.1428571 14.8571429
[13] 16.5714286 18.2857143 20.0000000
Create a vector named tens that contains every multiple of ten from 10 to 140.
tens <- seq(10, 140, by = 10)
tens
[1] 10 20 30 40 50 60 70 80 90 100 110 120 130 140
The rep() function allows us to create patterned vectors where we repeat the elements of one vector to create a longer vector. The key arguments are each and times, but there is also a length.out argument.
Let’s start with a vector base with the following elements
base <- c(1, 4, 8)
If we want to repeat this vector 3 times, we would use
rep(base, times = 3)
[1] 1 4 8 1 4 8 1 4 8
whereas, if we wanted to repeat each element of base 3 times, we would use
rep(base, each = 3)
[1] 1 1 1 4 4 4 8 8 8
To repeat a vector until it has the desired length, we can use the length.out
argument
rep(base, length = 10)
[1] 1 4 8 1 4 8 1 4 8 1
Notice how we get an incomplete final replicate of base because 3 elements do not divide into 10 evenly.
How would you repeat each element of the Fibonacci number vector fib twice?
rep(fib, each = 2)
[1] 0 0 1 1 1 1 2 2 3 3 5 5 8 8 13 13 21 21 34 34
Create a vector that repeats the first 10 primes 4 times.
rep(primes_10, times = 4)
[1] 2 3 5 7 11 13 17 19 23 29 2 3 5 7 11 13 17 19 23 29 2 3 5 7 11
[26] 13 17 19 23 29 2 3 5 7 11 13 17 19 23 29
What do you think will be the output of the code block below?
rep(3:1, times = 1:3)
Challenge yourself: think about what the code would return before running it.
rep(3:1, times = 1:3)
[1] 3 2 2 1 1 1
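One way to see why we get this result: when times is a vector the same length as the input, each element is repeated its own number of times. The sketch below spells that out long-hand; the object names vectorised and long_hand are only illustrative.

```r
# times is matched element by element: 3 once, 2 twice, 1 three times
vectorised <- rep(3:1, times = 1:3)

# Equivalent long-hand version built one element at a time
long_hand <- c(rep(3L, 1), rep(2L, 2), rep(1L, 3))

identical(vectorised, long_hand)
```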
We can subset vectors in one of several ways. The most common way is to specify the indices of the elements you want to return using a numeric vector. For example, to select the third prime number from our vector of the first 10 primes we would use
primes_10[3]
[1] 5
What is the 5th Fibonacci number in fib?
fib[5]
[1] 3
What is the 9th Fibonacci number in fib?
fib[9]
[1] 21
We can use negative numbers to exclude elements, for example
primes_10[c(-2, -4, -6)]
[1] 2 5 11 17 19 23 29
Subset the vector of Fibonacci numbers fib to return only the elements at even positions
fib[c(2, 4, 6, 8, 10)]
[1] 1 2 5 13 34
Subset the vector of Fibonacci numbers fib to exclude the elements at odd positions
fib[-c(1, 3, 5, 7, 9)]
[1] 1 2 5 13 34
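Typing the indices by hand only works while we know how long the vector is. As a sketch of a more general approach, we could generate the positions with seq() and length(); the object names even_pos, by_index and by_exclusion are only illustrative.

```r
fib <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

# Even positions: 2, 4, ..., up to the length of the vector
even_pos <- seq(2, length(fib), by = 2)

# Keep the even positions ...
by_index <- fib[even_pos]

# ... or, equivalently, drop the odd positions
by_exclusion <- fib[-seq(1, length(fib), by = 2)]

identical(by_index, by_exclusion)
```

Both routes select the same elements, so the choice between them is mostly a matter of which reads more naturally.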
Logical vectors are handy ways to subset vectors. For example, if we have a vector 1:10, we can select the even elements using
v <- 1:10
take <- v %% 2 == 0
take
[1] FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE FALSE TRUE
v[take]
[1] 2 4 6 8 10
The %% operator is a new one for us. It is the modulo operator and, given two numbers a and b, returns the remainder of the division of a by b. Often we write this as “a mod b”. The definition of an even number is that when it is divided by 2 there is zero remainder. Hence we compute the remainder when dividing by 2 and compare (==) that with 0.
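A few small cases may help fix the idea. The sketch below also shows %/%, the integer-division operator that pairs with %%; it was not covered above and is included only for comparison. The object name remainders is illustrative.

```r
7 %% 2     # remainder of 7 divided by 2 is 1, so 7 is odd
10 %% 2    # remainder is 0, so 10 is even
7 %/% 2    # integer division: 2 goes into 7 three whole times

# Like most R operators, %% works element-wise on a whole vector
remainders <- (1:6) %% 3
remainders
```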
How would you create a vector containing all the odd numbers between 42 and 121?
num <- 42:121
take <- num %% 2 == 1L
odds <- num[take]
odds
[1] 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79
[20] 81 83 85 87 89 91 93 95 97 99 101 103 105 107 109 111 113 115 117
[39] 119 121
We don’t have to create all the intermediary steps shown in the solution. We could just as easily do
odds <- 42:121
odds <- odds[odds %% 2 == 1L]
odds
[1] 43 45 47 49 51 53 55 57 59 61 63 65 67 69 71 73 75 77 79
[20] 81 83 85 87 89 91 93 95 97 99 101 103 105 107 109 111 113 115 117
[39] 119 121
at the expense of being a little harder to read and understand if you are new to R coding.
Say you have a vector containing the first n Fibonacci numbers, but you can’t remember how many of the Fibonacci numbers are in your vector. You do know that the vector is named fib and that the elements are in increasing order. You want to know which is the largest Fibonacci number in your vector. How would you do this?
fib[length(fib)]
[1] 34
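Indexing by length() is one route; as a sketch, two other base R functions reach the same value here. tail() returns the last elements of a vector, and max() does not rely on the elements being sorted. The object names are only illustrative.

```r
fib <- c(0, 1, 1, 2, 3, 5, 8, 13, 21, 34)

last_by_index <- fib[length(fib)]  # the last position equals the length
last_by_tail  <- tail(fib, 1)      # tail() returns the last n elements
largest       <- max(fib)          # works even if the vector is unsorted

c(last_by_index, last_by_tail, largest)
```

The first two rely on the vector being in increasing order, so max() is arguably the safer choice when we are not sure of that.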
Functions are a collection of R statements (lines of code) that are bundled together to complete a specific operation. In the video we encountered several functions.
We used the length()
function and sum()
was used to sum a vector of numbers. R contains many more simple functions like this:
max()
min()
log()
exp()
sqrt()
mean()
median()
range()
IQR()
(inter-quartile range)
What is the sum of the first 10 prime numbers?
sum(primes_10)
[1] 129
What is the product of the first 10 Fibonacci numbers?
prod(fib)
[1] 0
One of the functions we used was runif(), to generate random numbers. By default the numbers lie between 0 and 1. If we run the function multiple times we will get different sets of random numbers
runif(10)
[1] 0.1774133 0.6150985 0.1783983 0.2827965 0.4564735 0.4252319 0.4459273
[8] 0.4803467 0.4139307 0.2241565
runif(10)
[1] 0.04730761 0.80663297 0.38028540 0.05635062 0.14624077 0.55911343
[7] 0.04494480 0.62993577 0.79132926 0.66436722
We can use the set.seed()
function to make the values returned by R’s random number generator repeatable
set.seed(42)
runif(10)
[1] 0.9148060 0.9370754 0.2861395 0.8304476 0.6417455 0.5190959 0.7365883
[8] 0.1346666 0.6569923 0.7050648
set.seed(42)
runif(10)
[1] 0.9148060 0.9370754 0.2861395 0.8304476 0.6417455 0.5190959 0.7365883
[8] 0.1346666 0.6569923 0.7050648
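Rather than comparing the printed numbers by eye, we could check the repeatability in code. This is a minimal sketch using identical(); the object names first_draw, second_draw and third_draw are only illustrative.

```r
set.seed(42)
first_draw <- runif(10)

set.seed(42)               # reset the generator to the same starting state
second_draw <- runif(10)

third_draw <- runif(10)    # no reset, so the generator has moved on

identical(first_draw, second_draw)  # same seed, same numbers
identical(first_draw, third_draw)   # different state, different numbers
```

Note that the seed has to be set immediately before each call we want to reproduce, because every draw advances the generator.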
Using the seed 65, what is the largest (in value) number in the vector returned by
runif(25)
[1] 0.457741776 0.719112252 0.934672247 0.255428824 0.462292823 0.940014523
[7] 0.978226428 0.117487362 0.474997082 0.560332746 0.904031387 0.138710168
[13] 0.988891729 0.946668233 0.082437558 0.514211784 0.390203467 0.905738131
[19] 0.446969628 0.836004260 0.737595618 0.811055141 0.388108283 0.685169729
[25] 0.003948339
set.seed(65)
runif(25) |>
max()
[1] 0.9363114
A related function is sample(), which, at its simplest, randomly shuffles the vector that you pass it.
sample(letters[1:10])
[1] "a" "i" "h" "b" "e" "d" "j" "g" "c" "f"
Again, we need to set a seed if we want to make the randomization repeatable
set.seed(24)
sample(letters[1:10])
[1] "g" "c" "h" "j" "b" "i" "a" "f" "e" "d"
set.seed(24)
sample(letters[1:10])
[1] "g" "c" "h" "j" "b" "i" "a" "f" "e" "d"
We don’t have to use sample() only to shuffle a vector; we can also ask it to return a random sample of a few of the values. Say we want to return a random sample of size 10 from the letters of the English alphabet. We would use
set.seed(12)
sample(letters, 10)
[1] "b" "p" "w" "n" "e" "v" "z" "h" "r" "f"
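You can use any integer as your seed, provided it fits in R’s integer range (at most 2147483647 in absolute value); a larger value makes set.seed() fail with an error. Just remember to set one if you need to make your code reproducible
set.seed(33458713)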
Create a vector of the first 1000 positive integers. Using the seed 84, take a random sample of 50 of these numbers. Answer the following questions
Think about your answers to all of these questions and write the necessary code, before looking at the solution below.
integers <- 1:1000
set.seed(84)
s <- sample(integers, 50)
# What is the largest integer in your sample?
max(s)
[1] 943
# What is the smallest integer in your sample?
min(s)
[1] 11
# What is the median value?
median(s)
[1] 474.5
# What is the arithmetic mean of the values?
mean(s)
[1] 491.36
# What is the sum of the values?
sum(s)
[1] 24568
# What is the 10th value in your sample?
s[10]
[1] 391
# What is the 49th value in your sample?
s[49]
[1] 797
# How many odd integers are in your sample?
s[s %% 2 == 1] |>
length()
[1] 32
# How many even integers are in your sample?
s[s %% 2 == 0] |>
length()
[1] 18
# What is the interquartile range of your sample?
IQR(s)
[1] 513.75
How many did you get right?
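The odd and even counts above subset the vector and then take its length. As a sketch of a common alternative, we can sum the logical vector directly, because R counts TRUE as 1 and FALSE as 0. The vector s_demo below holds made-up values, not the seeded sample, so that the block runs on its own.

```r
# Hypothetical values for illustration only, not the sample drawn with seed 84
s_demo <- c(11, 24, 37, 40, 53)

n_odd  <- sum(s_demo %% 2 == 1)   # TRUE counts as 1, FALSE as 0
n_even <- sum(s_demo %% 2 == 0)   # an even number leaves remainder 0

# Every whole number is either odd or even, so the counts add up
n_odd + n_even == length(s_demo)
```

The final check is a useful habit: if the two counts do not add up to the length of the vector, one of the conditions is wrong.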
Work your way through the following questions to test your growing knowledge of R.
Determine the value of $\frac{1.35^2 + 4.2^3}{4}$.
Note that we use the ^ operator to raise a number to a power, e.g. 2^3 for 2 cubed.
(1.35^2 + 4.2^3) / 4
[1] 18.97763
You want to divide the number 4 by 2. What R code would you use to achieve this?
4 / 2
[1] 2
What will be the result of the following code?
a <- 9
b <- 3
A / b
a <- 9
b <- 3
A / b
Error: object 'A' not found
How would you fix this code to give the correct result?
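One possible fix, assuming the intent was to divide a by b: R is case sensitive, so A and a are different names and only a was ever created. The object name result is only illustrative.

```r
a <- 9
b <- 3

# Use the lower-case name that was actually assigned above
result <- a / b
result
```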
Run the code below
set.seed(10)
x <- runif(1)
Round x to 3 decimal places.
set.seed(10)
x <- runif(1)
round(x, 3)
[1] 0.507
What is the average of the values c(34, 12, 5, -10)
?
mean(c(34, 12, 5, -10))
[1] 10.25
What is the median of the values c(1, 8, 100, 10000, 342, -125, 5, -10)
?
median(c(1, 8, 100, 10000, 342, -125, 5, -10))
[1] 6.5
What does the function rnorm() do?
It is a random number generator for the normal distribution with mean equal to mean and standard deviation equal to sd.
To find this out you could have done
?rnorm
How many arguments does the rt()
function have?
The rt()
function has three arguments:
args(rt)
function (n, df, ncp)
NULL
To find this out you could have done
?rt
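If we would rather count the arguments in code than read them off the printed signature, formals() is one option. This is a small sketch; the object name arg_names is only illustrative.

```r
# formals() returns a function's arguments as a named list
arg_names <- names(formals(rt))
arg_names

# The number of arguments is the length of that list
length(arg_names)
```

This works for functions written in R, such as rt(); some low-level built-in functions do not expose their arguments this way.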