Learning objectives
At the end of this topic you should be able to
Understand the main features of the RStudio IDE
Run simple R commands in RStudio
Understand the basic syntax of R
Understand how to use the R help system
R
R is a powerful software application for statistical analysis
It is incredibly popular
It is open source, licensed under the GNU GPL
Vast package ecosystem
Designed from the ground up for analysing data
Has excellent graphics capabilities
R is an interpreted language, unlike compiled languages such as C and C++
Slower, but more forgiving and interactive
RStudio
RStudio is a powerful integrated development environment (IDE) for R
an interface for running R
an editor for writing R scripts
menus & buttons to run common tasks
a lot more
It is also open source
RStudio ≠ R
You can run RStudio on your computer or in the cloud using posit.cloud
Posit PBC (formerly RStudio PBC) provides paid-for support & Pro-level versions for organisations
RStudio
R example
# Palmer penguins
# Load some packages
library("palmerpenguins")
library("dplyr")
library("ggplot2")
# how many observations of each species of penguin?
penguins |>
  count(species)
# A tibble: 3 × 2
species n
<fct> <int>
1 Adelie 152
2 Chinstrap 68
3 Gentoo 124
R example
penguins |>
  group_by(species) |>
  # \(x) is shorthand for function(x); it passes na.rm = TRUE to mean()
  summarize(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))
# A tibble: 3 × 6
species bill_length_mm bill_depth_mm flipper_length_mm body_mass_g year
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Adelie 38.8 18.3 190. 3701. 2008.
2 Chinstrap 48.8 18.4 196. 3733. 2008.
3 Gentoo 47.5 15.0 217. 5076. 2008.
R example
ggplot(penguins, aes(x = flipper_length_mm,
                     y = body_mass_g,
                     colour = species,
                     shape = species)) +
  geom_point(size = 3) +
  scale_colour_brewer(palette = "Set1")
R example
Don’t worry! You won’t understand most of that!
By the end of the course, you will!
Assignment
<- is the assignment operator
Made up from the < and - characters
output <- input
Assigns the result of evaluating the right-hand side to the object named on the left
This creates an object named output
Refer to objects using their name
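A minimal sketch of assignment in practice; the names input and output mirror the slide, and the values are arbitrary choices for illustration.

```r
# Create an object called input holding a value
input <- 21
# Evaluate the right-hand side, then store the result in an object called output
output <- input * 2
# Refer to the object by its name to print its value
output
```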
Data types
The main data types in R are
numeric
integer
double (real values)
complex (numbers with real & imaginary parts; a separate type, not counted as numeric)
character
strings of letters, numbers, etc
create with matched single ' or double " quotes
logical (TRUE or FALSE)
Never use T and F in their place!
TRUE & FALSE are reserved words in R and can’t be overwritten, but T and F aren’t reserved
T <- FALSE # you monster!
T == TRUE
[1] FALSE
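A short sketch using the base function typeof() to show the storage type of one value of each kind listed above, plus why relying on T is risky. The specific values are arbitrary examples.

```r
# typeof() reports the underlying storage type of a value
typeof(10L)          # "integer" (the L suffix makes an integer)
typeof(3.14)         # "double"
typeof(2 + 3i)       # "complex"
is.numeric(2 + 3i)   # FALSE: complex is not counted as numeric
typeof("Alice")      # "character"
typeof(TRUE)         # "logical"

# T is an ordinary variable, so it can be reassigned
T <- FALSE
isTRUE(T)            # FALSE
rm(T)                # remove our copy so T refers to the built-in TRUE again
isTRUE(T)            # TRUE
```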
Operators
As well as <-, R has many operators
Boolean
< and >
<= and >= (typed as < then =, and > then =)
== (two = characters)
!= (typed as ! then =)
& AND
| OR
! NOT
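A small sketch of these operators applied to one value; x = 5 is an arbitrary choice. Comparison operators bind more tightly than & and |, so no extra parentheses are needed in the combined tests.

```r
x <- 5
# Comparison operators return logical values
x > 3            # TRUE
x <= 4           # FALSE
x == 5           # TRUE: double equals compares, single = does not
x != 5           # FALSE
# Logical operators combine logical values
x > 3 & x < 10   # TRUE: both conditions hold
x < 3 | x > 4    # TRUE: at least one condition holds
!(x == 5)        # FALSE: NOT flips TRUE to FALSE
```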
Getting help
Can get help on R from many places
Inside R, use ?topic to get help on topic
Usually topic is a function
You can search more broadly with ??topic
Other sources:
Vectors
Vectors are the fundamental way that data are stored in R
R doesn’t have scalars; a single value is just a vector of length 1
Vectors are a one-dimensional collection of values in a single unit
(But see lists later in the course)
Atomic vectors are vectors whose elements are all of the same type
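A quick sketch of both points: a "single value" is a length-1 vector, and an atomic vector holds one type only. The coercion shown (mixing types forces everything to a common type, here character) is standard R behaviour, used here only to illustrate the "same type" rule.

```r
# A "single value" is really a vector of length 1
x <- 42
is.vector(x)     # TRUE
length(x)        # 1

# An atomic vector holds elements of one type only;
# combining mixed types coerces everything to a common type
mixed <- c(1, "two", TRUE)
typeof(mixed)    # "character"
mixed            # "1" "two" "TRUE"
```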
Creating vectors
Create vectors with c() (for combine)
numbers <- c(1, 4, 6, 10)
numbers
[1]  1  4  6 10
people <- c("Alice", "Bob", "Claire", "David")
people
[1] "Alice" "Bob" "Claire" "David"
Number of elements via length()
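A minimal example of length() on the two vectors created above:

```r
numbers <- c(1, 4, 6, 10)
length(numbers)   # 4 elements
people <- c("Alice", "Bob", "Claire", "David")
length(people)    # 4 elements
```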
Creating vectors
Many other ways: seq(), rep()
[1] 1.00 1.25 1.50 1.75 2.00
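The call that produced the output above isn’t shown; one call that gives exactly that output is seq(1, 2, by = 0.25). A sketch of seq() and rep(), with values chosen purely for illustration:

```r
# Sequence from 1 to 2 in steps of 0.25
seq(1, 2, by = 0.25)          # 1.00 1.25 1.50 1.75 2.00
# Or ask for a fixed number of evenly spaced values
seq(1, 2, length.out = 5)     # the same five values
# Shorthand for a sequence of integers
1:5                           # 1 2 3 4 5
# Repeat the whole vector, or repeat each element in turn
rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"
rep(c("a", "b"), each = 2)    # "a" "a" "b" "b"
```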
Vectorized operations
Vectors are a powerful feature of R because they let us write more concise, expressive code
v1 <- c(3, 1, 4, 1, 5)
v2 <- c(1, 6, 1, 8, 0)
v1 + v2
[1] 4 7 5 9 5
In other languages, to achieve this you might have to loop (iterate) over the indices of the vectors to add each pair of elements in turn
We’ll talk more about loops and iteration later in the course
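As a preview only (loops are covered properly later), here is a sketch contrasting an explicit loop over the indices with the single vectorized expression; both give the same result.

```r
v1 <- c(3, 1, 4, 1, 5)
v2 <- c(1, 6, 1, 8, 0)

# Explicit loop: add each pair of elements in turn
total <- numeric(length(v1))        # pre-allocate a vector of zeros
for (i in seq_along(v1)) {
  total[i] <- v1[i] + v2[i]
}
total

# Vectorized: one expression does the same work
v1 + v2
identical(total, v1 + v2)           # TRUE
```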
Recycling
What if we have vectors of different lengths?
v1 <- c(1, 3, 5, 1, 5)
v2 <- c(1, 2)
v1 + v2
Warning in v1 + v2: longer object length is not a multiple of shorter object length
[1] 2 5 6 3 6
v2 is recycled until it is of the correct length (here 1, 2, 1, 2, 1)
Powerful but dangerous; best avoided
Working with data frames helps avoid this
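A sketch of why recycling is risky: when the longer length is an exact multiple of the shorter one, R recycles with no warning at all. The one common, safe use is combining a vector with a single value.

```r
# Length 4 is a multiple of length 2, so recycling happens silently
c(1, 2, 3, 4) + c(10, 20)    # 11 22 13 24
# The common, safe case: a length-1 vector is recycled to every element
c(1, 2, 3, 4) * 2            # 2 4 6 8
```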
Recycling
Vectorized functions
Most functions in R accept vectors as inputs
v1 <- c(10, 5, 2, 4)
sum(v1)
[1] 21
round(v1 + runif(length(v1)), 2)
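A sketch of the two broad kinds of vectorized function: those that work element by element and return a vector of the same length, and those that summarise the whole vector into one value. The inputs are arbitrary.

```r
v1 <- c(10, 5, 2, 4)
# Element-by-element: the result has the same length as the input
sqrt(c(4, 9, 16))     # 2 3 4
v1 * 10               # 100 50 20 40
# Summaries: the whole vector becomes a single value
sum(v1)               # 21
mean(v1)              # 5.25
max(v1)               # 10
```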
Indexing vectors
Having stored data in a vector, we might want to access certain elements of it
Use [ plus a vector of indices to access elements of a vector
You can also use negative indices to exclude those elements
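A sketch of positional indexing, reusing the vector from the previous slide. Note that R counts positions from 1, not 0.

```r
v1 <- c(10, 5, 2, 4)
v1[2]           # 5: the second element (positions start at 1)
v1[c(1, 3)]     # 10 2: the first and third elements
v1[2:4]         # 5 2 4: a range of positions
v1[-1]          # 5 2 4: everything except the first
v1[-c(1, 2)]    # 2 4: drop the first two
```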
Indexing vectors
If we give the elements of the vector names, we can index using those names
Alice Bob Claire David
10 5 2 4
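The code behind the named vector above isn’t shown; a minimal sketch that produces it with names() and then indexes by name:

```r
v1 <- c(10, 5, 2, 4)
# Attach a name to each element
names(v1) <- c("Alice", "Bob", "Claire", "David")
v1
# Index by name instead of by position
v1["Claire"]             # Claire 2
v1[c("Alice", "David")]  # Alice 10, David 4
```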
Indexing vectors
We can also use a logical vector to select (TRUE) or exclude (FALSE) elements
Alice Bob Claire David
10 5 2 4
filt <- rep(c(TRUE, FALSE), each = 2)
filt
[1] TRUE TRUE FALSE FALSE
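The indexing step itself isn’t shown above; a minimal sketch reusing the named vector and filt (the c(name = value) form is just a compact way to build the same named vector):

```r
v1 <- c(Alice = 10, Bob = 5, Claire = 2, David = 4)
filt <- rep(c(TRUE, FALSE), each = 2)
# Keep the elements where filt is TRUE, i.e. Alice and Bob
v1[filt]
```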
Indexing vectors
Any expression that evaluates to
numeric (possibly negative)
character (assuming named)
logical
can be used to index a vector
Alice Bob Claire David
10 5 2 4
Can also assign new values to elements
Alice Bob Claire David
10 5 2 15
Alice Bob Claire David
FALSE TRUE TRUE FALSE
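The slides show only the results here. The sketch below is one way to reproduce them, assuming the logical vector came from the comparison v1 < 10 (an assumption; the original expression isn’t shown).

```r
v1 <- c(Alice = 10, Bob = 5, Claire = 2, David = 4)
# Any expression that yields numbers, names, or logicals can be an index
v1[length(v1)]         # the last element (David)
# Assign a new value to an element
v1["David"] <- 15
v1
# A comparison gives a logical vector...
v1 < 10                # FALSE TRUE TRUE FALSE
# ...which can itself be used as an index
v1[v1 < 10]            # Bob 5, Claire 2
```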
Functions
A function is
a sequence of 1 or more instructions (lines of code)
takes 0 or more arguments
returns a value (possibly NULL, and possibly invisibly)
seq(), length(), etc. are all functions
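A sketch of the definition above using two small, hypothetical functions (add_one and quiet are illustrative names, not part of R). Writing your own functions is covered later; this only shows arguments going in and a value, possibly invisible, coming out.

```r
# Hypothetical function: 1 argument, 1 instruction, returns a value
add_one <- function(x) {
  x + 1
}
add_one(c(1, 2, 3))   # 2 3 4

# Hypothetical function: invisible() returns a value without auto-printing it
quiet <- function(x) invisible(x * 2)
quiet(5)              # nothing is printed at the console
y <- quiet(5)         # but the value is still returned
y                     # 10
```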
Arguments
Functions typically take arguments, much like the flags you pass to command-line (CLI) commands
v <- runif(n = 5)
round(v, digits = 1)
n is an argument to runif()
digits is an argument to round()
args(round)
function (x, digits = 0, ...) 
NULL
Arguments
Arguments can be matched by name or position
round(1, v) # wrong! But not an error
Don’t name the first argument, but name everything else
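A sketch of matching by position and by name, using a fixed vector instead of runif() so the results are reproducible. The first three calls are equivalent; the last swaps the meaning of the arguments without any error.

```r
v <- c(0.1234, 5.6789)
round(v, 2)                 # by position: x = v, digits = 2
round(v, digits = 2)        # recommended: first argument unnamed, the rest named
round(digits = 2, x = v)    # all named, so the order doesn't matter
# Positional mistake: rounds the number 1 to v digits, with no error
round(1, v)
```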
Packages
R comes with a lot of functions
implement the language for programming
utilities
mathematical
basic & advanced statistical
But it’s not comprehensive
R packages extend R with new functions that implement new statistical methods, utilities, or even entirely new domain specific languages
R packages are user-written and work just like those provided with R
Packages
Packages are typically installed from CRAN
Comprehensive R Archive Network
Packages are installed onto a computer into a library
Install a package using
install.packages("pkg_name")
Load a package each time you want to use it with
library("pkg_name")
(Other repositories are available, such as GitHub, especially for development versions)