Dyr og Data

A brief introduction to R

Gavin Simpson

Aarhus University

Mona Larsen

Aarhus University

2025-08-29

Learning objectives

At the end of this topic you should be able to

  • Understand the main features of the RStudio IDE

  • Run simple R commands in RStudio

  • Understand the basic syntax of R

  • Understand how to use the R help system

R

R is a powerful software application for statistical analysis

It is incredibly popular

  • It is open source — GPL
  • Vast package ecosystem
  • Designed from the ground up for analysing data
  • Has excellent graphics capabilities

R is an interpreted language unlike C, C++, etc., which are compiled languages

Slower but more forgiving and interactive

RStudio

RStudio is a powerful integrated development environment (IDE) for R

  • an interface for running R
  • an editor for writing R scripts
  • menus & buttons to run common tasks
  • a lot more

It is also open source

RStudio ≠ R

Can run RStudio on your computer or in the cloud using posit.cloud

Posit PBC provide paid-for support & Pro-level versions for organisations

RStudio

Log in to posit.cloud

R example

# Palmer penguins
# Load some packages
library("palmerpenguins")
library("dplyr")
library("ggplot2")

# how many observations of each species of penguin?
penguins |>
    count(species)
# A tibble: 3 × 2
  species       n
  <fct>     <int>
1 Adelie      152
2 Chinstrap    68
3 Gentoo      124

R example

# mean of every numeric variable for each species, ignoring missing values
penguins |>
  group_by(species) |>
  summarize(across(where(is.numeric), \(x) mean(x, na.rm = TRUE)))
# A tibble: 3 × 6
  species   bill_length_mm bill_depth_mm flipper_length_mm body_mass_g  year
  <fct>              <dbl>         <dbl>             <dbl>       <dbl> <dbl>
1 Adelie              38.8          18.3              190.       3701. 2008.
2 Chinstrap           48.8          18.4              196.       3733. 2008.
3 Gentoo              47.5          15.0              217.       5076. 2008.

R example

ggplot(penguins, aes(x = flipper_length_mm,
                     y = body_mass_g,
                     colour = species,
                     shape  = species)) +
  geom_point(size = 3) +
  scale_colour_brewer(palette = "Set1")

R example

Don’t worry! You won’t understand most of that!

By the end of the course you will

Start a new script

R basics

R is a fancy calculator

In the Console, type

12 * 4
[1] 48

You see the result printed directly in the console, as shown above

R is a fancy calculator

Add the same code to your script

12 * 4

To run the code in R, we need to send it to R from our script

Put your cursor anywhere on the line containing the code and press

  • macOS: Cmd-Enter

  • Windows: Ctrl-Enter

to send the code to R

Assignment

<- is the assignment operator

Made up from the < and - characters

output <- input

Assign the result of the right-hand side to the object named on the left

This creates an object with name output

Refer to objects using their name
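A minimal sketch of assignment in practice. The object names width, height, and area are made up for illustration:

```r
# Store two values in named objects
width <- 12
height <- 4

# Use the objects by name in a calculation and store the result
area <- width * height

# Typing an object's name prints its value
area
# [1] 48
```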

Assignment

There’s a keyboard shortcut to enter the assignment operator <- for you

  • macOS: Option + - (Option and the minus key)

  • Windows: Alt + - (Alt and the minus key)

Data types

The main data types in R are

  • numeric
    • integer
    • double (real values)
  • complex (numbers with real & imaginary parts)
  • character
    • strings of letters, numbers, etc
    • create with matched single ' or double " quotes
  • logical
    • TRUE and FALSE

Never use T and F in their place!

TRUE & FALSE are reserved words in R — can’t be overwritten — but T and F aren’t

T <- FALSE # you monster!
T == TRUE
[1] FALSE
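One way to check the data types listed above is typeof(), a base R function not otherwise covered in these slides. A small sketch:

```r
typeof(1L)         # "integer": the L suffix makes a whole number an integer
typeof(1.5)        # "double"
typeof(2 + 3i)     # "complex"
typeof("penguin")  # "character"
typeof(TRUE)       # "logical"

# Numbers typed without the L suffix are stored as doubles
typeof(1)          # "double"
```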

Operators

As well as <- R has many operators

  • Mathematical
    • +
    • -
    • *
    • /
  • Boolean
    • < and >
    • <= and >= (typed as two characters: < or > followed by =)
    • == (typed as two = characters)
    • != (typed as ! followed by =)
    • & AND
    • | OR
    • ! NOT
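A short sketch of these operators in use. The objects x and y are made up for illustration:

```r
x <- 7
y <- 3

# Mathematical operators
x + y   # 10
x - y   # 4
x * y   # 21
x / y   # 2.333333

# Comparisons return logical values
x > y    # TRUE
x <= y   # FALSE
x == 7   # TRUE
x != y   # TRUE

# Combine logical values with & (AND), | (OR), and ! (NOT)
(x > 5) & (y > 5)   # FALSE
(x > 5) | (y > 5)   # TRUE
!(x > 5)            # FALSE
```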

Getting help

Can get help on R from many places

Inside R, use ?topic to get help on topic

Usually topic is a function

Can search more broadly with ??topic
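For example, using seq() as the topic. The functions help() and help.search() are the long forms of ? and ??. They are not mentioned above but are part of R:

```r
# Open the help page for the seq() function
?seq

# help() is the function behind ?, so this does the same thing
help("seq")

# Search all installed help pages for a word or phrase
??"sequence"

# help.search() is the function behind ??
help.search("sequence")
```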

Other sources:

Vectors

Vectors

Vectors are the fundamental way that data are stored in R

R doesn’t have scalars — single values — just vectors

A vector is a one-dimensional collection of values in a single unit

(But see lists later in the course)

Atomic vectors are vectors whose elements are all of the same type
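A small sketch of what these two points mean in practice. It uses c(), which is introduced in the next section, plus is.vector() and typeof(), which are base R functions not covered in these slides:

```r
# A single value is a vector of length 1
x <- 42
length(x)      # 1
is.vector(x)   # TRUE

# Atomic vectors hold one type, so mixed inputs to c() are converted
# to a common type (here everything becomes character)
mixed <- c(1, "two", TRUE)
mixed
# [1] "1"    "two"  "TRUE"
typeof(mixed)  # "character"
```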

Creating vectors

Create vectors with c() (for combine)

numbers <- c(1, 4, 6, 10)
numbers
[1]  1  4  6 10
people <- c("Alice", "Bob", "Claire", "David")
people
[1] "Alice"  "Bob"    "Claire" "David" 

Number of elements via length()

length(people)
[1] 4

Creating vectors

Many other ways: seq(), rep()

seq(1, 5)
[1] 1 2 3 4 5
seq(1, 10, by = 2)
[1] 1 3 5 7 9
seq(1, 2, length.out = 5)
[1] 1.00 1.25 1.50 1.75 2.00
rep(c(1,2), each = 2)
[1] 1 1 2 2
rep(c(1,2), times = 2)
[1] 1 2 1 2

Vectorized operations

Vectors are a powerful feature of R as they allow us to write more expressive code

v1 <- c(3, 1, 4, 1, 5)
v2 <- c(1, 6, 1, 8, 0)
v1 + v2
[1] 4 7 5 9 5

In other languages, to achieve this you might have to loop (iterate) over the indices of the vectors to add each pair of elements in turn

We’ll talk more about loops and iteration later in the course
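To make the comparison concrete, here is a sketch of what the loop version might look like in R, next to the vectorized version. It uses a for loop and [ indexing, both covered later, and the object name result is made up for illustration:

```r
v1 <- c(3, 1, 4, 1, 5)
v2 <- c(1, 6, 1, 8, 0)

# Loop version: add each pair of elements in turn
result <- numeric(length(v1))   # vector of zeros to hold the sums
for (i in seq_along(v1)) {
  result[i] <- v1[i] + v2[i]
}
result
# [1] 4 7 5 9 5

# Vectorized version: one expression, same answer
identical(result, v1 + v2)
# [1] TRUE
```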

Vectorized functions

Most functions in R accept vectors as inputs

v1 <- c(10, 5, 2, 4)
sum(v1)
[1] 21
prod(v1)
[1] 400
length(v1)
[1] 4
round(v1 + runif(length(v1)), 2)
[1] 10.23  5.16  2.16  4.31

Indexing vectors

Having stored data in a vector we might want to access certain elements of the vector

Use [ plus a vector of indices to access elements of a vector

v1
[1] 10  5  2  4

Can also use negative indices to exclude those elements

v1[1]
[1] 10
v1[4]
[1] 4
v1[length(v1)]
[1] 4
v1[2:3]
[1] 5 2
v1[-c(1,3)]
[1] 5 4

Indexing vectors

If we give the elements of the vector names we can index using those

names(v1) <- people
v1
 Alice    Bob Claire  David 
    10      5      2      4 
v1["Alice"]
Alice 
   10 
people[2]
[1] "Bob"
v1[people[2]]
Bob 
  5 

Indexing vectors

We can also use a logical vector to select (TRUE) or exclude (FALSE) elements

v1
 Alice    Bob Claire  David 
    10      5      2      4 
filt <- rep(c(TRUE, FALSE), each = 2)
filt
[1]  TRUE  TRUE FALSE FALSE
v1[filt]
Alice   Bob 
   10     5 
v1[!filt]
Claire  David 
     2      4 

Indexing vectors

Any expression that evaluates to

  • numeric (possibly negative)
  • character (assuming named)
  • logical

can be used to index a vector

v1
 Alice    Bob Claire  David 
    10      5      2      4 

Can also assign new values to elements

v1[4] <- 15
v1
 Alice    Bob Claire  David 
    10      5      2     15 
v1 < 10
 Alice    Bob Claire  David 
 FALSE   TRUE   TRUE  FALSE 
v1[v1 < 10]
   Bob Claire 
     5      2 

Functions

Functions

A function

  1. contains a sequence of 1 or more instructions (lines of code)
  2. takes 0 or more arguments
  3. returns something (possibly nothing or NULL, and possibly invisibly)

seq(), length() etc are all functions
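As a rough sketch of how these three parts fit together, here is a small user-written function. The name grams_to_kg is hypothetical and not part of R or any package:

```r
# A hypothetical function: converts grams to kilograms
# 1. the body between { } contains the instructions
# 2. it takes two arguments: x, and digits with a default value of 1
# 3. the value of the last expression is returned
grams_to_kg <- function(x, digits = 1) {
  round(x / 1000, digits = digits)
}

grams_to_kg(c(3701, 3733, 5076))
# [1] 3.7 3.7 5.1
```

Because digits has a default value, it can be left out of the call, as it is here.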

Arguments

Functions typically take arguments, like flags for command-line (CLI) commands

v <- runif(n = 5)

round(v, digits = 1)
[1] 0.4 0.6 0.8 0.5 0.3

n is an argument to runif

digits is an argument to round

args(round)
function (x, digits = 0, ...) 
NULL

Arguments

Arguments can be matched by name or position

round(x = v, digits = 1)
[1] 0.4 0.6 0.8 0.5 0.3
round(v, 1)
[1] 0.4 0.6 0.8 0.5 0.3
round(digits = 1, x = v)
[1] 0.4 0.6 0.8 0.5 0.3
round(1, v) # wrong! But not an error
[1] 1 1 1 1 1

Don’t name the first argument but name everything else

Packages

R comes with a lot of functions

  • implement the language for programming
  • utilities
  • mathematical
  • basic & advanced statistical

But it’s not comprehensive

R packages extend R with new functions that implement new statistical methods, utilities, or even entirely new domain specific languages

R packages are user-written and work just like those provided with R

Packages

Packages are typically installed from CRAN

Comprehensive R Archive Network

Packages are installed onto a computer into a library

Install a package using

install.packages("pkg_name")

Load a package each time you want to use it with

library("pkg_name")

(Other repositories, such as GitHub, are available, especially for development versions)
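A sketch of how installing and loading might look in a script, using the palmerpenguins package from the earlier example. The requireNamespace() check and the pkg::function form are features of R that are not covered above. The check is only there so the script still runs if the package is not installed:

```r
# Install once per computer (needs an internet connection);
# commented out so the package is not re-installed every time the script runs
# install.packages("palmerpenguins")

# Load the package in every new R session before using its functions;
# the check skips loading if the package has not been installed yet
if (requireNamespace("palmerpenguins", quietly = TRUE)) {
  library("palmerpenguins")
}

# A single function can also be called without loading its package,
# using the pkg::function form (stats is one of the packages that come with R)
stats::median(c(1, 3, 5))
# [1] 3
```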