The classic Dune Meadow data

dune is a data frame of observations of 30 species at 20 sites. The species names are abbreviated to 4+4 letters (see ?make.cepnames). The following names are changed from the original source (Jongman et al. 1987): Leontodon autumnalis to Scorzoneroides, and Potentilla palustris to Comarum.

dune.env is a data frame of 20 observations on the following 5 variables:

A1 a numeric vector of thickness of soil A1 horizon,
Moisture an ordered factor with levels: 1 < 2 < 4 < 5,
Management a factor with levels: BF (Biological farming), HF (Hobby farming), NM (Nature Conservation Management), and SF (Standard Farming).

Use an ordered factor of land-use with levels: Hayfield < Haypastu < Pasture.

Manure an ordered factor with levels: 0 < 1 < 2 < 3 < 4.

The data are provided with vegan, so we can load them using

library("vegan")

## Loading required package: permute

data(dune, dune.env, package = "vegan")

In this exercise we’ll generate an ordination of the dune meadow data and then plot the ordination using various tools to interpret the latent species-environment gradients we have identified in the data.

We will begin by fitting a PCA on the Hellinger transformed species data

ord <- pca(decostand(dune, method = "hellinger"))
ord

## 
## Call: pca(X = decostand(dune, method = "hellinger"))
## 
##               Inertia Rank
## Total          0.5559     
## Unconstrained  0.5559   19
## 
## Inertia is variance
## 
## Eigenvalues for unconstrained axes:
##     PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8 
## 0.16992 0.11106 0.05634 0.04397 0.04097 0.03019 0.02104 0.01794 
## (Showing 8 of 19 unconstrained eigenvalues)

Now we create plots of the ordination scores

layout(matrix(1:2, ncol = 2))
ordipointlabel(ord, display = "species", scaling = "symmetric")
ordipointlabel(ord, display = "sites", scaling = "symmetric")

layout(1)

If we wanted to plot both species and sites on the same plot, it can help to not label all the species, instead only labelling the ones whose abundances are well-fitted by the ordination axes being displayed. We can use the goodness() function to compute goodness of fit values for the species

gf <- goodness(ord, model = "CA", display = "species", choices = 1:2)
head(gf)

##                 PC1       PC2
## Achimill 0.49999800 0.5323891
## Agrostol 0.76283527 0.9069776
## Airaprae 0.01928652 0.3943117
## Alopgeni 0.10332801 0.6483475
## Anthodor 0.24928477 0.6175504
## Bellpere 0.18295468 0.2239834

Note that the values displayed are cummulative, so the values in column PC2 are the cummulative goodness of fit values over the two axes. We can, for example, identify those species whose fitted abundance on the two axes is 40% or greater.

take <- gf[,2] >= 0.4

This results in sum(take) species to be labelled.

We proceed by building up the plot from the basic building blocks

scl <- "symmetric"
plot(ord, type = "n", scaling = scl, display = c("species", "sites"))

next we’ll add points for all the species, and sites, unlabelled

points(ord, scaling = scl, display = "species", cex = 0.8, col = "red")
points(ord, scaling = scl, display = "sites", cex = 0.8, col = "black")

Next we will add the species labels and the sites/samples

ordipointlabel(ord, display = "sites", scaling = scl, add = TRUE)
ordipointlabel(ord, display = "species", scaling = scl, add = TRUE, select = take)

What do you notice about the positions on the plot of the species that are not as well fitted by the two displayed axes?

A final enhancement we might make to this is to colour the site/sample points according to one of the environmental variables in dune.env. One of the key variables is the management type for each of the dune meadow samples, so we will colour the sample points according to this variable, using the indexing trick I mentioned in the slides.

Begin by choosing a vector of colours. Here I’ll use a palette from ColorBrewer, and specify the colours using hexadecimal notation

col_vec <- c("#1b9e77", "#d95f02", "#7570b3", "#e7298a")

Next we need to expand this vector of colours into one per sample, depending upon the management type

cols <- with(dune.env, col_vec[Management])

Now we are ready to replot, reusing code from earlier, which I have gathered into a single code block

## goodness of fit
gf <- goodness(ord, model = "CA", display = "species", choices = 1:2)
take <- gf[,2] >= 0.4 # which species to label
scl <- "symmetric"    # what scaling to use

## prepare the plotting device
plot(ord, type = "n", scaling = scl, display = c("species", "sites"))

## add points for the species
points(ord, scaling = scl, display = "species", cex = 0.8, col = "red", pch = "+")

## add points for the samples, colouring by management type
points(ord, scaling = scl, display = "sites", pch = 19, col = cols)

## next, label the samples and species points
ordipointlabel(ord, display = "sites", scaling = scl, add = TRUE)
ordipointlabel(ord, display = "species", scaling = scl, add = TRUE, select = take)

## finally add a legend
lvl <- with(dune.env, levels(Management))
legend("topright", legend = lvl, bty = "n", col = col_vec, pch = 19)

Using ggvegan

We can do something similar with ggvegan, although the tools are not as well developed as the vegan tools for base graphics are/

The convention with ggplot2 is to fortify() the object we wish to plot, which basically means turn the object into a format that is suitable for plotting with ggplot(). ggvegan provides fortify() methods for many of the objects tht vegan creates, but not all are currently supported.

We begin by fortifying the PCA ordination object we created earlier ord

library("ggvegan")
library("ggrepel")

## Loading required package: ggplot2

library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

# fortify the ordination
ford <- fortify(ord, axes = 1:2, scaling = "symmetric")
## not yet a tibble
head(ford, 4)

## # A tibble: 4 × 4
##   score   label        pc1     pc2
##   <fct>   <chr>      <dbl>   <dbl>
## 1 species Achimill  0.301  -0.0853
## 2 species Agrostol -0.557   0.269 
## 3 species Airaprae  0.0472 -0.231 
## 4 species Alopgeni -0.186   0.476

Now we can produce a version of the plot we made earlier, but now using ggplot()

ggplot(ford, aes(x = pc1, y = pc2)) +
  geom_hline(yintercept = 0, colour = "steelblue", alpha = 0.5) +
  geom_vline(xintercept = 0, colour = "steelblue", alpha = 0.5) +
  geom_point(data = dplyr::filter(ford, score == "sites")) +
  geom_text_repel(data = dplyr::filter(ford, score == "sites"),
                  mapping = aes(label = label)) +
  geom_point(data = dplyr::filter(ford, score == "species"), colour = "red", pch = 3, size = 2) +
  geom_text_repel(data = 
                    dplyr::filter(ford, score == "species") |>
                    dplyr::filter(take),
                  mapping = aes(label = label)) +
  coord_fixed()

If we want to use ggplot() to reproduce the plot where we coloured the sample points by the farm management type, we need to do a little data wrangling to combine the ordination results with the environmental data.

# fortify the different scores into separate data frames
site_scrs <- fortify(ord, axes = 1:2, scaling = "symmetric", layers = "sites")
spp_scrs <- fortify(ord, axes = 1:2, scaling = "symmetric", layers = "species")

We need to link the site scores with the environmental data. If the order of the data is the same in the ordination (species) and the environmental data, as it is here, we can just use bind_cols() to link the two data objects.

site_scrs <- site_scrs |>
  bind_cols(dune.env)

Now we can rerun the code above, but mapping the Management variable to the colour channel for the layer plotting the sample points

ggplot(ford, aes(x = pc1, y = pc2)) +
  geom_hline(yintercept = 0, colour = "steelblue", alpha = 0.5) +
  geom_vline(xintercept = 0, colour = "steelblue", alpha = 0.5) +
  geom_point(data = site_scrs,
             mapping = aes(colour = Management)) + # +<<
  geom_text_repel(data = site_scrs,
                  mapping = aes(label = label)) +
  geom_point(data = spp_scrs, colour = "red", pch = 3, size = 1) +
  geom_text_repel(data = filter(spp_scrs, take),
                  mapping = aes(label = label)) +
  coord_fixed() +
  theme(legend.position = "bottom") # move the legend

If the data are in different orders, we will need the sample labels within both data objects and join the two data objects using those labels. This operation is known as a left join and can be done using dplyr’s left_join() function, after

# note we scramble the `dune.env` data to illustrate the point
site_scrs <- fortify(ord, axes = 1:2, scaling = "symmetric",
                     layers = "sites") |>
  left_join(tibble::rownames_to_column(dune.env[sample(nrow(dune.env)), ]),
            by = c("label" = "rowname"))

and then we could proceed as above with the plot.

Dutch Dune Meadows

The classic Dune Meadow data

Using ggvegan