Abstract

Below, you’ll find several examples of experimental designs or other multivariate data sets that we wish to test using constrained ordination methods and restricted permutation (Monte Carlo) tests.

Ectomycorrhizal fungi

In this example we will study the effect of sod-cutting and sod-addition on the number of sporocarps of ectomycorrhizal fungi in Pine stands differing in age and soil type using data from an experiment reported in Baar & Ter Braak (1996). The experimental layout is as follows. Six stands (St1–St6) with Scots pine of different age and soil type were selected. Within each stand, plots were laid out, with a size of 15 m x 15 m. Treatments applied to stands were:

S — sod cutting, in which the litter and humus layers and the herbaceous vegetation were removed,
A — sod addition, in which removed litter and humus were added onto the existing litter and humus layers, and
C — control, in which the litter and humus layers were left unchanged.

Treatment A (sod addition) was not done in the oldest two stands (St5 and St6) as these were the oldest stands and already had thick layers of humus and litter. The treatments were replicated 4 times each, hence in the younger stands we have 12 plots (4 replicates by 3 treatments), while in the oldest stands we have 8 plots each (4 replicates by 2 treatments). Within stands, treatments were randomized.

library("vegan")

## Loading required package: permute

## Loading required package: lattice

## This is vegan 2.6-7

library("permute")
library("readr")
library("tidyr")
library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library("tibble")
library("forcats")
library("stringr")

fungi <- read_csv(url("https://bit.ly/fungi-spp"), skip = 1L,
    col_types = "-cddddddddddddddddddddddddddddddddddd") %>%
    mutate(across(Amangemm:Xerosubt, ~ replace_na(.x, 0))) %>%
        select(-(Ncarp:Nspec)) %>%
        rename(label = "...2")

## New names:
## • `` -> `...1`
## • `` -> `...2`

design <- read_csv(url("https://bit.ly/fungi-design"), skip = 1L,
    col_types = "-cccdc") %>%
    rename(label = "...2", Treatment = `LH-treat`) %>%
        mutate(Treatment = fct_relevel(Treatment, "C", after = 0L),
            Treatment = fct_recode(Treatment, control = "C",
                sod_cutting = "S", sod_addition = "A"),
            Stand = factor(Stand),
            SoilType = factor(SoilType))

## New names:
## • `` -> `...1`
## • `` -> `...2`

Testing the effects of whole-plot variables

Testing the effects of stand age and soil type

Using restricted permutations for split-plot designs, we can test the effects of variables that vary between whole-plots (but which are constant at the split-lot level). However, this is only possible if we make the design balanced by

deleting all plots from stands St5 and St6, or
deleting all the sod_addition plots.

For this particular hypothesis we are testing the effect of a variable that variables at the whole-plot level (the stand level), it is better to keep as many stands as possible, so the sod_addition plots will be deleted.

fungi_bal <- fungi %>%
    left_join(design) %>%
    filter(Treatment != "sod_addition") %>%
    select(label:"Xerosubt") %>%
    mutate(across(-label, ~ log1p(.x))) %>%
    column_to_rownames("label")

## Joining with `by = join_by(label)`

design_bal <- design %>%
    filter(Treatment != "sod_addition") %>%
    column_to_rownames("label")

Fit the RDA using the two variables that vary at the stand level.
To test variables at this whole-plot level we need to permute the whole plot while leaving the split-plots (samples within the whole-plots) unpermuted. Created the necessary design using how() to test the effects of the whole plot variables that only vary at the stand level (whole-plots).

Testing the effects of stand age adjusting for soil type

We can test the effect of Age while controlling (or adjusting) for the effect of SoilType by partialling out the effect of SoilType using Condition().

Fit an RDA to estimate the effect of Age while adjusting for the effect of SoilType
Use an appropriate restricted permutation design to test the effects of Age adjusted for SoilType

Testing the effect of split-plot variables

The treatment varies at the split-plot (within) level. We now want to test the effects of the treatment on the abundance of the fungi. This is best done by randomising samples within blocks (stands), while keeping the blocks fixed. To do this correctly, we also need to include the block variable as a covariate to be partialled out using Condition().

We can use all the data for this test, so we prepare the full data set for analysis

fungi_all <- fungi %>%
    mutate(across(-label, ~ log1p(.x))) %>%
    column_to_rownames("label")

design_all <- design %>%
    column_to_rownames("label")

Create an RDA to estimate the effect of the treatment while adjusting for the dependence structure in the data.
Design an appropriate restricted permutation design to test the estimated treatment effect.

Additionally, we might be interested in testing the effect of the treatment differs among stands of different age, adjusting for a possible treatment by soil type interaction.

Fit an RDA to the data to test for different treatment effects among stands of different ages
Using an appropriate permutation design, test the estimated treatment effects among stands of different ages

Testing a spatial design

In this example you’ll look at the effect of pH on plant species in a wetland. The samples were laid out in grid of 20 samples arranged in five rows and four columns over the wetland.

The response data are abundances of 100 plant species at each of the 20 plots. We also have some associated data to go with the species abundances. The plots were sampled in 1977 and 1988, and pH and water depth were recorded at each location in the grid for each year.

For this example we’ll look only at the effect of pH on wetland plant abundance in 1988, so we will remove the older samples from the data before proceeding. We also need to ensure that the data are arranged in the data set in grid-column order. The code below loads the data, filters to retain the 1988 samples only, and then orders the data in grid-column order.

Plots in a rectangular spatial layout

grid_veg <- read_csv(url("https://bit.ly/grid-veg"),
    skip = 1L) %>%
    mutate(across(ACHILMIL:`POPUL-SP`, ~ replace_na(.x, 0))) %>%
        select(-("...1")) %>%
        rename(label = "...2")

## New names:
## Rows: 40 Columns: 102
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (2): ...1, ...2 dbl (100): ACHILMIL, ACHILPTA, AGROS-SP, ANGELSYL, ANTHOODO, CALAMCAN,
## CAPSE...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## • `` -> `...2`

grid <- read_csv(url("https://bit.ly/grid-predictors"), skip = 1L,
    col_types = "-cccdddd") %>%
    rename(label = "...2") %>%
    mutate(Column = factor(str_sub(Plot, 2L, 2L)),
           Plot = factor(Plot),
           Row = factor(Row)) %>%
    relocate(Column, .after = Row) %>%
    arrange(Year, Column, Row)

## New names:
## • `` -> `...1`
## • `` -> `...2`

# grid 1988 data
g88_veg <- grid %>%
    left_join(grid_veg) %>%
    filter(Year == 1988) %>%
    select(c(label, ACHILMIL:`POPUL-SP`)) %>%
    column_to_rownames("label")

## Joining with `by = join_by(label)`

g88 <- grid %>%
    filter(Year == 1988) %>%
    column_to_rownames("label")

This leaves us with 20 samples arranged in a grid.

We want to look at how the species abundances vary with pH, while controlling for potential spatial autocorrelation across the wetland. Fit an appropriate constrained ordination to estimate the effect of pH on species abundances, and test the estimated pH effect using an appropriate restricted permutation test.

Ohraz example

The code below loads the Ohraz data and the experimental design data using slightly more modern code than was used in the slides.

Repeated observations of composition from an experiment

Factorial design (3 replicates)
Treatments: fertilisation, mowing, Molinia removal

Test 1 of the hypotheses

There are no directional changes in species composition in time that are common to all treatments or specific treatments

spp <- read_csv(url("https://bit.ly/ohraz-spp")) %>%
    rename(label = "...1") %>%
    janitor::clean_names()

## New names:
## Rows: 96 Columns: 87
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (86): molicaer, nardstri, festrubr, brizmed, anthodor, siegdecu,
## holclan...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

molinia <- spp %>%
    select(label:molicaer)

spp <- spp %>%
    select(-molicaer) %>%
    column_to_rownames("label")

env <- read_csv(url("https://bit.ly/ohraz-env")) %>%
    rename(label = "...1") %>%
    mutate(across(c(mowing:removal, plotid), ~ factor(.x))) %>%
    column_to_rownames("label")

## New names:
## Rows: 96 Columns: 6
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (5): ...1, mowing, fertilizer, removal, plotid dbl (1): year
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

Hierarchical effects of crayfish

The code below loads the crayfish data and the scale predictor variables (design) using slightly more modern code than was used in the slides.

crayfish <- read_csv(url("https://bit.ly/crayfish-spp")) %>%
    rename(label = "...1") %>%
    janitor::clean_names() %>%
    filter(!is.na(label)) %>%
    column_to_rownames("label")

## New names:
## Rows: 568 Columns: 11
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (10): punct-L, punct-S, march-L, march-S, hubbs-L, hubbs-S, oz-L, oz-S,
## ...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

design <- read_csv(url("https://bit.ly/crayfish-design")) %>%
    rename(label = "...1") %>%
    filter(!is.na(label)) %>%
    mutate(across(Watershed:`Run Nested`, ~ factor(.x))) %>%
    column_to_rownames("label")

## New names:
## Rows: 568 Columns: 8
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (8): ...1, Watershed, Stream, Reach, Run, Stream Nested, Reach Nested, R...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`

Repeat the exercise from the slides, testing the different levels of the hierarchical scales of variation in crayfish species abundance.

Effect of ploughing time on weeds

Post (1986) carried out an experiment to investigate the effect of time of ploughing on the subsequent weed composition in summer barley fields. Three different ploughin times were used. There are 13 weed species observed in 12 plots in a randomized block experiment of four complete blocks of three plots each.

Traditionally, a multivariate ANOVA could have been used to analyses data like this. However, as there are more species (response variables; 13) than there are samples (12), MANOVA cannot be used. Instead, RDA is a useful option.

plough <- read_csv(url("https://bit.ly/plough-spp"), skip = 1) %>%
    rename(label = "...1") %>%
    select(-`...2`) %>%
    janitor::clean_names() %>%
    filter(!is.na(label)) %>%
    column_to_rownames("label")

## New names:
## Rows: 12 Columns: 15
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (2): ...1, ...2 dbl (13): Che alb, Pol per, Pol avi, Spe arv, Mat rec, Gna uli, Cap
## bur, Ste...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## • `` -> `...2`

plough_design <- read_csv(url("https://bit.ly/plough-design")) %>%
    rename(label = "...1") %>%
    select(-`...2`) %>%
    janitor::clean_names() %>%
    filter(!is.na(label)) %>%
    column_to_rownames("label")

## New names:
## Rows: 13 Columns: 4
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (4): ...1, ...2, Block, Ploughing Time
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## • `` -> `...2`

We are interested in the effect of ploughing time and wish to remove any possible effect of the blocks. We achieve this by using block as a covariate (in Condition()). As the data are counts, a log-transformation is suggested (because we can’t do a multivariate Poisson GLM to analyses these data.) Some zero abundances are recorded as NA values, so we must replace those with 0s before we proceed.

plough <- plough %>%
    mutate(across(che_alb:ape_spi, .fns = tidyr::replace_na, replace = 0))

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `across(che_alb:ape_spi, .fns = tidyr::replace_na, replace = 0)`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
## 
##   # Previously
##   across(a:b, mean, na.rm = TRUE)
## 
##   # Now
##   across(a:b, \(x) mean(x, na.rm = TRUE))

log_plough <- log1p(plough)

A multivariate experiment with fixed factors

This example uses a data set of Swedish pine forest undergrowth assemblages that were experimentally manipulated through the addition of nitrogen N and phosphorus P. There are 32 samples in 4 complete, randomized bloacks of 8 plots each. In each block, all combinations of 4 levels of nitrogen N (applied as ammonium nitrate) and 2 levels of phosphorus P (applied as compound PK fertilizer) were applied.

e40_spp <- read_csv(url("https://bit.ly/e40-spp"), skip = 1L) %>%
    rename(label = "...1") %>%
    select(-`...2`) %>%
    mutate(across(-label, ~ replace_na(.x, 0)),
           across(-label, ~ log1p(.x))) %>%
    column_to_rownames("label")

## New names:
## Rows: 32 Columns: 105
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," dbl
## (105): ...1, ...2, agroscap, anthoodo, arctouva, aulacpal, betulpen, bet...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## • `` -> `...2`

e40_design <- read_csv(url("https://bit.ly/e40-design"), skip = 1L) %>%
    rename(label = "...1") %>%
    select(-`...2`) %>%
    mutate(across(-label, ~ factor(.x))) %>%
    column_to_rownames("label")

## New names:
## Rows: 32 Columns: 5
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (3): N, P, Block dbl (2): ...1, ...2
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## • `` -> `...2`

The levels of N are

with(e40_design, levels(N))

## [1] "N0" "N1" "N2" "N3"

where N0 means no N addition

The levels of P are

with(e40_design, levels(P))

## [1] "P0" "P1"

where P0 means no P addition.

The variable Block codes for the 4 blocks of 8 plots.

In 1987 the undergrowth of the plots was surveyed. The percentage cover of plant species (including tree saplings, bryophytes, and lichens) was estimated and recorded on a ten-point scale. There are 103 species recorded. The ten-point scale values were re-expressed as the mid-points of the ten-point percentage scale in e40_spp. The species data are log(x + 1) transformed.

The aim of the analysis is to test the effects of

the N addition effect,
the P addition effect,
the combined effects of N and P additions
the magnitude of the interaction effect

using a combination of models and appropriate permutation tests.

Permutation tests

Gavin Simpson

May 30, 2024