Below, you’ll find several examples of experimental designs or other multivariate data sets that we wish to test using constrained ordination methods and restricted permutation (Monte Carlo) tests.
In this example we will study the effect of sod-cutting and sod-addition on the number of sporocarps of ectomycorrhizal fungi in Pine stands differing in age and soil type using data from an experiment reported in Baar & Ter Braak (1996). The experimental layout is as follows. Six stands (St1–St6) with Scots pine of different age and soil type were selected. Within each stand, plots were laid out, with a size of 15 m x 15 m. Treatments applied to stands were:
S
— sod cutting, in which the litter and humus
layers and the herbaceous vegetation were removed,
A
— sod addition, in which removed litter and humus
were added onto the existing litter and humus layers, and
C
— control, in which the litter and humus layers
were left unchanged.
Treatment A
(sod addition) was not done in the oldest
two stands (St5
and St6
) as these were the
oldest stands and already had thick layers of humus and litter. The
treatments were replicated 4 times each, hence in the younger stands we
have 12 plots (4 replicates by 3 treatments), while in the oldest stands
we have 8 plots each (4 replicates by 2 treatments). Within stands,
treatments were randomized.
library("vegan")
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.6-7
library("permute")
library("readr")
library("tidyr")
library("dplyr")
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library("tibble")
library("forcats")
library("stringr")
fungi <- read_csv(url("https://bit.ly/fungi-spp"), skip = 1L,
col_types = "-cddddddddddddddddddddddddddddddddddd") %>%
mutate(across(Amangemm:Xerosubt, ~ replace_na(.x, 0))) %>%
select(-(Ncarp:Nspec)) %>%
rename(label = "...2")
## New names:
## • `` -> `...1`
## • `` -> `...2`
design <- read_csv(url("https://bit.ly/fungi-design"), skip = 1L,
col_types = "-cccdc") %>%
rename(label = "...2", Treatment = `LH-treat`) %>%
mutate(Treatment = fct_relevel(Treatment, "C", after = 0L),
Treatment = fct_recode(Treatment, control = "C",
sod_cutting = "S", sod_addition = "A"),
Stand = factor(Stand),
SoilType = factor(SoilType))
## New names:
## • `` -> `...1`
## • `` -> `...2`
Using restricted permutations for split-plot designs, we can test the effects of variables that vary between whole-plots (but which are constant at the split-lot level). However, this is only possible if we make the design balanced by
St5
and
St6
, orsod_addition
plots.For this particular hypothesis we are testing the effect of a
variable that variables at the whole-plot level (the stand level), it is
better to keep as many stands as possible, so the
sod_addition
plots will be deleted.
fungi_bal <- fungi %>%
left_join(design) %>%
filter(Treatment != "sod_addition") %>%
select(label:"Xerosubt") %>%
mutate(across(-label, ~ log1p(.x))) %>%
column_to_rownames("label")
## Joining with `by = join_by(label)`
design_bal <- design %>%
filter(Treatment != "sod_addition") %>%
column_to_rownames("label")
Fit the RDA using the two variables that vary at the stand level.
To test variables at this whole-plot level we need to permute the
whole plot while leaving the split-plots (samples within the
whole-plots) unpermuted. Created the necessary design using
how()
to test the effects of the whole plot variables that
only vary at the stand level (whole-plots).
We can test the effect of Age
while controlling (or
adjusting) for the effect of SoilType
by partialling out
the effect of SoilType
using Condition()
.
Fit an RDA to estimate the effect of Age
while
adjusting for the effect of SoilType
Use an appropriate restricted permutation design to test the
effects of Age
adjusted for SoilType
The treatment varies at the split-plot (within) level. We now want to
test the effects of the treatment on the abundance of the fungi. This is
best done by randomising samples within blocks (stands), while keeping
the blocks fixed. To do this correctly, we also need to include the
block variable as a covariate to be partialled out using
Condition()
.
We can use all the data for this test, so we prepare the full data set for analysis
fungi_all <- fungi %>%
mutate(across(-label, ~ log1p(.x))) %>%
column_to_rownames("label")
design_all <- design %>%
column_to_rownames("label")
Create an RDA to estimate the effect of the treatment while adjusting for the dependence structure in the data.
Design an appropriate restricted permutation design to test the estimated treatment effect.
Additionally, we might be interested in testing the effect of the treatment differs among stands of different age, adjusting for a possible treatment by soil type interaction.
Fit an RDA to the data to test for different treatment effects among stands of different ages
Using an appropriate permutation design, test the estimated treatment effects among stands of different ages
In this example you’ll look at the effect of pH on plant species in a wetland. The samples were laid out in grid of 20 samples arranged in five rows and four columns over the wetland.
The response data are abundances of 100 plant species at each of the 20 plots. We also have some associated data to go with the species abundances. The plots were sampled in 1977 and 1988, and pH and water depth were recorded at each location in the grid for each year.
For this example we’ll look only at the effect of pH on wetland plant abundance in 1988, so we will remove the older samples from the data before proceeding. We also need to ensure that the data are arranged in the data set in grid-column order. The code below loads the data, filters to retain the 1988 samples only, and then orders the data in grid-column order.
grid_veg <- read_csv(url("https://bit.ly/grid-veg"),
skip = 1L) %>%
mutate(across(ACHILMIL:`POPUL-SP`, ~ replace_na(.x, 0))) %>%
select(-("...1")) %>%
rename(label = "...2")
## New names:
## Rows: 40 Columns: 102
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (2): ...1, ...2 dbl (100): ACHILMIL, ACHILPTA, AGROS-SP, ANGELSYL, ANTHOODO, CALAMCAN,
## CAPSE...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## • `` -> `...2`
grid <- read_csv(url("https://bit.ly/grid-predictors"), skip = 1L,
col_types = "-cccdddd") %>%
rename(label = "...2") %>%
mutate(Column = factor(str_sub(Plot, 2L, 2L)),
Plot = factor(Plot),
Row = factor(Row)) %>%
relocate(Column, .after = Row) %>%
arrange(Year, Column, Row)
## New names:
## • `` -> `...1`
## • `` -> `...2`
# grid 1988 data
g88_veg <- grid %>%
left_join(grid_veg) %>%
filter(Year == 1988) %>%
select(c(label, ACHILMIL:`POPUL-SP`)) %>%
column_to_rownames("label")
## Joining with `by = join_by(label)`
g88 <- grid %>%
filter(Year == 1988) %>%
column_to_rownames("label")
This leaves us with 20 samples arranged in a grid.
We want to look at how the species abundances vary with
pH
, while controlling for potential spatial autocorrelation
across the wetland. Fit an appropriate constrained ordination to
estimate the effect of pH
on species abundances, and test
the estimated pH
effect using an appropriate restricted
permutation test.
The code below loads the Ohraz data and the experimental design data using slightly more modern code than was used in the slides.
Repeated observations of composition from an experiment
Test 1 of the hypotheses
There are no directional changes in species composition in time that are common to all treatments or specific treatments
spp <- read_csv(url("https://bit.ly/ohraz-spp")) %>%
rename(label = "...1") %>%
janitor::clean_names()
## New names:
## Rows: 96 Columns: 87
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (86): molicaer, nardstri, festrubr, brizmed, anthodor, siegdecu,
## holclan...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
molinia <- spp %>%
select(label:molicaer)
spp <- spp %>%
select(-molicaer) %>%
column_to_rownames("label")
env <- read_csv(url("https://bit.ly/ohraz-env")) %>%
rename(label = "...1") %>%
mutate(across(c(mowing:removal, plotid), ~ factor(.x))) %>%
column_to_rownames("label")
## New names:
## Rows: 96 Columns: 6
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (5): ...1, mowing, fertilizer, removal, plotid dbl (1): year
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
The code below loads the crayfish
data and the scale
predictor variables (design
) using slightly more modern
code than was used in the slides.
crayfish <- read_csv(url("https://bit.ly/crayfish-spp")) %>%
rename(label = "...1") %>%
janitor::clean_names() %>%
filter(!is.na(label)) %>%
column_to_rownames("label")
## New names:
## Rows: 568 Columns: 11
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (1): ...1 dbl (10): punct-L, punct-S, march-L, march-S, hubbs-L, hubbs-S, oz-L, oz-S,
## ...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
design <- read_csv(url("https://bit.ly/crayfish-design")) %>%
rename(label = "...1") %>%
filter(!is.na(label)) %>%
mutate(across(Watershed:`Run Nested`, ~ factor(.x))) %>%
column_to_rownames("label")
## New names:
## Rows: 568 Columns: 8
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (8): ...1, Watershed, Stream, Reach, Run, Stream Nested, Reach Nested, R...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
Repeat the exercise from the slides, testing the different levels of the hierarchical scales of variation in crayfish species abundance.
Post (1986) carried out an experiment to investigate the effect of time of ploughing on the subsequent weed composition in summer barley fields. Three different ploughin times were used. There are 13 weed species observed in 12 plots in a randomized block experiment of four complete blocks of three plots each.
Traditionally, a multivariate ANOVA could have been used to analyses data like this. However, as there are more species (response variables; 13) than there are samples (12), MANOVA cannot be used. Instead, RDA is a useful option.
plough <- read_csv(url("https://bit.ly/plough-spp"), skip = 1) %>%
rename(label = "...1") %>%
select(-`...2`) %>%
janitor::clean_names() %>%
filter(!is.na(label)) %>%
column_to_rownames("label")
## New names:
## Rows: 12 Columns: 15
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (2): ...1, ...2 dbl (13): Che alb, Pol per, Pol avi, Spe arv, Mat rec, Gna uli, Cap
## bur, Ste...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## • `` -> `...2`
plough_design <- read_csv(url("https://bit.ly/plough-design")) %>%
rename(label = "...1") %>%
select(-`...2`) %>%
janitor::clean_names() %>%
filter(!is.na(label)) %>%
column_to_rownames("label")
## New names:
## Rows: 13 Columns: 4
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (4): ...1, ...2, Block, Ploughing Time
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## • `` -> `...2`
We are interested in the effect of ploughing time and wish to remove
any possible effect of the blocks. We achieve this by using
block
as a covariate (in Condition()
). As the
data are counts, a log-transformation is suggested (because we can’t do
a multivariate Poisson GLM to analyses these data.) Some zero abundances
are recorded as NA
values, so we must replace those with 0s
before we proceed.
plough <- plough %>%
mutate(across(che_alb:ape_spi, .fns = tidyr::replace_na, replace = 0))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `across(che_alb:ape_spi, .fns = tidyr::replace_na, replace = 0)`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
##
## # Previously
## across(a:b, mean, na.rm = TRUE)
##
## # Now
## across(a:b, \(x) mean(x, na.rm = TRUE))
log_plough <- log1p(plough)
This example uses a data set of Swedish pine forest undergrowth
assemblages that were experimentally manipulated through the addition of
nitrogen N
and phosphorus P
. There are 32
samples in 4 complete, randomized bloacks of 8 plots each. In each
block, all combinations of 4 levels of nitrogen N
(applied
as ammonium nitrate) and 2 levels of phosphorus P
(applied
as compound PK fertilizer) were applied.
e40_spp <- read_csv(url("https://bit.ly/e40-spp"), skip = 1L) %>%
rename(label = "...1") %>%
select(-`...2`) %>%
mutate(across(-label, ~ replace_na(.x, 0)),
across(-label, ~ log1p(.x))) %>%
column_to_rownames("label")
## New names:
## Rows: 32 Columns: 105
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," dbl
## (105): ...1, ...2, agroscap, anthoodo, arctouva, aulacpal, betulpen, bet...
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## • `` -> `...2`
e40_design <- read_csv(url("https://bit.ly/e40-design"), skip = 1L) %>%
rename(label = "...1") %>%
select(-`...2`) %>%
mutate(across(-label, ~ factor(.x))) %>%
column_to_rownames("label")
## New names:
## Rows: 32 Columns: 5
## ── Column specification
## ──────────────────────────────────────────────────────────────── Delimiter: "," chr
## (3): N, P, Block dbl (2): ...1, ...2
## ℹ Use `spec()` to retrieve the full column specification for this data. ℹ Specify the
## column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...1`
## • `` -> `...2`
The levels of N
are
with(e40_design, levels(N))
## [1] "N0" "N1" "N2" "N3"
where N0
means no N addition
The levels of P
are
with(e40_design, levels(P))
## [1] "P0" "P1"
where P0
means no P addition.
The variable Block
codes for the 4 blocks of 8
plots.
In 1987 the undergrowth of the plots was surveyed. The percentage
cover of plant species (including tree saplings, bryophytes, and
lichens) was estimated and recorded on a ten-point scale. There are 103
species recorded. The ten-point scale values were re-expressed as the
mid-points of the ten-point percentage scale in e40_spp
.
The species data are log(x + 1)
transformed.
The aim of the analysis is to test the effects of
using a combination of models and appropriate permutation tests.