Analysing a randomised complete block design with vegan

library("vegan")
library("readxl")
library("dplyr")

We’ll need readxl, and its read_xlsx() function, to read from the XLS format files that the example data come as.

The data set itself is quite simple and small, consisting of counts on 23 species from 16 plots, and arise from a randomised complete block designed experiment described by Špačková and colleagues [-@Spackova1998-ad] and analysed by [@Smilauer2014-ac] in their recent book using Canoco v5.

The experiment tested the effects on seedling recruitment to a range of treatments

control
removal of litter
removal of the dominant species Nardus stricta
removal of litter and moss (moss couldn’t be removed without also removing litter)

The treatments were replicated replicated in four, randomised complete blocks.

The data are available from the accompanying website to the book Multivariate Analysis of Ecological Data using CANOCO 5 [@Smilauer2014-ac]. They are supplied as XLS format files in a ZIP archive. We can read these into R directly from the web with a little bit of effort

## Download the excel data
furl <- "https://bit.ly/seedlings-example"
td <- tempdir()
tf <- tempfile(tmpdir = td, fileext = ".xlsx")
download.file(furl, tf)

## list the sheets in the workbook
excel_sheets(tf)

## [1] "Info"        "seedlspe"    "seedldesign"

## read the xlsx file, sheet 2 contains species data, sheet 3 the env
spp <- read_xlsx(tf, sheet = "seedlspe", skip = 1) %>%
  rename("sample_id" = "...1") %>%
  tibble::column_to_rownames("sample_id")
env <- read_xlsx(tf, sheet = "seedldesign")%>%
  rename("sample_id" = "...1") %>%
  tibble::column_to_rownames("sample_id")

If the above code block doesn’t work for you (you get an error), then make sure you are in the 04-thursday directory (folder) and have set that directory was the working directory for your R session, and then run

spp <- read_xlsx("data/seedlings.xlsx", sheet = "seedlspe", skip = 1) %>%
  rename("sample_id" = "...1") %>%
  tibble::column_to_rownames("sample_id")

env <- read_xlsx("data/seedlings.xlsx", sheet = "seedldesign")%>%
  rename("sample_id" = "...1") %>%
  tibble::column_to_rownames("sample_id")

The block variable is currently coded as an integer and needs converting to a factor if we are to use it correctly in the analysis

env <- mutate(env, block = factor(block))

The gradient lengths are short,

decorana(spp)

## 
## Call:
## decorana(veg = spp) 
## 
## Detrended correspondence analysis with 26 segments.
## Rescaling of axes with 4 iterations.
## Total inertia (scaled Chi-square): 1.1534 
## 
##                        DCA1   DCA2    DCA3    DCA4
## Eigenvalues          0.1759 0.1898 0.11004 0.05761
## Additive Eigenvalues 0.1759 0.1898 0.11032 0.05318
## Decorana values      0.2710 0.1822 0.07219 0.02822
## Axis lengths         1.9821 1.4140 1.15480 0.87680

motivating the use of redundancy analysis (RDA). Additionally, we may be interested in how the raw abundance of seedlings change following experimental manipulation, so we may wish to focus on the proportional differences between treatments. The first case is handled naturally by RDA. The second case will require some form of standardisation by samples, say by sample totals.

First, let’s test the first null hypothesis; that there is no effect of the treatment on seedling recruitment. This is a simple RDA. We should take into account the block factor when we assess this model for significance. How we do this illustrates two potential approaches to performing permutation tests

design-based permutations, where how the samples are permuted follows the experimental design, or
model-based permutations, where the experimental design is included in the analysis directly and residuals are permuted by simple randomisation.

There is an important difference between the two approach, one which I’ll touch on shortly.

We’ll proceed by fitting the model, conditioning on block to remove between block differences

mod1 <- rda(spp ~ treatment + Condition(block), data = env)
mod1

## Call: rda(formula = spp ~ treatment + Condition(block), data = env)
## 
##                Inertia Proportion Rank
## Total         990.7875     1.0000     
## Conditional   166.0708     0.1676    3
## Constrained   329.8375     0.3329    3
## Unconstrained 494.8792     0.4995    9
## Inertia is variance 
## 
## Eigenvalues for constrained axes:
##   RDA1   RDA2   RDA3 
## 284.81  30.83  14.20 
## 
## Eigenvalues for unconstrained axes:
##    PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8    PC9 
## 226.83 139.51  72.77  30.11   9.81   9.14   2.80   2.19   1.73

There is a strong single, linear gradient in the data as evidenced by the relative magnitudes of the eigenvalues (here expressed as proportions of the total variance)

summary(eigenvals(mod1))

## Importance of components:
##                           RDA1     RDA2     RDA3     PC1      PC2      PC3
## Eigenvalue            284.8141 30.82540 14.19796 226.827 139.5119 72.76789
## Proportion Explained    0.3453  0.03738  0.01722   0.275   0.1692  0.08823
## Cumulative Proportion   0.3453  0.38272  0.39994   0.675   0.8441  0.93237
##                            PC4     PC5     PC6      PC7      PC8      PC9
## Eigenvalue            30.10820 9.80812 9.13690 2.797943 2.190944 1.730598
## Proportion Explained   0.03651 0.01189 0.01108 0.003393 0.002657 0.002098
## Cumulative Proportion  0.96888 0.98077 0.99185 0.995245 0.997902 1.000000

Design-based permutations

A design-based permutation test of these data would be one conditioned on the block variable, by restricting samples to be permuted only within the levels of block. In this situation, samples are never permuted between blocks, only within. We can set up this type of permutation design as follows

h <- with(env, how(blocks = block, nperm = 999))

Note that we could use the plots argument instead of blocks to restrict the permutations in the same way, but using blocks is simpler. I also set the required number of permutations for the test here.

permutations can take a number of different types of instruction

an object of class "how", whch contains details of a restricted permutation design that shuffleSet() from the permute package will use to generate permutations from, or
a number indicating the number of permutations required, in which case these are simple randomisations with no restriction, unless the strata argument is used, or
a matrix of user-specified permutations, 1 row per permutation.

To perform the design-based permutation we’ll pass h, created earlier, to anova()

set.seed(42)
p1 <- anova(mod1, permutations = h, parallel = 3)
p1

## Permutation test for rda under reduced model
## Blocks:  block 
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = spp ~ treatment + Condition(block), data = env)
##          Df Variance      F Pr(>F)  
## Model     3   329.84 1.9995  0.087 .
## Residual  9   494.88                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that I’ve run this on three cores in parallel; I have four cores on my laptop but left one free for the other software I have running.

The overall permutation test indicates no significant effect of treatment on the abundance of seedlings. We can test individual axes by adding by = "axis" to the anova() call

set.seed(24)
p1axis <- anova(mod1, permutations = h, parallel = 3, by = "axis")
p1axis

## Permutation test for rda under reduced model
## Forward tests for axes
## Blocks:  block 
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = spp ~ treatment + Condition(block), data = env)
##          Df Variance      F Pr(>F)  
## RDA1      1   284.81 5.1797  0.062 .
## RDA2      1    30.83 0.5606  0.937  
## RDA3      1    14.20 0.2582  0.937  
## Residual  9   494.88                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This confirms the earlier impression that there is a single, linear gradient in the data set. A biplot shows that this axis of variation is associated with the Moss (& Litter) removal treatment. The variation between the other treatments lies primarily along axis two and is substantially less than that associated with the Moss & Litter removal.

plot(mod1, display = c("species", "cn"), scaling = "sites", type = "n",
     xlim = c(-10.5, 1.5))
text(mod1, display = "species", scaling = 1, cex = 0.8)
text(mod1, display = "cn", scaling = 1, col = "blue", cex = 1.2,
     labels = c("Control", "Litter+Moss", "Litter", "Removal"))

Figure 1: RDA biplot showing species scores and treatment centroids.

In the above figure, I used scaling = "sites:, so-called inter-sample distance scaling, as this best represents the centroid scores, which are computed as the treatment-wise average of the sample scores.

Model-based permutation

The alternative permutation approach, known as model-based permutations, and would employ free permutation of residuals after the effects of the covariables have been accounted for. This is justified because under the null hypothesis, the residuals are freely exchangeable once the effects of the covariables are removed. There is a clear advantage of model-based permutations over design-based permutations; where the sample size is small, as it is here, there tends to be few blocks and the resulting design-based permutation test relatively weak compared to the model-based version.

It is simple to switch to model-based permutations, be setting the blocks indicator in the permutation design to NULL, removing the blocking structure from the design

setBlocks(h) <- NULL                    # remove blocking
getBlocks(h)                            # confirm

## NULL

Next we repeat the permutation test using the modified h

set.seed(51)
p2 <- anova(mod1, permutations = h, parallel = 3)
p2

## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = spp ~ treatment + Condition(block), data = env)
##          Df Variance      F Pr(>F)  
## Model     3   329.84 1.9995  0.069 .
## Residual  9   494.88                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The estimated p value is slightly smaller now. The difference between treatments is predominantly in the Moss & Litter removal with differences between the control and the other treatments lying along the insignificant axes

set.seed(83)
p2axis <- anova(mod1, permutations = h, parallel = 3, by = "axis")
p2axis

## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = spp ~ treatment + Condition(block), data = env)
##          Df Variance      F Pr(>F)  
## RDA1      1   284.81 5.1797  0.051 .
## RDA2      1    30.83 0.5606  0.949  
## RDA3      1    14.20 0.2582  0.949  
## Residual  9   494.88                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Chages in relative seedling composition

As mentioned earlier, interest is also, perhaps predominantly, in whether any of the treatments have different species composition. To test this hypothesis we standardise by the sample (row) norm using decostand(). Alternatively we could have used method = "total" to work with proportional abundances (or method = "hellinger" or method = "chi.square"). We then repeat the earlier steps, this time using only model-based permutations owing to their greater power.

spp.norm <- decostand(spp, method = "normalize", MARGIN = 1)

mod2 <- rda(spp.norm ~ treatment + Condition(block), data = env)
mod2

## Call: rda(formula = spp.norm ~ treatment + Condition(block), data =
## env)
## 
##               Inertia Proportion Rank
## Total         0.37262    1.00000     
## Conditional   0.08138    0.21839    3
## Constrained   0.07248    0.19450    3
## Unconstrained 0.21877    0.58711    9
## Inertia is variance 
## 
## Eigenvalues for constrained axes:
##    RDA1    RDA2    RDA3 
## 0.04517 0.01718 0.01012 
## 
## Eigenvalues for unconstrained axes:
##     PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9 
## 0.08026 0.07074 0.02860 0.01916 0.00989 0.00585 0.00223 0.00167 0.00038

summary(eigenvals(mod2))

## Importance of components:
##                          RDA1    RDA2    RDA3     PC1     PC2    PC3     PC4
## Eigenvalue            0.04517 0.01718 0.01012 0.08026 0.07074 0.0286 0.01916
## Proportion Explained  0.15511 0.05899 0.03475 0.27557 0.24288 0.0982 0.06577
## Cumulative Proportion 0.15511 0.21409 0.24885 0.52442 0.76730 0.8655 0.93127
##                            PC5      PC6      PC7      PC8       PC9
## Eigenvalue            0.009894 0.005852 0.002228 0.001666 0.0003765
## Proportion Explained  0.033971 0.020093 0.007649 0.005720 0.0012926
## Cumulative Proportion 0.965244 0.985338 0.992987 0.998707 1.0000000

set.seed(76)
anova(mod2, permutations = h, parallel = 3)

## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
## 
## Model: rda(formula = spp.norm ~ treatment + Condition(block), data = env)
##          Df Variance      F Pr(>F)
## Model     3 0.072475 0.9939  0.467
## Residual  9 0.218768

The results suggest no difference in species composition under the experimental manipulation.

Analysing a randomised complete block design with vegan

Gavin Simpson

May 30, 2024

Design-based permutations

Model-based permutation

Chages in relative seedling composition

References