library("vegan")
library("readxl")
library("dplyr")
We’ll need readxl, and its read_xlsx()
function, to read from the XLS format files that the example data come
as.
The data set itself is quite simple and small, consisting of counts on 23 species from 16 plots, and arise from a randomised complete block designed experiment described by Špačková and colleagues [-@Spackova1998-ad] and analysed by [@Smilauer2014-ac] in their recent book using Canoco v5.
The experiment tested the effects on seedling recruitment to a range of treatments
The treatments were replicated replicated in four, randomised complete blocks.
The data are available from the accompanying website to the book Multivariate Analysis of Ecological Data using CANOCO 5 [@Smilauer2014-ac]. They are supplied as XLS format files in a ZIP archive. We can read these into R directly from the web with a little bit of effort
## Download the excel data
furl <- "https://bit.ly/seedlings-example"
td <- tempdir()
tf <- tempfile(tmpdir = td, fileext = ".xlsx")
download.file(furl, tf)
## list the sheets in the workbook
excel_sheets(tf)
## [1] "Info" "seedlspe" "seedldesign"
## read the xlsx file, sheet 2 contains species data, sheet 3 the env
spp <- read_xlsx(tf, sheet = "seedlspe", skip = 1) %>%
rename("sample_id" = "...1") %>%
tibble::column_to_rownames("sample_id")
env <- read_xlsx(tf, sheet = "seedldesign")%>%
rename("sample_id" = "...1") %>%
tibble::column_to_rownames("sample_id")
If the above code block doesn’t work for you (you get an error), then
make sure you are in the 04-thursday
directory (folder) and
have set that directory was the working directory for your R session,
and then run
spp <- read_xlsx("data/seedlings.xlsx", sheet = "seedlspe", skip = 1) %>%
rename("sample_id" = "...1") %>%
tibble::column_to_rownames("sample_id")
env <- read_xlsx("data/seedlings.xlsx", sheet = "seedldesign")%>%
rename("sample_id" = "...1") %>%
tibble::column_to_rownames("sample_id")
The block
variable is currently coded as an integer and
needs converting to a factor if we are to use it correctly in the
analysis
env <- mutate(env, block = factor(block))
The gradient lengths are short,
decorana(spp)
##
## Call:
## decorana(veg = spp)
##
## Detrended correspondence analysis with 26 segments.
## Rescaling of axes with 4 iterations.
## Total inertia (scaled Chi-square): 1.1534
##
## DCA1 DCA2 DCA3 DCA4
## Eigenvalues 0.1759 0.1898 0.11004 0.05761
## Additive Eigenvalues 0.1759 0.1898 0.11032 0.05318
## Decorana values 0.2710 0.1822 0.07219 0.02822
## Axis lengths 1.9821 1.4140 1.15480 0.87680
motivating the use of redundancy analysis (RDA). Additionally, we may be interested in how the raw abundance of seedlings change following experimental manipulation, so we may wish to focus on the proportional differences between treatments. The first case is handled naturally by RDA. The second case will require some form of standardisation by samples, say by sample totals.
First, let’s test the first null hypothesis; that there is no effect
of the treatment on seedling recruitment. This is a simple RDA. We
should take into account the block
factor when we assess
this model for significance. How we do this illustrates two potential
approaches to performing permutation tests
design-based permutations, where how the samples are permuted follows the experimental design, or
model-based permutations, where the experimental design is included in the analysis directly and residuals are permuted by simple randomisation.
There is an important difference between the two approach, one which I’ll touch on shortly.
We’ll proceed by fitting the model, conditioning on
block
to remove between block differences
mod1 <- rda(spp ~ treatment + Condition(block), data = env)
mod1
## Call: rda(formula = spp ~ treatment + Condition(block), data = env)
##
## Inertia Proportion Rank
## Total 990.7875 1.0000
## Conditional 166.0708 0.1676 3
## Constrained 329.8375 0.3329 3
## Unconstrained 494.8792 0.4995 9
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3
## 284.81 30.83 14.20
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9
## 226.83 139.51 72.77 30.11 9.81 9.14 2.80 2.19 1.73
There is a strong single, linear gradient in the data as evidenced by the relative magnitudes of the eigenvalues (here expressed as proportions of the total variance)
summary(eigenvals(mod1))
## Importance of components:
## RDA1 RDA2 RDA3 PC1 PC2 PC3
## Eigenvalue 284.8141 30.82540 14.19796 226.827 139.5119 72.76789
## Proportion Explained 0.3453 0.03738 0.01722 0.275 0.1692 0.08823
## Cumulative Proportion 0.3453 0.38272 0.39994 0.675 0.8441 0.93237
## PC4 PC5 PC6 PC7 PC8 PC9
## Eigenvalue 30.10820 9.80812 9.13690 2.797943 2.190944 1.730598
## Proportion Explained 0.03651 0.01189 0.01108 0.003393 0.002657 0.002098
## Cumulative Proportion 0.96888 0.98077 0.99185 0.995245 0.997902 1.000000
A design-based permutation test of these data would be one
conditioned on the block
variable, by restricting samples
to be permuted only within the levels of block
. In
this situation, samples are never permuted between blocks, only within.
We can set up this type of permutation design as follows
h <- with(env, how(blocks = block, nperm = 999))
Note that we could use the plots
argument instead of
blocks
to restrict the permutations in the same way, but
using blocks
is simpler. I also set the required number of
permutations for the test here.
permutations
can take a number of different types of
instruction
an object of class "how"
, whch contains details of a
restricted permutation design that shuffleSet()
from the
permute package will use to generate permutations from,
or
a number indicating the number of permutations required, in which
case these are simple randomisations with no restriction, unless the
strata
argument is used, or
a matrix of user-specified permutations, 1 row per permutation.
To perform the design-based permutation we’ll pass h
,
created earlier, to anova()
set.seed(42)
p1 <- anova(mod1, permutations = h, parallel = 3)
p1
## Permutation test for rda under reduced model
## Blocks: block
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spp ~ treatment + Condition(block), data = env)
## Df Variance F Pr(>F)
## Model 3 329.84 1.9995 0.087 .
## Residual 9 494.88
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note that I’ve run this on three cores in parallel; I have four cores on my laptop but left one free for the other software I have running.
The overall permutation test indicates no significant effect of
treatment on the abundance of seedlings. We can test individual axes by
adding by = "axis"
to the anova()
call
set.seed(24)
p1axis <- anova(mod1, permutations = h, parallel = 3, by = "axis")
p1axis
## Permutation test for rda under reduced model
## Forward tests for axes
## Blocks: block
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spp ~ treatment + Condition(block), data = env)
## Df Variance F Pr(>F)
## RDA1 1 284.81 5.1797 0.062 .
## RDA2 1 30.83 0.5606 0.937
## RDA3 1 14.20 0.2582 0.937
## Residual 9 494.88
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This confirms the earlier impression that there is a single, linear gradient in the data set. A biplot shows that this axis of variation is associated with the Moss (& Litter) removal treatment. The variation between the other treatments lies primarily along axis two and is substantially less than that associated with the Moss & Litter removal.
plot(mod1, display = c("species", "cn"), scaling = "sites", type = "n",
xlim = c(-10.5, 1.5))
text(mod1, display = "species", scaling = 1, cex = 0.8)
text(mod1, display = "cn", scaling = 1, col = "blue", cex = 1.2,
labels = c("Control", "Litter+Moss", "Litter", "Removal"))
In the above figure, I used scaling = "sites:
, so-called
inter-sample distance scaling, as this best represents the
centroid scores, which are computed as the treatment-wise average of the
sample scores.
The alternative permutation approach, known as model-based permutations, and would employ free permutation of residuals after the effects of the covariables have been accounted for. This is justified because under the null hypothesis, the residuals are freely exchangeable once the effects of the covariables are removed. There is a clear advantage of model-based permutations over design-based permutations; where the sample size is small, as it is here, there tends to be few blocks and the resulting design-based permutation test relatively weak compared to the model-based version.
It is simple to switch to model-based permutations, be setting the
blocks indicator in the permutation design to NULL
,
removing the blocking structure from the design
setBlocks(h) <- NULL # remove blocking
getBlocks(h) # confirm
## NULL
Next we repeat the permutation test using the modified
h
set.seed(51)
p2 <- anova(mod1, permutations = h, parallel = 3)
p2
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spp ~ treatment + Condition(block), data = env)
## Df Variance F Pr(>F)
## Model 3 329.84 1.9995 0.069 .
## Residual 9 494.88
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated p value is slightly smaller now. The difference between treatments is predominantly in the Moss & Litter removal with differences between the control and the other treatments lying along the insignificant axes
set.seed(83)
p2axis <- anova(mod1, permutations = h, parallel = 3, by = "axis")
p2axis
## Permutation test for rda under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spp ~ treatment + Condition(block), data = env)
## Df Variance F Pr(>F)
## RDA1 1 284.81 5.1797 0.051 .
## RDA2 1 30.83 0.5606 0.949
## RDA3 1 14.20 0.2582 0.949
## Residual 9 494.88
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As mentioned earlier, interest is also, perhaps predominantly, in
whether any of the treatments have different species composition. To
test this hypothesis we standardise by the sample (row) norm using
decostand()
. Alternatively we could have used
method = "total"
to work with proportional abundances (or
method = "hellinger"
or
method = "chi.square"
). We then repeat the earlier steps,
this time using only model-based permutations owing to their greater
power.
spp.norm <- decostand(spp, method = "normalize", MARGIN = 1)
mod2 <- rda(spp.norm ~ treatment + Condition(block), data = env)
mod2
## Call: rda(formula = spp.norm ~ treatment + Condition(block), data =
## env)
##
## Inertia Proportion Rank
## Total 0.37262 1.00000
## Conditional 0.08138 0.21839 3
## Constrained 0.07248 0.19450 3
## Unconstrained 0.21877 0.58711 9
## Inertia is variance
##
## Eigenvalues for constrained axes:
## RDA1 RDA2 RDA3
## 0.04517 0.01718 0.01012
##
## Eigenvalues for unconstrained axes:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9
## 0.08026 0.07074 0.02860 0.01916 0.00989 0.00585 0.00223 0.00167 0.00038
summary(eigenvals(mod2))
## Importance of components:
## RDA1 RDA2 RDA3 PC1 PC2 PC3 PC4
## Eigenvalue 0.04517 0.01718 0.01012 0.08026 0.07074 0.0286 0.01916
## Proportion Explained 0.15511 0.05899 0.03475 0.27557 0.24288 0.0982 0.06577
## Cumulative Proportion 0.15511 0.21409 0.24885 0.52442 0.76730 0.8655 0.93127
## PC5 PC6 PC7 PC8 PC9
## Eigenvalue 0.009894 0.005852 0.002228 0.001666 0.0003765
## Proportion Explained 0.033971 0.020093 0.007649 0.005720 0.0012926
## Cumulative Proportion 0.965244 0.985338 0.992987 0.998707 1.0000000
set.seed(76)
anova(mod2, permutations = h, parallel = 3)
## Permutation test for rda under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: rda(formula = spp.norm ~ treatment + Condition(block), data = env)
## Df Variance F Pr(>F)
## Model 3 0.072475 0.9939 0.467
## Residual 9 0.218768
The results suggest no difference in species composition under the experimental manipulation.