In this example, we will use constrained ordination to explore the environmental controls on the species composition of the vegetation of spring meadows from the mountains of the westernmost Carpathians (Hájek et al, 2002). The species composition data are records of all vascular plants and bryophytes in spring fen meadows. In this field, the samples are commonly referred to as relevés. The data are cover estimates recorded using the Braun-Blanquet scale:
which is a method widely used in ecology to survey large areas rapidly. The Braun-Blanquet values have been recoded to the numeric values 1–7: on this scale the relationship with species cover is not linear, but rather corresponds approximately to a logarithmic transformation of the original cover values. In that sense then, the original cover values represented by the data have already been subjected to a log-transformation.
The relevés are associated with a set of environmental data from measurements on the spring water chemistry, the organic content of the soil, and the slope of each sampling location.
# load vegan, dplyr & readr
library("vegan")
library("dplyr")
library("readr")
# load the data
spp <- read_csv("https://bit.ly/meadows-species") |>
rename("sample_id" = "...1") |>
tibble::column_to_rownames("sample_id")
env <- read_csv("https://bit.ly/meadows-env") |>
rename("sample_id" = "...1")
We can look at the number of species, samples, and environmental variables:
dim(spp)
## [1] 70 285
dim(env)
## [1] 70 16
names(env)
## [1] "sample_id" "Ca" "Mg" "Fe" "K" "Na"
## [7] "Si" "SO4" "PO4" "NO3" "NH3" "Cl"
## [13] "Corg" "pH" "conduct" "slope"
We need to log-transform the ion concentration values, but not the other variables
env <- env |>
mutate(across(Ca:Cl, .fns = log10))
It is easier to do this in the data than in the formula for the ordination model, but of course you could do this in the formula too. It is easier to do this as a data processing step here, because we will standardise the environmental data prior to fitting the model, to illustrate how to get standardised effect sizes.
To standardise the data we could use the scale()
function, but it is instructive to see how data wrangling steps like
this can be done with some simple dplyr commands.
z_score <- function(x) {
(x - mean(x)) / sd(x)
}
env <- env |>
mutate(across(-sample_id, .fns = z_score))
The final step is to remove the sample_id
column:
env <- env |>
tibble::column_to_rownames("sample_id")
Now we can fit a CCA, downweighting rare taxa:
ord <- cca(downweight(spp) ~ ., data = env)
ord
## Call: cca(formula = downweight(spp) ~ Ca + Mg + Fe + K + Na + Si + SO4
## + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct + slope, data = env)
##
## Inertia Proportion Rank
## Total 3.0963 1.0000
## Constrained 1.1002 0.3553 15
## Unconstrained 1.9961 0.6447 54
## Inertia is scaled Chi-square
##
## Eigenvalues for constrained axes:
## CCA1 CCA2 CCA3 CCA4 CCA5 CCA6 CCA7 CCA8 CCA9 CCA10 CCA11
## 0.4638 0.1225 0.0728 0.0625 0.0559 0.0489 0.0452 0.0407 0.0398 0.0330 0.0303
## CCA12 CCA13 CCA14 CCA15
## 0.0251 0.0227 0.0200 0.0169
##
## Eigenvalues for unconstrained axes:
## CA1 CA2 CA3 CA4 CA5 CA6 CA7 CA8
## 0.15674 0.12581 0.09683 0.08406 0.07758 0.07382 0.07041 0.06289
## (Showing 8 of 54 unconstrained eigenvalues)
We can do an omnibus test for the overall model using
anova()
set.seed(1)
anova(ord, permutations = 999)
## Permutation test for cca under reduced model
## Permutation: free
## Number of permutations: 999
##
## Model: cca(formula = downweight(spp) ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct + slope, data = env)
## Df ChiSquare F Pr(>F)
## Model 15 1.1002 1.9841 0.001 ***
## Residual 54 1.9962
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
and we can also test the significance of the constrained axes in turn
set.seed(42)
anova(ord, permutations = 499, by = "axis", parallel = 4)
## Permutation test for cca under reduced model
## Forward tests for axes
## Permutation: free
## Number of permutations: 499
##
## Model: cca(formula = downweight(spp) ~ Ca + Mg + Fe + K + Na + Si + SO4 + PO4 + NO3 + NH3 + Cl + Corg + pH + conduct + slope, data = env)
## Df ChiSquare F Pr(>F)
## CCA1 1 0.46381 12.5471 0.002 **
## CCA2 1 0.12246 3.3129 0.004 **
## CCA3 1 0.07284 1.9705 0.374
## CCA4 1 0.06249 1.6906 0.724
## CCA5 1 0.05594 1.5133 0.918
## CCA6 1 0.04890 1.3230 0.998
## CCA7 1 0.04525 1.2240 1.000
## CCA8 1 0.04073 1.1019 1.000
## CCA9 1 0.03978 1.0761 1.000
## CCA10 1 0.03296 0.8917 1.000
## CCA11 1 0.03033 0.8204 1.000
## CCA12 1 0.02507 0.6781 1.000
## CCA13 1 0.02267 0.6134 1.000
## CCA14 1 0.02002 0.5416 1.000
## CCA15 1 0.01689 0.4570 1.000
## Residual 54 1.99615
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
While the omnibus test indicates that the model includes some significant effects somewhere, the tests of the constrained axes indicates that most of the effect is in the first two constrained axes, suggesting that we have used too many superfluous environmental variables. This is also suggested by the strong arch in the fitted ordination
plot(ord)
We can run forward selection with the adjusted \(R^2\) stopping rules to select a more parsimonious model
set.seed(67)
lwr <- cca(downweight(spp) ~ 1, data = env)
ord2 <- ordiR2step(lwr, scope = formula(ord),
permutations = how(nperm = 9999))
The resulting model includes seven environmental variables
ord2$anova
## R2.adj Df AIC F Pr(>F)
## + Ca 0.12477 1 401.70 10.8236 0.0001 ***
## + NO3 0.13567 1 401.80 1.8376 0.0014 **
## + Na 0.14582 1 401.92 1.8000 0.0004 ***
## + Si 0.15580 1 402.04 1.7647 0.0007 ***
## + Corg 0.16346 1 402.32 1.6009 0.0024 **
## + NH3 0.16836 1 402.77 1.4055 0.0193 *
## + Fe 0.17280 1 403.25 1.3608 0.0310 *
## <All variables> 0.17679
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
An overall test of the significance can be computed using
anova()
set.seed(3)
anova(ord2, permutations = how(nperm = 9999), parallel = 4)
## Permutation test for cca under reduced model
## Permutation: free
## Number of permutations: 9999
##
## Model: cca(formula = downweight(spp) ~ Ca + NO3 + Na + Si + Corg + NH3 + Fe, data = env)
## Df ChiSquare F Pr(>F)
## Model 7 0.79547 3.0622 1e-04 ***
## Residual 62 2.30083
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
and the marginal effects of the remaining variables can also be assessed
set.seed(34)
anova(ord2, by = "margin", permutations = how(nperm = 9999), parallel = 4)
## Permutation test for cca under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 9999
##
## Model: cca(formula = downweight(spp) ~ Ca + NO3 + Na + Si + Corg + NH3 + Fe, data = env)
## Df ChiSquare F Pr(>F)
## Ca 1 0.16510 4.4490 0.0001 ***
## NO3 1 0.05731 1.5443 0.0379 *
## Na 1 0.05815 1.5669 0.0341 *
## Si 1 0.06288 1.6944 0.0187 *
## Corg 1 0.05808 1.5651 0.0338 *
## NH3 1 0.05236 1.4109 0.0651 .
## Fe 1 0.05050 1.3608 0.0845 .
## Residual 62 2.30083
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For a more parsimonious model we might also consider dropping the
effects of log NH3
and log Fe
:
ord3 <- update(ord2, . ~ . - Fe - NH3)
set.seed(3)
anova(ord3, permutations = how(nperm = 9999), parallel = 4)
## Permutation test for cca under reduced model
## Permutation: free
## Number of permutations: 9999
##
## Model: cca(formula = downweight(spp) ~ Ca + NO3 + Na + Si + Corg, data = env)
## Df ChiSquare F Pr(>F)
## Model 5 0.69252 3.6876 1e-04 ***
## Residual 64 2.40379
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
set.seed(34)
anova(ord3, by = "margin", permutations = how(nperm = 9999), parallel = 4)
## Permutation test for cca under reduced model
## Marginal effects of terms
## Permutation: free
## Number of permutations: 9999
##
## Model: cca(formula = downweight(spp) ~ Ca + NO3 + Na + Si + Corg, data = env)
## Df ChiSquare F Pr(>F)
## Ca 1 0.16745 4.4584 0.0001 ***
## NO3 1 0.06924 1.8434 0.0105 *
## Na 1 0.06979 1.8580 0.0106 *
## Si 1 0.06706 1.7854 0.0147 *
## Corg 1 0.06013 1.6009 0.0288 *
## Residual 64 2.40379
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can prepare an ordination diagram of the fitted CCA, using the LC scores to limit the visual impact of the arch, which remains even now, because of the strong turnover in the species composition among sites
plot(ord3, display = c("lc", "bp"), type = "n")
points(ord3, display = "lc")
ordipointlabel(ord3, display = "lc", add = TRUE)
text(ord3, display = "bp", col = "blue")
Also with species scores
scl <- "species"
plot(ord3, display = c("species", "lc", "bp"), type = "n", scaling = scl)
points(ord3, display = "species", pch = 3, col = "red", cex = 0.7)
points(ord3, display = "lc", scaling = scl)
ordipointlabel(ord3, display = "lc", add = TRUE, scaling = scl)
text(ord3, display = "bp", col = "blue", scaling = scl)