Among the ordination methods available to ecologists there are methods that relate a species abundance or occurrence matrix to a matrix of explanatory variables. Known as constrained or canonical ordination methods, redundancy analysis (RDA) and Canonical Correspondence Analysis (CCA) are the most commonly encountered forms. A restriction of these methods is that they are only constrained if there are fewer explanatory variables as numbers of observations or species, whichever is lowest, - 1.
Relating two species matrices is not possible using RDA or CCA unless the number of species in the data set playing the explanatory role is much smaller than the number of observations. Co-inertia analysis was invented as a solution to problems of this sort, but a deficiency is that it has an underlying linear response model like RDA.
Co-correspondence analysis (Co-CA) combines the ideas of co-inertia analysis with the unimodal response model familiar to correspondence analysis (CA) or CCA methods. The aim is to related two species abundance or occurrence matrices such that the resulting decomposition into axes are those combinations that best explain the covariation between species and observations in the two matrices.
There are two forms of Co-CA;
In symmetric Co-CA, neither of the two abundance or occurrence matrices plays the predictive or explanatory role. This method is best thought of as identifying the common patterns between the two assemblages. In contrast, in predictive Co-CA a more direct regression model is fitted where one matrix plays the response role and the other the predictor role. In this way, one set of species data is used to predict the other.
The key requirement for Co-CA is that the two assemblages have been collected at the same locations, just as you would if you wanted to explain species abundance as a function of environmental factors.
As an illustration of symmetric Co-CA, we look at common patterns in a data set of beetles and plants. The data are provided with cocorresp
library("cocorresp")
data(beetles)
## log transform the beetle data
beetles <- log1p(beetles)
data(plants)
The data are observations of beetle and vascular plant species abundance at 30 roadside verges in the Netherlands. There are 126 beetle taxa and 231 vascular plant species. The abundances of the vascular plants are recorded on the 1–9 van der Maarel scale. To make the distributions of beetle species abundances more symmetric and stabilise variances, the counts are log transformed.
Both forms of Co-CA are fitted using the coca()
function. The call comprises
.,data argument supplied a suitable data frame or
matrix. This is the object used to form the terms on the right-hand side
of the formula indicated by the . placeholder,method
argument; options are "symmetric" and
"predictive".A symmetric Co-CA is fitted to the beetle and plant data sets as follows
bp.sym <- coca(beetles ~ ., data = plants, method = "symmetric")
##
## Removed some species that contained no data in: beetles, plants
Notice that it shouldn’t make any difference which of the matrices is specified on the right- or left-hand sides of the formula.
Printing the resulting object provides a relatively compact summary of the Co-CA model fitted
bp.sym
##
## Symmetric Co-Correspondence Analysis
##
## Call: symcoca(y = y, x = x, n.axes = n.axes, R0 = weights, symmetric =
## symmetric, nam.dat = nam.dat)
##
## Inertia:
## Total Explained Residual
## beetles: 3.988 3.971 0.018
## plants: 5.757 5.740 0.018
A screeplot provides a graphical summary of the dimensionality of the covariance between the two matrices.
screeplot(bp.sym)
From the screeplot, we see that most of the signal in the covariance is contained on the first 2–3 axes.
We can refit the model retaining only the useful components, in part to see how much variation in the two species data sets is explained by the useful Co-CA axes:
bp.sym <- coca(beetles ~ ., data = plants, method = "symmetric", n.axes = 3)
summary(bp.sym)
##
## Symmetric Co-Correspondence Analysis
##
## Call: symcoca(y = y, x = x, n.axes = n.axes, R0 = weights, symmetric =
## symmetric, nam.dat = nam.dat)
##
## Inertia:
## Total Explained Residual
## beetles: 3.988 1.012 2.976
## plants: 5.757 1.494 4.263
##
## Eigenvalues:
## COCA 1 COCA 2 COCA 3
## 0.25338 0.12887 0.08105
How much variation the beetle and plant data sets repectively is explained by the 3 axes?
The resulting symmetric co-correspondence analysis can be plotted in
the form of a biplot, except now we have two sets of species (variable)
scores and two sets of site (observations or sample) scores. The
biplot method can be used to draw Co-CA biplots. The
which argument selects which of the two assemblages are
drawn:
"y1" indicates the species assemblage on the left-hand
side of the formula,"y2" indicates the species assemblage on the right-hand
side of the formula.layout(matrix(1:2, ncol = 2))
biplot(bp.sym, which = "y1", main = "Beetles")
biplot(bp.sym, which = "y2", main = "Plants")
layout(1)
Some additional control is afforded by the plot()
method, but good plots of the fitted Co-CA will often require the use of
lower-level functions such as the points() and
scores() methods, as well as using appropriate sets of
scores or loadings.
We can relate the Co-CA axes to environmental variables using the
same idea as envfit(). First we look at the ordination of
the beetles
## load the environmental data
data(verges)
## fit vectors for the environmental data
sol_b <- envfit(bp.sym, verges, which = "response")
sol_b
##
## ***VECTORS
##
## COCA 1 COCA 2 r2 Pr(>r)
## Mmoisture -0.95452 0.29815 0.5749 0.001 ***
## Zmoisture -0.96693 0.25503 0.5600 0.001 ***
## Org -0.98775 0.15605 0.4313 0.005 **
## Sandiness 0.36950 0.92923 0.3054 0.021 *
## pH -0.24370 -0.96985 0.3911 0.005 **
## NO3 -0.48984 -0.87181 0.2290 0.054 .
## NH4 0.68741 0.72627 0.2105 0.074 .
## Nmin -0.19562 -0.98068 0.0216 0.787
## P 0.60334 0.79749 0.4050 0.004 **
## K 0.80614 0.59173 0.3329 0.006 **
## Nitr -0.50795 -0.86139 0.3048 0.020 *
## HourSun 0.83884 -0.54438 0.0889 0.321
## Warmth35 0.98159 0.19098 0.0383 0.657
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
Next we look at the plants ordination
## fit vectors for the environmental data
sol_p <- envfit(bp.sym, verges, which = "predictor")
sol_p
##
## ***VECTORS
##
## COCA 1 COCA 2 r2 Pr(>r)
## Mmoisture -0.90378 0.42800 0.5487 0.001 ***
## Zmoisture -0.91570 0.40186 0.5476 0.001 ***
## Org -0.94746 0.31988 0.4295 0.005 **
## Sandiness 0.42798 0.90379 0.2125 0.100 .
## pH -0.24838 -0.96866 0.4710 0.001 ***
## NO3 -0.53979 -0.84180 0.2625 0.037 *
## NH4 0.66608 0.74588 0.1852 0.119
## Nmin -0.41228 -0.91106 0.0302 0.751
## P 0.57366 0.81909 0.3581 0.006 **
## K 0.76904 0.63920 0.4319 0.002 **
## Nitr -0.55763 -0.83009 0.3740 0.008 **
## HourSun 0.80795 -0.58925 0.1136 0.263
## Warmth35 0.99995 0.01048 0.0498 0.591
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999
And finally we can redraw the ordination diagrams with the vectors over plotted.
## plot the response matrix and the fitted vectors
layout(matrix(1:2, ncol = 2))
biplot(bp.sym, which = "y1", main = "Beetles")
plot(sol_b)
biplot(bp.sym, which = "y2", main = "Plants")
plot(sol_p)
layout(1)
We will now look briefly at predictive Co-CA, using a different version of the spring meadow data, where the species have been separated into vascular plants and bryophytes. We can use one group of species to predict the other, rather than treat them symmetrically as we did above.
We start by loading cocorresp and the data sets, which come with the package
library("cocorresp")
data(bryophyte)
data(vascular)
As this is an asymmetric method we need to get the roles of the two data sets specified correctly. Here we will try to predict the bryophytes from the vascular plants
carp.pred <- coca(bryophyte ~ ., data = vascular)
carp.pred
##
## Predictive Co-Correspondence Analysis
##
## Call: predcoca.simpls(y = y, x = x, R0 = weights, n.axes = n.axes,
## nam.dat = nam.dat)
##
## Co-CA Method: simpls
##
## Role Variance
## vascular Predictor 3.113
## bryophyte Response 3.471
The printed output doesn’t tell us very much. We need to do some subsequent steps to decide how many PLS axes to use to predict the bryophytes, which we do with a leave-one-out cross validation
## determine important PLS components - takes a while
crossval(bryophyte, vascular)
## LOO - Site: 1 - Complete
## LOO - Site: 2 - Complete
## LOO - Site: 3 - Complete
## LOO - Site: 4 - Complete
## LOO - Site: 5 - Complete
## LOO - Site: 6 - Complete
## LOO - Site: 7 - Complete
## LOO - Site: 8 - Complete
## LOO - Site: 9 - Complete
## LOO - Site: 10 - Complete
## LOO - Site: 11 - Complete
## LOO - Site: 12 - Complete
## LOO - Site: 13 - Complete
## LOO - Site: 14 - Complete
## LOO - Site: 15 - Complete
## LOO - Site: 16 - Complete
## LOO - Site: 17 - Complete
## LOO - Site: 18 - Complete
## LOO - Site: 19 - Complete
## LOO - Site: 20 - Complete
## LOO - Site: 21 - Complete
## LOO - Site: 22 - Complete
## LOO - Site: 23 - Complete
## LOO - Site: 24 - Complete
## LOO - Site: 25 - Complete
## LOO - Site: 26 - Complete
## LOO - Site: 27 - Complete
## LOO - Site: 28 - Complete
## LOO - Site: 29 - Complete
## LOO - Site: 30 - Complete
## LOO - Site: 31 - Complete
## LOO - Site: 32 - Complete
## LOO - Site: 33 - Complete
## LOO - Site: 34 - Complete
## LOO - Site: 35 - Complete
## LOO - Site: 36 - Complete
## LOO - Site: 37 - Complete
## LOO - Site: 38 - Complete
## LOO - Site: 39 - Complete
## LOO - Site: 40 - Complete
## LOO - Site: 41 - Complete
## LOO - Site: 42 - Complete
## LOO - Site: 43 - Complete
## LOO - Site: 44 - Complete
## LOO - Site: 45 - Complete
## LOO - Site: 46 - Complete
## LOO - Site: 47 - Complete
## LOO - Site: 48 - Complete
## LOO - Site: 49 - Complete
## LOO - Site: 50 - Complete
## LOO - Site: 51 - Complete
## LOO - Site: 52 - Complete
## LOO - Site: 53 - Complete
## LOO - Site: 54 - Complete
## LOO - Site: 55 - Complete
## LOO - Site: 56 - Complete
## LOO - Site: 57 - Complete
## LOO - Site: 58 - Complete
## LOO - Site: 59 - Complete
## LOO - Site: 60 - Complete
## LOO - Site: 61 - Complete
## LOO - Site: 62 - Complete
## LOO - Site: 63 - Complete
## LOO - Site: 64 - Complete
## LOO - Site: 65 - Complete
## LOO - Site: 66 - Complete
## LOO - Site: 67 - Complete
## LOO - Site: 68 - Complete
## LOO - Site: 69 - Complete
## LOO - Site: 70 - Complete
##
## Cross-validation for Predictive Co-Correspondence Analysis
##
## Call: crossval(y = bryophyte, x = vascular)
##
## Cross-validatory %fit of bryophyte to vascular:
##
## COCA1 COCA2 COCA3 COCA4 COCA5 COCA6 COCA7 COCA8 COCA9 COCA10
## 17.860 24.824 26.184 26.600 28.299 28.164 27.137 25.500 24.624 24.555
## COCA11 COCA12 COCA13 COCA14 COCA15 COCA16 COCA17 COCA18 COCA19 COCA20
## 24.029 23.106 22.937 21.188 20.404 19.847 20.979 19.451 17.953 17.449
## COCA21 COCA22 COCA23 COCA24 COCA25 COCA26 COCA27 COCA28 COCA29
## 17.216 13.696 13.029 13.451 12.469 11.129 9.683 8.352 8.417
The best fit comes with 5 COCA axes, but there isn’t much change in the cross-validatory fit (%) once we go beyond two PLS axes.
We can also look at which axes explain statistically significant amounts of variance in the bryophyte data using a permutation test
carp.perm <- permutest(carp.pred, permutations = 99)
## Permutations for axis: 1 - completed
## Permutations for axis: 2 - completed
## Permutations for axis: 3 - completed
## Permutations for axis: 4 - completed
## Permutations for axis: 5 - completed
## Permutations for axis: 6 - completed
## Permutations for axis: 7 - completed
## Permutations for axis: 8 - completed
## Permutations for axis: 9 - completed
## Permutations for axis: 10 - completed
## Permutations for axis: 11 - completed
## Permutations for axis: 12 - completed
## Permutations for axis: 13 - completed
## Permutations for axis: 14 - completed
## Permutations for axis: 15 - completed
## Permutations for axis: 16 - completed
## Permutations for axis: 17 - completed
## Permutations for axis: 18 - completed
## Permutations for axis: 19 - completed
## Permutations for axis: 20 - completed
## Permutations for axis: 21 - completed
## Permutations for axis: 22 - completed
## Permutations for axis: 23 - completed
## Permutations for axis: 24 - completed
## Permutations for axis: 25 - completed
## Permutations for axis: 26 - completed
## Permutations for axis: 27 - completed
## Permutations for axis: 28 - completed
## Permutations for axis: 29 - completed
carp.perm
##
## Permutation test for predictive co-correspondence analysis:
##
## Call: permutest.coca(x = carp.pred, permutations = 99)
##
## Permutation test results:
##
## Stat. Inertia Fit % fit P-value
## COCA 1 0.26165 3.47127 0.71990 20.73880 0.01
## COCA 2 0.13845 2.75137 0.33461 9.63939 0.01
## COCA 3 0.06870 2.41676 0.15536 4.47547 0.14
## COCA 4 0.06691 2.26141 0.14181 4.08535 0.53
## COCA 5 0.07757 2.11959 0.15257 4.39529 0.45
## COCA 6 0.06758 1.96702 0.12452 3.58718 0.75
## COCA 7 0.06712 1.84250 0.11589 3.33846 0.46
## COCA 8 0.05860 1.72661 0.09559 2.75361 0.74
## COCA 9 0.05081 1.63103 0.07886 2.27176 1.00
## COCA 10 0.05208 1.55217 0.07684 2.21361 0.99
## COCA 11 0.05762 1.47533 0.08037 2.31535 0.91
## COCA 12 0.06305 1.39496 0.08274 2.38347 0.84
## COCA 13 0.06925 1.31222 0.08498 2.44811 0.68
## COCA 14 0.05462 1.22724 0.06356 1.83114 0.96
## COCA 15 0.05560 1.16368 0.06129 1.76556 0.93
## COCA 16 0.05586 1.10239 0.05832 1.68017 1.00
## COCA 17 0.06790 1.04406 0.06639 1.91247 0.71
## COCA 18 0.06411 0.97768 0.05891 1.69698 0.88
## COCA 19 0.05620 0.91877 0.04889 1.40836 0.98
## COCA 20 0.04454 0.86988 0.03709 1.06856 1.00
## COCA 21 0.04683 0.83279 0.03726 1.07324 1.00
## COCA 22 0.04640 0.79553 0.03527 1.01620 1.00
## COCA 23 0.05590 0.76026 0.04025 1.15954 1.00
## COCA 24 0.07033 0.72001 0.04731 1.36294 0.99
## COCA 25 0.04559 0.67270 0.02933 0.84499 1.00
## COCA 26 0.06559 0.64337 0.03960 1.14084 1.00
## COCA 27 0.05993 0.60376 0.03414 0.98351 1.00
## COCA 28 0.06300 0.56962 0.03376 0.97253 1.00
## COCA 29 0.06196 0.53586 0.03127 0.90071 1.00
Only the first two appear important. We refit using only two axes:
## 2 components again, refit
carp.pred <- coca(y = bryophyte, x = vascular, n.axes = 2)
carp.pred
##
## Predictive Co-Correspondence Analysis
##
## Call: predcoca.simpls(y = y, x = x, R0 = weights, n.axes = n.axes,
## nam.dat = nam.dat)
##
## Co-CA Method: simpls
##
## Role Variance
## vascular Predictor 3.113
## bryophyte Response 3.471
Now we can use the summary output to look at how much variance in the bryophytes and vascular plants we can explain with the model
summary(carp.pred)
##
## Predictive Co-Correspondence Analysis
##
## Call: predcoca.simpls(y = y, x = x, R0 = weights, n.axes = n.axes,
## nam.dat = nam.dat)
##
## Percentage Variance Explained:
##
## Y-block: variance explained in bryophyte (response)
## Comp 1 Comp 2
## Individual: 20.74 9.64
## Cumulative: 20.74 30.38
##
## X-block: variance explained in vascular (predictor)
## Comp 1 Comp 2
## Individual: 13.717 7.308
## Cumulative: 13.717 21.025
Finally, we can also produce a pair of ordination diagrams of the estimated effects
## drawn biplot
layout(matrix(1:2, ncol = 2))
biplot(carp.pred, which = "response")
biplot(carp.pred, which = "predictor", type = "text")
layout(1)