Co-correspondence Analysis

Among the ordination methods available to ecologists there are methods that relate a species abundance or occurrence matrix to a matrix of explanatory variables. Known as constrained or canonical ordination methods, redundancy analysis (RDA) and Canonical Correspondence Analysis (CCA) are the most commonly encountered forms. A restriction of these methods is that they are only constrained if there are fewer explanatory variables as numbers of observations or species, whichever is lowest, - 1.

Relating two species matrices is not possible using RDA or CCA unless the number of species in the data set playing the explanatory role is much smaller than the number of observations. Co-inertia analysis was invented as a solution to problems of this sort, but a deficiency is that it has an underlying linear response model like RDA.

Co-correspondence analysis (Co-CA) combines the ideas of co-inertia analysis with the unimodal response model familiar to correspondence analysis (CA) or CCA methods. The aim is to related two species abundance or occurrence matrices such that the resulting decomposition into axes are those combinations that best explain the covariation between species and observations in the two matrices.

There are two forms of Co-CA;

Symmetric Co-CA, and
Predictive Co-CA.

In symmetric Co-CA, neither of the two abundance or occurrence matrices plays the predictive or explanatory role. This method is best thought of as identifying the common patterns between the two assemblages. In contrast, in predictive Co-CA a more direct regression model is fitted where one matrix plays the response role and the other the predictor role. In this way, one set of species data is used to predict the other.

The key requirement for Co-CA is that the two assemblages have been collected at the same locations, just as you would if you wanted to explain species abundance as a function of environmental factors.

Symmetric Co-CA

As an illustration of symmetric Co-CA, we look at common patterns in a data set of beetles and plants. The data are provided with cocorresp

library("cocorresp")
data(beetles)
## log transform the beetle data
beetles <- log1p(beetles)
data(plants)

The data are observations of beetle and vascular plant species abundance at 30 roadside verges in the Netherlands. There are 126 beetle taxa and 231 vascular plant species. The abundances of the vascular plants are recorded on the 1–9 van der Maarel scale. To make the distributions of beetle species abundances more symmetric and stabilise variances, the counts are log transformed.

Both forms of Co-CA are fitted using the coca() function. The call comprises

a formula, where the left-hand side is a community data frame or matrix, and the right-hand side is typically a .,
a data argument supplied a suitable data frame or matrix. This is the object used to form the terms on the right-hand side of the formula indicated by the . placeholder,
the type of Co-CA model to fit, indicated by the method argument; options are "symmetric" and "predictive".

A symmetric Co-CA is fitted to the beetle and plant data sets as follows

bp.sym <- coca(beetles ~ ., data = plants, method = "symmetric")

## 
## Removed some species that contained no data in: beetles, plants

Notice that it shouldn’t make any difference which of the matrices is specified on the right- or left-hand sides of the formula.

Printing the resulting object provides a relatively compact summary of the Co-CA model fitted

bp.sym

## 
## Symmetric Co-Correspondence Analysis
## 
## Call: symcoca(y = y, x = x, n.axes = n.axes, R0 = weights, symmetric =
## symmetric, nam.dat = nam.dat)
## 
## Inertia:
##          Total Explained Residual
## beetles: 3.988     3.971    0.018
## plants:  5.757     5.740    0.018

A screeplot provides a graphical summary of the dimensionality of the covariance between the two matrices.

screeplot(bp.sym)

From the screeplot, we see that most of the signal in the covariance is contained on the first 2–3 axes.

We can refit the model retaining only the useful components, in part to see how much variation in the two species data sets is explained by the useful Co-CA axes:

bp.sym <- coca(beetles ~ ., data = plants, method = "symmetric", n.axes = 3)
summary(bp.sym)

## 
## Symmetric Co-Correspondence Analysis
## 
## Call: symcoca(y = y, x = x, n.axes = n.axes, R0 = weights, symmetric =
## symmetric, nam.dat = nam.dat)
## 
## Inertia:
##          Total Explained Residual
## beetles: 3.988     1.012    2.976
## plants:  5.757     1.494    4.263
## 
## Eigenvalues:
##  COCA 1   COCA 2   COCA 3  
## 0.25338  0.12887  0.08105

How much variation the beetle and plant data sets repectively is explained by the 3 axes?

The resulting symmetric co-correspondence analysis can be plotted in the form of a biplot, except now we have two sets of species (variable) scores and two sets of site (observations or sample) scores. The biplot method can be used to draw Co-CA biplots. The which argument selects which of the two assemblages are drawn:

"y1" indicates the species assemblage on the left-hand side of the formula,
"y2" indicates the species assemblage on the right-hand side of the formula.

layout(matrix(1:2, ncol = 2))
biplot(bp.sym, which = "y1", main = "Beetles")
biplot(bp.sym, which = "y2", main = "Plants")
layout(1)

Some additional control is afforded by the plot() method, but good plots of the fitted Co-CA will often require the use of lower-level functions such as the points() and scores() methods, as well as using appropriate sets of scores or loadings.

We can relate the Co-CA axes to environmental variables using the same idea as envfit(). First we look at the ordination of the beetles

## load the environmental data
data(verges)

## fit vectors for the environmental data
sol_b <- envfit(bp.sym, verges, which = "response")
sol_b

## 
## ***VECTORS
## 
##             COCA 1   COCA 2     r2 Pr(>r)    
## Mmoisture -0.95452  0.29815 0.5749  0.001 ***
## Zmoisture -0.96693  0.25503 0.5600  0.001 ***
## Org       -0.98775  0.15605 0.4313  0.005 ** 
## Sandiness  0.36950  0.92923 0.3054  0.021 *  
## pH        -0.24370 -0.96985 0.3911  0.005 ** 
## NO3       -0.48984 -0.87181 0.2290  0.054 .  
## NH4        0.68741  0.72627 0.2105  0.074 .  
## Nmin      -0.19562 -0.98068 0.0216  0.787    
## P          0.60334  0.79749 0.4050  0.004 ** 
## K          0.80614  0.59173 0.3329  0.006 ** 
## Nitr      -0.50795 -0.86139 0.3048  0.020 *  
## HourSun    0.83884 -0.54438 0.0889  0.321    
## Warmth35   0.98159  0.19098 0.0383  0.657    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999

Next we look at the plants ordination

## fit vectors for the environmental data
sol_p <- envfit(bp.sym, verges, which = "predictor")
sol_p

## 
## ***VECTORS
## 
##             COCA 1   COCA 2     r2 Pr(>r)    
## Mmoisture -0.90378  0.42800 0.5487  0.001 ***
## Zmoisture -0.91570  0.40186 0.5476  0.001 ***
## Org       -0.94746  0.31988 0.4295  0.005 ** 
## Sandiness  0.42798  0.90379 0.2125  0.100 .  
## pH        -0.24838 -0.96866 0.4710  0.001 ***
## NO3       -0.53979 -0.84180 0.2625  0.037 *  
## NH4        0.66608  0.74588 0.1852  0.119    
## Nmin      -0.41228 -0.91106 0.0302  0.751    
## P          0.57366  0.81909 0.3581  0.006 ** 
## K          0.76904  0.63920 0.4319  0.002 ** 
## Nitr      -0.55763 -0.83009 0.3740  0.008 ** 
## HourSun    0.80795 -0.58925 0.1136  0.263    
## Warmth35   0.99995  0.01048 0.0498  0.591    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Permutation: free
## Number of permutations: 999

And finally we can redraw the ordination diagrams with the vectors over plotted.

## plot the response matrix and the fitted vectors
layout(matrix(1:2, ncol = 2))
biplot(bp.sym, which = "y1", main = "Beetles")
plot(sol_b)
biplot(bp.sym, which = "y2", main = "Plants")
plot(sol_p)
layout(1)

Predictive CoCA using Eigen-analysis

We will now look briefly at predictive Co-CA, using a different version of the spring meadow data, where the species have been separated into vascular plants and bryophytes. We can use one group of species to predict the other, rather than treat them symmetrically as we did above.

We start by loading cocorresp and the data sets, which come with the package

library("cocorresp")
data(bryophyte)
data(vascular)

As this is an asymmetric method we need to get the roles of the two data sets specified correctly. Here we will try to predict the bryophytes from the vascular plants

carp.pred <- coca(bryophyte ~ ., data = vascular)
carp.pred

## 
## Predictive Co-Correspondence Analysis
## 
## Call: predcoca.simpls(y = y, x = x, R0 = weights, n.axes = n.axes,
## nam.dat = nam.dat)
## 
## Co-CA Method: simpls
## 
##                Role Variance
## vascular  Predictor    3.113
## bryophyte  Response    3.471

The printed output doesn’t tell us very much. We need to do some subsequent steps to decide how many PLS axes to use to predict the bryophytes, which we do with a leave-one-out cross validation

## determine important PLS components - takes a while
crossval(bryophyte, vascular)

## LOO - Site: 1 - Complete
## LOO - Site: 2 - Complete
## LOO - Site: 3 - Complete
## LOO - Site: 4 - Complete
## LOO - Site: 5 - Complete
## LOO - Site: 6 - Complete
## LOO - Site: 7 - Complete
## LOO - Site: 8 - Complete
## LOO - Site: 9 - Complete
## LOO - Site: 10 - Complete
## LOO - Site: 11 - Complete
## LOO - Site: 12 - Complete
## LOO - Site: 13 - Complete
## LOO - Site: 14 - Complete
## LOO - Site: 15 - Complete
## LOO - Site: 16 - Complete
## LOO - Site: 17 - Complete
## LOO - Site: 18 - Complete
## LOO - Site: 19 - Complete
## LOO - Site: 20 - Complete
## LOO - Site: 21 - Complete
## LOO - Site: 22 - Complete
## LOO - Site: 23 - Complete
## LOO - Site: 24 - Complete
## LOO - Site: 25 - Complete
## LOO - Site: 26 - Complete
## LOO - Site: 27 - Complete
## LOO - Site: 28 - Complete
## LOO - Site: 29 - Complete
## LOO - Site: 30 - Complete
## LOO - Site: 31 - Complete
## LOO - Site: 32 - Complete
## LOO - Site: 33 - Complete
## LOO - Site: 34 - Complete
## LOO - Site: 35 - Complete
## LOO - Site: 36 - Complete
## LOO - Site: 37 - Complete
## LOO - Site: 38 - Complete
## LOO - Site: 39 - Complete
## LOO - Site: 40 - Complete
## LOO - Site: 41 - Complete
## LOO - Site: 42 - Complete
## LOO - Site: 43 - Complete
## LOO - Site: 44 - Complete
## LOO - Site: 45 - Complete
## LOO - Site: 46 - Complete
## LOO - Site: 47 - Complete
## LOO - Site: 48 - Complete
## LOO - Site: 49 - Complete
## LOO - Site: 50 - Complete
## LOO - Site: 51 - Complete
## LOO - Site: 52 - Complete
## LOO - Site: 53 - Complete
## LOO - Site: 54 - Complete
## LOO - Site: 55 - Complete
## LOO - Site: 56 - Complete
## LOO - Site: 57 - Complete
## LOO - Site: 58 - Complete
## LOO - Site: 59 - Complete
## LOO - Site: 60 - Complete
## LOO - Site: 61 - Complete
## LOO - Site: 62 - Complete
## LOO - Site: 63 - Complete
## LOO - Site: 64 - Complete
## LOO - Site: 65 - Complete
## LOO - Site: 66 - Complete
## LOO - Site: 67 - Complete
## LOO - Site: 68 - Complete
## LOO - Site: 69 - Complete
## LOO - Site: 70 - Complete

## 
## Cross-validation for Predictive Co-Correspondence Analysis
## 
## Call: crossval(y = bryophyte, x = vascular)
## 
## Cross-validatory %fit of bryophyte to vascular:
## 
##  COCA1   COCA2   COCA3   COCA4   COCA5   COCA6   COCA7   COCA8   COCA9  COCA10  
## 17.860  24.824  26.184  26.600  28.299  28.164  27.137  25.500  24.624  24.555  
## COCA11  COCA12  COCA13  COCA14  COCA15  COCA16  COCA17  COCA18  COCA19  COCA20  
## 24.029  23.106  22.937  21.188  20.404  19.847  20.979  19.451  17.953  17.449  
## COCA21  COCA22  COCA23  COCA24  COCA25  COCA26  COCA27  COCA28  COCA29  
## 17.216  13.696  13.029  13.451  12.469  11.129   9.683   8.352   8.417

The best fit comes with 5 COCA axes, but there isn’t much change in the cross-validatory fit (%) once we go beyond two PLS axes.

We can also look at which axes explain statistically significant amounts of variance in the bryophyte data using a permutation test

carp.perm <- permutest(carp.pred, permutations = 99)

## Permutations for axis: 1 - completed
## Permutations for axis: 2 - completed
## Permutations for axis: 3 - completed
## Permutations for axis: 4 - completed
## Permutations for axis: 5 - completed
## Permutations for axis: 6 - completed
## Permutations for axis: 7 - completed
## Permutations for axis: 8 - completed
## Permutations for axis: 9 - completed
## Permutations for axis: 10 - completed
## Permutations for axis: 11 - completed
## Permutations for axis: 12 - completed
## Permutations for axis: 13 - completed
## Permutations for axis: 14 - completed
## Permutations for axis: 15 - completed
## Permutations for axis: 16 - completed
## Permutations for axis: 17 - completed
## Permutations for axis: 18 - completed
## Permutations for axis: 19 - completed
## Permutations for axis: 20 - completed
## Permutations for axis: 21 - completed
## Permutations for axis: 22 - completed
## Permutations for axis: 23 - completed
## Permutations for axis: 24 - completed
## Permutations for axis: 25 - completed
## Permutations for axis: 26 - completed
## Permutations for axis: 27 - completed
## Permutations for axis: 28 - completed
## Permutations for axis: 29 - completed

carp.perm

## 
## Permutation test for predictive co-correspondence analysis:
## 
## Call: permutest.coca(x = carp.pred, permutations = 99)
## 
## Permutation test results:
## 
##            Stat.  Inertia      Fit    % fit P-value
## COCA 1   0.26165  3.47127  0.71990 20.73880    0.01
## COCA 2   0.13845  2.75137  0.33461  9.63939    0.01
## COCA 3   0.06870  2.41676  0.15536  4.47547    0.14
## COCA 4   0.06691  2.26141  0.14181  4.08535    0.53
## COCA 5   0.07757  2.11959  0.15257  4.39529    0.45
## COCA 6   0.06758  1.96702  0.12452  3.58718    0.75
## COCA 7   0.06712  1.84250  0.11589  3.33846    0.46
## COCA 8   0.05860  1.72661  0.09559  2.75361    0.74
## COCA 9   0.05081  1.63103  0.07886  2.27176    1.00
## COCA 10  0.05208  1.55217  0.07684  2.21361    0.99
## COCA 11  0.05762  1.47533  0.08037  2.31535    0.91
## COCA 12  0.06305  1.39496  0.08274  2.38347    0.84
## COCA 13  0.06925  1.31222  0.08498  2.44811    0.68
## COCA 14  0.05462  1.22724  0.06356  1.83114    0.96
## COCA 15  0.05560  1.16368  0.06129  1.76556    0.93
## COCA 16  0.05586  1.10239  0.05832  1.68017    1.00
## COCA 17  0.06790  1.04406  0.06639  1.91247    0.71
## COCA 18  0.06411  0.97768  0.05891  1.69698    0.88
## COCA 19  0.05620  0.91877  0.04889  1.40836    0.98
## COCA 20  0.04454  0.86988  0.03709  1.06856    1.00
## COCA 21  0.04683  0.83279  0.03726  1.07324    1.00
## COCA 22  0.04640  0.79553  0.03527  1.01620    1.00
## COCA 23  0.05590  0.76026  0.04025  1.15954    1.00
## COCA 24  0.07033  0.72001  0.04731  1.36294    0.99
## COCA 25  0.04559  0.67270  0.02933  0.84499    1.00
## COCA 26  0.06559  0.64337  0.03960  1.14084    1.00
## COCA 27  0.05993  0.60376  0.03414  0.98351    1.00
## COCA 28  0.06300  0.56962  0.03376  0.97253    1.00
## COCA 29  0.06196  0.53586  0.03127  0.90071    1.00

Only the first two appear important. We refit using only two axes:

## 2 components again, refit
carp.pred <- coca(y = bryophyte, x = vascular, n.axes = 2)
carp.pred

## 
## Predictive Co-Correspondence Analysis
## 
## Call: predcoca.simpls(y = y, x = x, R0 = weights, n.axes = n.axes,
## nam.dat = nam.dat)
## 
## Co-CA Method: simpls
## 
##                Role Variance
## vascular  Predictor    3.113
## bryophyte  Response    3.471

Now we can use the summary output to look at how much variance in the bryophytes and vascular plants we can explain with the model

summary(carp.pred)

## 
## Predictive Co-Correspondence Analysis
## 
## Call: predcoca.simpls(y = y, x = x, R0 = weights, n.axes = n.axes,
## nam.dat = nam.dat)
## 
## Percentage Variance Explained:
## 
## Y-block: variance explained in bryophyte (response) 
##              Comp 1  Comp 2
## Individual:  20.74    9.64 
## Cumulative:  20.74   30.38 
## 
## X-block: variance explained in vascular (predictor) 
##              Comp 1  Comp 2
## Individual:  13.717   7.308
## Cumulative:  13.717  21.025

Finally, we can also produce a pair of ordination diagrams of the estimated effects

## drawn biplot
layout(matrix(1:2, ncol = 2))
biplot(carp.pred, which = "response")
biplot(carp.pred, which = "predictor", type = "text")

layout(1)

Co-correspondence Analysis

Gavin Simpson

May 31, 2024

Symmetric Co-CA

Predictive CoCA using Eigen-analysis