Other topics

class: inverse, middle, left, my-title-slide, title-slide

.title[
# Other topics
]
.author[
### Gavin L. Simpson
]
.date[
### May 30, 2024
]

---

class: inverse middle center big-subsection

# Welcome

---

# Other topics

* Co-correspondence analysis (Co-CA)
* Principal Response Curves (PRC)
* Mantel test
* Ordination diagnostics

---
class: inverse middle center big-subsection

# Co-CA

---

# Co-correspondence analysis

We may be interested in relating two species data sets to one another

Can can't do this using CCA with one data set as the response and the other as the predictors, because this would be too many predictors

We might also not want to treat one data set as the response, we might want to analyse the data symmetrically

Co-correspondence Analysis (Co-CA) is a suitable method

---

# Co-correspondence analysis

There are two types of Co-CA

1. symmetric Co-CA, and

2. predictive Co-CA

Symmetric Co-CA finds axes in both data sets which maximise the covariation of the two data sets

Predictive Co-CA finds directions in one data set which best predict the response data set

---

# Symmetric Co-CA

```r
library("cocorresp")
data(beetles)
## log transform the beetle data
beetles <- log1p(beetles)
data(plants)
```

---

# Symmetric Co-CA

```r
bp.sym <- coca(beetles ~ ., data = plants, method = "symmetric")
```

```
## 
## Removed some species that contained no data in: beetles, plants
```

```r
bp.sym
```

```
## 
## Symmetric Co-Correspondence Analysis
## 
## Call: symcoca(y = y, x = x, n.axes = n.axes, R0 = weights, symmetric =
## symmetric, nam.dat = nam.dat)
## 
## Inertia:
##          Total Explained Residual
## beetles: 3.988     3.971    0.018
## plants:  5.757     5.740    0.018
```

---

# Symmetric Co-CA

``` r
screeplot(bp.sym)
```

---

# Symmetric Co-CA

```r
layout(matrix(1:2, ncol = 2))
biplot(bp.sym, which = "y1", main = "Beetles")
biplot(bp.sym, which = "y2", main = "Plants")
layout(1)
```

---
class: inverse middle center big-subsection

# PRC

---

# Principal Response Curves

PRC is a special form of redundancy analysis (RDA) that is useful when you want to compare the development of a biological community in time, under different conditions (treatments)

The different conditions don't have to be experimental; could be used for monitoring relative to a control site or control period

---

# Principal Response Curves

PRC focuses on the temporally structured effects of different levels of a factor

In standard RDA (etc) it is difficult to compare the temporal trajectories of plots as the time direction is usually a complex path through the ordination space

PRC is designed to focus specifically on the temporal effects and their differences among sites

Allows for a PRC diagram that displays these time-structured effects optimally

---

# Principal Response Curves

Assume `\(k\)` treatment levels in a factor `\(F\)`

We observed community composition (abundance) at the same set of samples at (the same) multiple time points

The time points are coded as `\(t\)`

PRC fits a *partial* RDA of the form

``` r
rda(comm ~ F:t + Condition(t), data = df)
```

As `\(t\)` is partialled out, the PRC represents the overall **differences** among treatment level and how these difference change through time

---

# PRC &mdash; pyrifos example

Data are log transformed abundances of aquatic invertebrate in twelve ditches studied in eleven times before and after an insecticide treatment

12 mesocosms were allocated randomly to treatments, with 4 controls, while the remaining 8 mesocosms were treated with a dose of an insecticide, *chloropyrifos* (0.1, 0.9, 6, 44 μg/ L)

Invertebrates were samples 11 times, from 4 weeks *prior* to treatment through 24 weeks post-treatment (132 samples total)

.small[
van den Brink, P.J. & ter Braak, C.J.F. (1999). Principal response curves: Analysis of time-dependent multivariate responses of biological community to stress. Environmental Toxicology and Chemistry, 18, 138–148.]

---

# PRC &mdash; pyrifos example

``` r
data(pyrifos)
dim(pyrifos)
```

```
## [1] 132 178
```

``` r
ditch <- gl(12, 1, length = 132)
week <- gl(11, 12, labels = c(-4, -1, 0.1, 1, 2, 4, 8, 12, 15, 19, 24))
dose <- factor(rep(c(0.1, 0, 0, 0.9, 0, 44, 6, 0.1, 44, 0.9, 0, 6), 11))
```

Important that the control (or reference) site is the reference (first) level of the "treatment" factor &mdash; use `relevel()`

---

# PRC &mdash; pyrifos example

``` r
mod <- prc(pyrifos, dose, week)
```

.small[

``` r
mod
```

```
## 
## Call: prc(response = pyrifos, treatment = dose, time = week)
## 
##                Inertia Proportion Rank
## Total         288.9920     1.0000     
## Conditional    63.3493     0.2192   10
## Constrained    96.6837     0.3346   44
## Unconstrained 128.9589     0.4462   77
## 
## Inertia is variance
## 
## Eigenvalues for constrained axes:
##   RDA1   RDA2   RDA3   RDA4   RDA5   RDA6   RDA7   RDA8   RDA9  RDA10  RDA11 
## 25.282  8.297  6.044  4.766  4.148  3.857  3.587  3.334  3.087  2.551  2.466 
##  RDA12  RDA13  RDA14  RDA15  RDA16  RDA17  RDA18  RDA19  RDA20  RDA21  RDA22 
##  2.209  2.129  1.941  1.799  1.622  1.579  1.440  1.398  1.284  1.211  1.133 
##  RDA23  RDA24  RDA25  RDA26  RDA27  RDA28  RDA29  RDA30  RDA31  RDA32  RDA33 
##  1.001  0.923  0.862  0.788  0.750  0.712  0.685  0.611  0.584  0.537  0.516 
##  RDA34  RDA35  RDA36  RDA37  RDA38  RDA39  RDA40  RDA41  RDA42  RDA43  RDA44 
##  0.442  0.417  0.404  0.368  0.340  0.339  0.306  0.279  0.271  0.205  0.179 
## 
## Eigenvalues for unconstrained axes:
##    PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8 
## 17.156  9.189  7.585  6.064  5.730  4.843  4.518  4.105 
## (Showing 8 of 77 unconstrained eigenvalues)
```
]

---

# PRC &mdash; pyrifos example

``` r
ctrl <- how(plots = Plots(strata = ditch,type = "free"),
    within = Within(type = "series"), nperm = 99)
anova(mod, permutations = ctrl, first = TRUE)
```

```
## Permutation test for rda under reduced model
## Plots: ditch, plot permutation: free
## Permutation: series
## Number of permutations: 99
## 
## Model: prc(response = pyrifos, treatment = dose, time = week)
##          Df Variance      F Pr(>F)   
## RDA1      1   25.282 15.096   0.01 **
## Residual 77  128.959                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

---

# PRC &mdash; pyrifos example

``` r
plot(mod, species = FALSE, legpos = "topright")
```

---

# PRC &mdash; pyrifos example

.small[

``` r
plot(mod, species = FALSE, legpos = "topright")
logabu <- colSums(pyrifos)
scrs <- scores(mod, display = "species", choices = 1)
linestack(scrs[logabu > 150, , drop = FALSE]); axis(side = 2)
```
]

.row[

.col-9[
![](slides_files/figure-html/unnamed-chunk-9-1.svg)
]

.col-3[
![](slides_files/figure-html/unnamed-chunk-10-1.svg)
]
]

---

# Mantel test

The Mantel test is used to compare two similarity or distance matrices computed for the same set of samples

We might compare

* ecological distances  with
* geographical distances

E.g.

* `vegdist(spp, method = foo)`, &
* `vegdist(coords, method = "euclidean")`

---

# Mantel test

The classical Mantel test involves unrolling one triangle of each distance matrix & computing the crossproduct (sum of the scalar products) of the two vectors

.row[

.col-6[

``` r
df <- data.frame(y1 = c(0, 0, 1),
    y2 = c(4, 1, 0),
    y3 = c(8, 1, 0))
D <- as.matrix(vegdist(df))
D
```

```
##           1         2 3
## 1 0.0000000 0.7142857 1
## 2 0.7142857 0.0000000 1
## 3 1.0000000 1.0000000 0
```

``` r
D[upper.tri(D)] # unroll
```

```
## [1] 0.7142857 1.0000000 1.0000000
```
]

.col-6[

``` r
crossprod(1:3, 1:3)
```

```
##      [,1]
## [1,]   14
```

``` r
sum(1:3 * 1:3)
```

```
## [1] 14
```
]
]

This yields the Matel statistic `\(z_{M}\)`

---

# Mantel test

Alternatively we might standaridize the two vectors of distances before computing the crossproduct, and then divide the crossproduct by `\((n(n - 1) / 2) - 1\)`, the number of distances in each triangle minus 1

This yields statisic `\(r_{M}\)`

`\(r_{M}\)` is bounded between -1 and +1 and behavles like a correlation coefficient

Or we might convert the distances to ranks before computing `\(r_{M}\)`, which yields the Spearman correlation of the distance vectors

---

# Mantel test

Indeed, the Mantel statistic can be seen as the correlation between two distance matrices `\(\mathbf{D_Y}\)` and `\(\mathbf{D_X}\)`

Which is what *vegan* computes via `mantel()` using one of the Pearson, Spearman, or Kendall correlations

---

# Mantel test

However, we can't test this correlation in the usual way: there are `\((n(n-1)) / 2\)` distances from only `\(n\)` objects

Mantel developed an asymptotic test using math

In *vegan* we test the correlation coefficient via a permutation test

We permute the rows _and_ columns of `\(\mathbf{D_X}\)`

---

# Mantel test Lichen pastures

Is vegetation in the lichen pastures related to the environment?

``` r
data(varespec, varechem, package = "vegan")
D_veg <- vegdist(varespec) # Bray-Curtis
D_env <- vegdist(scale(varechem), "euclidean")
mantel(D_veg, D_env, permutations = how(nperm = 999))
```

```
## 
## Mantel statistic based on Pearson's product-moment correlation 
## 
## Call:
## mantel(xdis = D_veg, ydis = D_env, permutations = how(nperm = 999)) 
## 
## Mantel statistic r: 0.3047 
##       Significance: 0.001 
## 
## Upper quantiles of permutations (null model):
##   90%   95% 97.5%   99% 
## 0.120 0.144 0.173 0.202 
## Permutation: free
## Number of permutations: 999
```
---

# Mantel test Orabatid mites

Does Orabatid mite composition **similarity** decrease with distance?

``` r
data(mite, mite.xy, package = "vegan")
D_mite <- vegdist(mite) # Bray-Curtis
D_xy <- vegdist(mite.xy, "euclidean")
mantel(D_mite, D_xy, permutations = how(nperm = 999), method = "spearman")
```

```
## 
## Mantel statistic based on Spearman's rank correlation rho 
## 
## Call:
## mantel(xdis = D_mite, ydis = D_xy, method = "spearman", permutations = how(nperm = 999)) 
## 
## Mantel statistic r: 0.4904 
##       Significance: 0.001 
## 
## Upper quantiles of permutations (null model):
##    90%    95%  97.5%    99% 
## 0.0427 0.0567 0.0659 0.0784 
## Permutation: free
## Number of permutations: 999
```

???

Ecological theory sometimes predicts relationships between distance matrices. E.g. in the Neutral Theory, which predicts a monotonic declining relationship between community composition similarity and geographic separation, distance decay of community similarity

Yes, the coef is positive with distance and we we used a dissimilarity matrix for composition

---

# Partial Mantel test

Say we have `\(\mathbf{D}_1\)`, `\(\mathbf{D}_2\)`, and `\(\mathbf{D}_3\)`

The _partial_ Mantel statistic `\(r_M(\mathbf{D}_1\mathbf{D}_2 \boldsymbol{\cdot} \mathbf{D})\)` is the correlation between `\(\mathbf{D}_1\)` and `\(\mathbf{D}_2\)`, while controlling for the effects of `\(\mathbf{D}_3\)`

In *vegan* this is available via `mantel.partial()`

---

# Mantel test

The Mantel test is a test of the correlation between two distance matrices

It should only be used to test hypotheses about distances, not about raw data

The Mantel test between `\(\mathbf{D_Y}\)` and `\(\mathbf{D_X}\)` is not equivalent to

1. a test of the correlation coefficient betwen two vectors of raw data, or
2. a test of a linear regression of `\(\mathbf{y}\)` on matrix `\(\mathbf{X}\)`, or
3. a test in canonial analysis of multivariate `\(\mathbf{Y}\)` on `\(\mathbf{X}\)`

---
class: inverse middle center big-subsection

# Other stuff

---

# Diagnostics for constrained ordinations

**vegan** provides a series of diagnostics to help assess the model fit

* `goodness()`
* `inertcomp()`
* `spenvcor()`
* `intersetcor()`
* `vif.caa()`

---

# Diagnostics | goodness of fit

`goodness()` computes a goodness of fit statistic for species or sites, controlled by argument `display`

Gives the cumulative proportion of variance explained by each axis

```r
upr <- cca(varespec ~ ., data = varechem)
lwr <- cca(varespec ~ 1, data = varechem)
set.seed(1)

mods <- ordistep(lwr, scope = formula(upr), trace = 0)
head(goodness(mods))
```

```
##                  CCA1        CCA2      CCA3
## Callvulg 0.0062471656 0.318907619 0.8254657
## Empenigr 0.1164701677 0.137604904 0.1953245
## Rhodtome 0.0999089739 0.169697909 0.1824153
## Vaccmyrt 0.2361482843 0.240516323 0.2406730
## Vaccviti 0.1523704591 0.156502301 0.2110550
## Pinusylv 0.0009244423 0.004802076 0.0060096
```

---

# Diagnostics | inertia decomposition

`inertcomp()` decomposes the variance in samples or species in partial, constrained, and unconstrained components

* `statistic = "explained` (default) gives the decomposition in terms of variance
* `statistic = "distance"` gives decomposition in terms of the the residual distance

```r
head(inertcomp(mods, proportional = TRUE))
```

```
##                CCA        CA
## Callvulg 0.8254657 0.1745343
## Empenigr 0.1953245 0.8046755
## Rhodtome 0.1824153 0.8175847
## Vaccmyrt 0.2406730 0.7593270
## Vaccviti 0.2110550 0.7889450
## Pinusylv 0.0060096 0.9939904
```

---

# Diagnostics | species-environment correlations

`spenvcor()` returns the (weighted) correlation between the weighted average-based and the linear combination-based sets of site scores

A *poor* measure of goodness of fit. Sensitive to

* outliers (like all correlations)
* overfitting (using too many constraints)

Better models can have poorer species-environment correlations

```r
spenvcor(mods)
```

```
##      CCA1      CCA2      CCA3 
## 0.8554793 0.8131627 0.8792221
```

---

# Diagnostics | interset correlations

`intersetcor()` returns the (weighted) correlation between the weighted average-based site scores and each constraint variable

Another *poor* diagnostic

* correlation based
* focuses on a single constraint--axis combination at a time

```r
intersetcor(mods)
```

```
##          CCA1       CCA2      CCA3
## Al  0.7356445 -0.1304293 0.4260453
## P  -0.3588931 -0.6109601 0.4478786
## K  -0.3767902 -0.1339051 0.7759566
```

Vector fitting (`envfit()`) or biplot scores (`scores(model, display = "bp")`) are better alternatives

---

# References

.smaller[
* Anderson, M.J., 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 26, 32&ndash;46
* Anderson, M.J., 2006. Distance-based tests for homogeneity of multivariate dispersions. Biometrics 62, 245&ndash;253
* Anderson, M.J., Walsh, D.C.I., 2013. PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing? Ecol. Monogr. 83, 557&ndash;574
* Anderson, M.J., Walsh, D.C.I., Robert Clarke, K., Gorley, R.N., Guerra-Castro, E., 2017. Some solutions to the multivariate Behrens-Fisher problem for dissimilarity-based analyses. Aust. N. Z. J. Stat. 59, 57&ndash;79
* Blanchet, F.G., Legendre, P., Borcard, D., 2008. Forward selection of explanatory variables. Ecology 89, 2623&ndash;2632
* Legendre, P., Anderson, M.J., 1999. Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecol. Monogr. 69, 1&ndash;24
* McArdle, B.H., Anderson, M.J., 2001. Fitting Multivariate Models to Community Data: A Comment on Distance-Based Redundancy Analysis. Ecology 82, 290&ndash;297
* Warton, D.I., Wright, S.T., Wang, Y., 2012. Distance-based multivariate analyses confound location and dispersion effects. Methods Ecol. Evol. 3, 89&ndash;101
]

---

# Overview

For unconstrained ordination, any method will identify and show the main patterns in the species data

If you are describing variation in environmental variables use PCA

---

# Overview

If you have species data, are you interested in absolute counts or in relative composition?

If absolute counts, PCA, perhaps with a log- or sqrt-transformation of the counts is a good option

If relative composition, CA is a good choice unless you have samples with odd species composition (outliers). PCA with a Hellinger transformation will often work just as well as CA and isn't affected as much by outliers

If you must use dissimilarities instead of data, then use NMDS

---

# Overview

For constrained ordination, use RDA or CCA in preference, over PERMANOVA or db-RDA, unless you really want or need to use a specific dissimilarity metric

If you have non-species responses, use RDA

If you have species responses and want to model the raw abundances, use RDA with a log- or sqrt-transformation

If you want to model relative compositional changes, use CCA or RDA with a hellinger transformation

---

# Overview

Personally, I have most success with PCA/RDA on hellinger-transformed data whenever I was interested in modelling species compositional change

When I have experimental data, gradient lengths are usually smaller and we are often more interested in modelling abundances, in which case RDA with a log-transformation is a good choice as it is closer to regression models I would fit

Rarely will I choose PERMANOVA, dbRDA, NMDS, over PCA/RDA

---

# Overview

For omics data, especially microbiome data (16S sequencing, metagenomics), where you have large variation in the number of reads across samples, we will typically need to treat the data compositionally (relative proportions or %); CCA, tb-RDA (RDA after Hellinger transform, or RDA after ALR or CLR transform)

---

# Overview

If you want to model the abundances (raw PCA, RDA) then you will need to account for the differences in effort between samples:

* *vst*
* *rlog*

If you are OK just modelling dissimilarities, you could use the `avgdist()` function to compute `\(d_{ij}\)` over many sets of rarefied counts, and then use nMDS or dbRDA etc.

---

# Overview

We use Permutation Tests to test effects in ordination models

Often you will need to fit several different models to test the various hypotheses you have for a given data set

Think very carefully about any groupings there are in the data; almost invariably you will need to use a restricted permutation test, especially in an experimental setting

---

# Overview

To figure out what needs to be permuted and what needs to held fixed, it helps to sketch out the design, even if only crudely, using colours of the individuals to indicate the different levels of your treatments &mdash; look at whether the colours vary *within* plots or blocks or *between* plots (and blocks, but if blocks, you can't treat them as blocks, they need to be plots and permuted)