class: inverse, middle, left, my-title-slide, title-slide .title[ # Other topics ] .author[ ### Gavin L. Simpson ] .date[ ### May 30, 2024 ] --- class: inverse middle center big-subsection # Welcome --- # Other topics * Co-correspondence analysis (Co-CA) * Principal Response Curves (PRC) * Mantel test * Ordination diagnostics --- class: inverse middle center big-subsection # Co-CA --- # Co-correspondence analysis We may be interested in relating two species data sets to one another Can can't do this using CCA with one data set as the response and the other as the predictors, because this would be too many predictors We might also not want to treat one data set as the response, we might want to analyse the data symmetrically Co-correspondence Analysis (Co-CA) is a suitable method --- # Co-correspondence analysis There are two types of Co-CA 1. symmetric Co-CA, and 2. predictive Co-CA Symmetric Co-CA finds axes in both data sets which maximise the covariation of the two data sets Predictive Co-CA finds directions in one data set which best predict the response data set --- # Symmetric Co-CA ```r library("cocorresp") data(beetles) ## log transform the beetle data beetles <- log1p(beetles) data(plants) ``` --- # Symmetric Co-CA ```r bp.sym <- coca(beetles ~ ., data = plants, method = "symmetric") ``` ``` ## ## Removed some species that contained no data in: beetles, plants ``` ```r bp.sym ``` ``` ## ## Symmetric Co-Correspondence Analysis ## ## Call: symcoca(y = y, x = x, n.axes = n.axes, R0 = weights, symmetric = ## symmetric, nam.dat = nam.dat) ## ## Inertia: ## Total Explained Residual ## beetles: 3.988 3.971 0.018 ## plants: 5.757 5.740 0.018 ``` --- # Symmetric Co-CA ``` r screeplot(bp.sym) ``` <img src="slides_files/figure-html/unnamed-chunk-1-1.svg" alt="" width="80%" style="display: block; margin: auto;" /> --- # Symmetric Co-CA ```r layout(matrix(1:2, ncol = 2)) biplot(bp.sym, which = "y1", main = "Beetles") biplot(bp.sym, which = "y2", main = "Plants") layout(1) ``` <img src="slides_files/figure-html/plot-symcoca-1.svg" width="80%" style="display: block; margin: auto;" /> --- class: inverse middle center big-subsection # PRC --- # Principal Response Curves PRC is a special form of redundancy analysis (RDA) that is useful when you want to compare the development of a biological community in time, under different conditions (treatments) The different conditions don't have to be experimental; could be used for monitoring relative to a control site or control period --- # Principal Response Curves PRC focuses on the temporally structured effects of different levels of a factor In standard RDA (etc) it is difficult to compare the temporal trajectories of plots as the time direction is usually a complex path through the ordination space PRC is designed to focus specifically on the temporal effects and their differences among sites Allows for a PRC diagram that displays these time-structured effects optimally --- # Principal Response Curves Assume `\(k\)` treatment levels in a factor `\(F\)` We observed community composition (abundance) at the same set of samples at (the same) multiple time points The time points are coded as `\(t\)` PRC fits a *partial* RDA of the form ``` r rda(comm ~ F:t + Condition(t), data = df) ``` As `\(t\)` is partialled out, the PRC represents the overall **differences** among treatment level and how these difference change through time --- # PRC — pyrifos example Data are log transformed abundances of aquatic invertebrate in twelve ditches studied in eleven times before and after an insecticide treatment 12 mesocosms were allocated randomly to treatments, with 4 controls, while the remaining 8 mesocosms were treated with a dose of an insecticide, *chloropyrifos* (0.1, 0.9, 6, 44 μg/ L) Invertebrates were samples 11 times, from 4 weeks *prior* to treatment through 24 weeks post-treatment (132 samples total) .small[ van den Brink, P.J. & ter Braak, C.J.F. (1999). Principal response curves: Analysis of time-dependent multivariate responses of biological community to stress. Environmental Toxicology and Chemistry, 18, 138–148.] --- # PRC — pyrifos example ``` r data(pyrifos) dim(pyrifos) ``` ``` ## [1] 132 178 ``` ``` r ditch <- gl(12, 1, length = 132) week <- gl(11, 12, labels = c(-4, -1, 0.1, 1, 2, 4, 8, 12, 15, 19, 24)) dose <- factor(rep(c(0.1, 0, 0, 0.9, 0, 44, 6, 0.1, 44, 0.9, 0, 6), 11)) ``` Important that the control (or reference) site is the reference (first) level of the "treatment" factor — use `relevel()` --- # PRC — pyrifos example ``` r mod <- prc(pyrifos, dose, week) ``` .small[ ``` r mod ``` ``` ## ## Call: prc(response = pyrifos, treatment = dose, time = week) ## ## Inertia Proportion Rank ## Total 288.9920 1.0000 ## Conditional 63.3493 0.2192 10 ## Constrained 96.6837 0.3346 44 ## Unconstrained 128.9589 0.4462 77 ## ## Inertia is variance ## ## Eigenvalues for constrained axes: ## RDA1 RDA2 RDA3 RDA4 RDA5 RDA6 RDA7 RDA8 RDA9 RDA10 RDA11 ## 25.282 8.297 6.044 4.766 4.148 3.857 3.587 3.334 3.087 2.551 2.466 ## RDA12 RDA13 RDA14 RDA15 RDA16 RDA17 RDA18 RDA19 RDA20 RDA21 RDA22 ## 2.209 2.129 1.941 1.799 1.622 1.579 1.440 1.398 1.284 1.211 1.133 ## RDA23 RDA24 RDA25 RDA26 RDA27 RDA28 RDA29 RDA30 RDA31 RDA32 RDA33 ## 1.001 0.923 0.862 0.788 0.750 0.712 0.685 0.611 0.584 0.537 0.516 ## RDA34 RDA35 RDA36 RDA37 RDA38 RDA39 RDA40 RDA41 RDA42 RDA43 RDA44 ## 0.442 0.417 0.404 0.368 0.340 0.339 0.306 0.279 0.271 0.205 0.179 ## ## Eigenvalues for unconstrained axes: ## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 ## 17.156 9.189 7.585 6.064 5.730 4.843 4.518 4.105 ## (Showing 8 of 77 unconstrained eigenvalues) ``` ] --- # PRC — pyrifos example ``` r ctrl <- how(plots = Plots(strata = ditch,type = "free"), within = Within(type = "series"), nperm = 99) anova(mod, permutations = ctrl, first = TRUE) ``` ``` ## Permutation test for rda under reduced model ## Plots: ditch, plot permutation: free ## Permutation: series ## Number of permutations: 99 ## ## Model: prc(response = pyrifos, treatment = dose, time = week) ## Df Variance F Pr(>F) ## RDA1 1 25.282 15.096 0.01 ** ## Residual 77 128.959 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # PRC — pyrifos example ``` r plot(mod, species = FALSE, legpos = "topright") ``` <img src="slides_files/figure-html/unnamed-chunk-7-1.svg" alt="" width="90%" style="display: block; margin: auto;" /> --- # PRC — pyrifos example .small[ ``` r plot(mod, species = FALSE, legpos = "topright") logabu <- colSums(pyrifos) scrs <- scores(mod, display = "species", choices = 1) linestack(scrs[logabu > 150, , drop = FALSE]); axis(side = 2) ``` ] .row[ .col-9[ <!-- --> ] .col-3[ <!-- --> ] ] --- # Mantel test The Mantel test is used to compare two similarity or distance matrices computed for the same set of samples We might compare * ecological distances with * geographical distances E.g. * `vegdist(spp, method = foo)`, & * `vegdist(coords, method = "euclidean")` --- # Mantel test The classical Mantel test involves unrolling one triangle of each distance matrix & computing the crossproduct (sum of the scalar products) of the two vectors .row[ .col-6[ ``` r df <- data.frame(y1 = c(0, 0, 1), y2 = c(4, 1, 0), y3 = c(8, 1, 0)) D <- as.matrix(vegdist(df)) D ``` ``` ## 1 2 3 ## 1 0.0000000 0.7142857 1 ## 2 0.7142857 0.0000000 1 ## 3 1.0000000 1.0000000 0 ``` ``` r D[upper.tri(D)] # unroll ``` ``` ## [1] 0.7142857 1.0000000 1.0000000 ``` ] .col-6[ ``` r crossprod(1:3, 1:3) ``` ``` ## [,1] ## [1,] 14 ``` ``` r sum(1:3 * 1:3) ``` ``` ## [1] 14 ``` ] ] This yields the Matel statistic `\(z_{M}\)` --- # Mantel test Alternatively we might standaridize the two vectors of distances before computing the crossproduct, and then divide the crossproduct by `\((n(n - 1) / 2) - 1\)`, the number of distances in each triangle minus 1 This yields statisic `\(r_{M}\)` `\(r_{M}\)` is bounded between -1 and +1 and behavles like a correlation coefficient Or we might convert the distances to ranks before computing `\(r_{M}\)`, which yields the Spearman correlation of the distance vectors --- # Mantel test Indeed, the Mantel statistic can be seen as the correlation between two distance matrices `\(\mathbf{D_Y}\)` and `\(\mathbf{D_X}\)` Which is what *vegan* computes via `mantel()` using one of the Pearson, Spearman, or Kendall correlations --- # Mantel test However, we can't test this correlation in the usual way: there are `\((n(n-1)) / 2\)` distances from only `\(n\)` objects Mantel developed an asymptotic test using math In *vegan* we test the correlation coefficient via a permutation test We permute the rows _and_ columns of `\(\mathbf{D_X}\)` --- # Mantel test Lichen pastures Is vegetation in the lichen pastures related to the environment? ``` r data(varespec, varechem, package = "vegan") D_veg <- vegdist(varespec) # Bray-Curtis D_env <- vegdist(scale(varechem), "euclidean") mantel(D_veg, D_env, permutations = how(nperm = 999)) ``` ``` ## ## Mantel statistic based on Pearson's product-moment correlation ## ## Call: ## mantel(xdis = D_veg, ydis = D_env, permutations = how(nperm = 999)) ## ## Mantel statistic r: 0.3047 ## Significance: 0.001 ## ## Upper quantiles of permutations (null model): ## 90% 95% 97.5% 99% ## 0.120 0.144 0.173 0.202 ## Permutation: free ## Number of permutations: 999 ``` --- # Mantel test Orabatid mites Does Orabatid mite composition **similarity** decrease with distance? ``` r data(mite, mite.xy, package = "vegan") D_mite <- vegdist(mite) # Bray-Curtis D_xy <- vegdist(mite.xy, "euclidean") mantel(D_mite, D_xy, permutations = how(nperm = 999), method = "spearman") ``` ``` ## ## Mantel statistic based on Spearman's rank correlation rho ## ## Call: ## mantel(xdis = D_mite, ydis = D_xy, method = "spearman", permutations = how(nperm = 999)) ## ## Mantel statistic r: 0.4904 ## Significance: 0.001 ## ## Upper quantiles of permutations (null model): ## 90% 95% 97.5% 99% ## 0.0427 0.0567 0.0659 0.0784 ## Permutation: free ## Number of permutations: 999 ``` ??? Ecological theory sometimes predicts relationships between distance matrices. E.g. in the Neutral Theory, which predicts a monotonic declining relationship between community composition similarity and geographic separation, distance decay of community similarity Yes, the coef is positive with distance and we we used a dissimilarity matrix for composition --- # Partial Mantel test Say we have `\(\mathbf{D}_1\)`, `\(\mathbf{D}_2\)`, and `\(\mathbf{D}_3\)` The _partial_ Mantel statistic `\(r_M(\mathbf{D}_1\mathbf{D}_2 \boldsymbol{\cdot} \mathbf{D})\)` is the correlation between `\(\mathbf{D}_1\)` and `\(\mathbf{D}_2\)`, while controlling for the effects of `\(\mathbf{D}_3\)` In *vegan* this is available via `mantel.partial()` --- # Mantel test The Mantel test is a test of the correlation between two distance matrices It should only be used to test hypotheses about distances, not about raw data The Mantel test between `\(\mathbf{D_Y}\)` and `\(\mathbf{D_X}\)` is not equivalent to 1. a test of the correlation coefficient betwen two vectors of raw data, or 2. a test of a linear regression of `\(\mathbf{y}\)` on matrix `\(\mathbf{X}\)`, or 3. a test in canonial analysis of multivariate `\(\mathbf{Y}\)` on `\(\mathbf{X}\)` --- class: inverse middle center big-subsection # Other stuff --- # Diagnostics for constrained ordinations **vegan** provides a series of diagnostics to help assess the model fit * `goodness()` * `inertcomp()` * `spenvcor()` * `intersetcor()` * `vif.caa()` --- # Diagnostics | goodness of fit `goodness()` computes a goodness of fit statistic for species or sites, controlled by argument `display` Gives the cumulative proportion of variance explained by each axis ```r upr <- cca(varespec ~ ., data = varechem) lwr <- cca(varespec ~ 1, data = varechem) set.seed(1) mods <- ordistep(lwr, scope = formula(upr), trace = 0) head(goodness(mods)) ``` ``` ## CCA1 CCA2 CCA3 ## Callvulg 0.0062471656 0.318907619 0.8254657 ## Empenigr 0.1164701677 0.137604904 0.1953245 ## Rhodtome 0.0999089739 0.169697909 0.1824153 ## Vaccmyrt 0.2361482843 0.240516323 0.2406730 ## Vaccviti 0.1523704591 0.156502301 0.2110550 ## Pinusylv 0.0009244423 0.004802076 0.0060096 ``` --- # Diagnostics | inertia decomposition `inertcomp()` decomposes the variance in samples or species in partial, constrained, and unconstrained components * `statistic = "explained` (default) gives the decomposition in terms of variance * `statistic = "distance"` gives decomposition in terms of the the residual distance ```r head(inertcomp(mods, proportional = TRUE)) ``` ``` ## CCA CA ## Callvulg 0.8254657 0.1745343 ## Empenigr 0.1953245 0.8046755 ## Rhodtome 0.1824153 0.8175847 ## Vaccmyrt 0.2406730 0.7593270 ## Vaccviti 0.2110550 0.7889450 ## Pinusylv 0.0060096 0.9939904 ``` --- # Diagnostics | species-environment correlations `spenvcor()` returns the (weighted) correlation between the weighted average-based and the linear combination-based sets of site scores A *poor* measure of goodness of fit. Sensitive to * outliers (like all correlations) * overfitting (using too many constraints) Better models can have poorer species-environment correlations ```r spenvcor(mods) ``` ``` ## CCA1 CCA2 CCA3 ## 0.8554793 0.8131627 0.8792221 ``` --- # Diagnostics | interset correlations `intersetcor()` returns the (weighted) correlation between the weighted average-based site scores and each constraint variable Another *poor* diagnostic * correlation based * focuses on a single constraint--axis combination at a time ```r intersetcor(mods) ``` ``` ## CCA1 CCA2 CCA3 ## Al 0.7356445 -0.1304293 0.4260453 ## P -0.3588931 -0.6109601 0.4478786 ## K -0.3767902 -0.1339051 0.7759566 ``` Vector fitting (`envfit()`) or biplot scores (`scores(model, display = "bp")`) are better alternatives --- # References .smaller[ * Anderson, M.J., 2001. A new method for non-parametric multivariate analysis of variance. Austral Ecol. 26, 32–46 * Anderson, M.J., 2006. Distance-based tests for homogeneity of multivariate dispersions. Biometrics 62, 245–253 * Anderson, M.J., Walsh, D.C.I., 2013. PERMANOVA, ANOSIM, and the Mantel test in the face of heterogeneous dispersions: What null hypothesis are you testing? Ecol. Monogr. 83, 557–574 * Anderson, M.J., Walsh, D.C.I., Robert Clarke, K., Gorley, R.N., Guerra-Castro, E., 2017. Some solutions to the multivariate Behrens-Fisher problem for dissimilarity-based analyses. Aust. N. Z. J. Stat. 59, 57–79 * Blanchet, F.G., Legendre, P., Borcard, D., 2008. Forward selection of explanatory variables. Ecology 89, 2623–2632 * Legendre, P., Anderson, M.J., 1999. Distance-based redundancy analysis: testing multispecies responses in multifactorial ecological experiments. Ecol. Monogr. 69, 1–24 * McArdle, B.H., Anderson, M.J., 2001. Fitting Multivariate Models to Community Data: A Comment on Distance-Based Redundancy Analysis. Ecology 82, 290–297 * Warton, D.I., Wright, S.T., Wang, Y., 2012. Distance-based multivariate analyses confound location and dispersion effects. Methods Ecol. Evol. 3, 89–101 ] --- # Overview For unconstrained ordination, any method will identify and show the main patterns in the species data If you are describing variation in environmental variables use PCA --- # Overview If you have species data, are you interested in absolute counts or in relative composition? If absolute counts, PCA, perhaps with a log- or sqrt-transformation of the counts is a good option If relative composition, CA is a good choice unless you have samples with odd species composition (outliers). PCA with a Hellinger transformation will often work just as well as CA and isn't affected as much by outliers If you must use dissimilarities instead of data, then use NMDS --- # Overview For constrained ordination, use RDA or CCA in preference, over PERMANOVA or db-RDA, unless you really want or need to use a specific dissimilarity metric If you have non-species responses, use RDA If you have species responses and want to model the raw abundances, use RDA with a log- or sqrt-transformation If you want to model relative compositional changes, use CCA or RDA with a hellinger transformation --- # Overview Personally, I have most success with PCA/RDA on hellinger-transformed data whenever I was interested in modelling species compositional change When I have experimental data, gradient lengths are usually smaller and we are often more interested in modelling abundances, in which case RDA with a log-transformation is a good choice as it is closer to regression models I would fit Rarely will I choose PERMANOVA, dbRDA, NMDS, over PCA/RDA --- # Overview For omics data, especially microbiome data (16S sequencing, metagenomics), where you have large variation in the number of reads across samples, we will typically need to treat the data compositionally (relative proportions or %); CCA, tb-RDA (RDA after Hellinger transform, or RDA after ALR or CLR transform) --- # Overview If you want to model the abundances (raw PCA, RDA) then you will need to account for the differences in effort between samples: * *vst* * *rlog* If you are OK just modelling dissimilarities, you could use the `avgdist()` function to compute `\(d_{ij}\)` over many sets of rarefied counts, and then use nMDS or dbRDA etc. --- # Overview We use Permutation Tests to test effects in ordination models Often you will need to fit several different models to test the various hypotheses you have for a given data set Think very carefully about any groupings there are in the data; almost invariably you will need to use a restricted permutation test, especially in an experimental setting --- # Overview To figure out what needs to be permuted and what needs to held fixed, it helps to sketch out the design, even if only crudely, using colours of the individuals to indicate the different levels of your treatments — look at whether the colours vary *within* plots or blocks or *between* plots (and blocks, but if blocks, you can't treat them as blocks, they need to be plots and permuted)