class: inverse, middle, left, my-title-slide, title-slide # Detecting change in palaeoecological time series, old and new ### Gavin Simpson ### Department of Animal Science • Aarhus Universitet ### PaleoEcoGen Seminar • March 24, 2022 --- ### Land acknowledgment * This work was start at University of Regina, on Treaty 4 lands ### Funding .row[ .col-4[ .center[] ] .col-4[ .center[] ] .col-4[ .center[] ] ] ### Data * Peter Leavitt (URegina) for the Lake 227 pigment data * The pigment samples were collected from Lake 227 in the ELA, which is on Treaty 3 lands ### Slides * [bit.ly/paleoecogen-gavin](http://bit.ly/paleoecogen-gavin) © Simpson (2020–2022) [](http://creativecommons.org/licenses/by/4.0/) ??? This research was conducted while I was at the University of Regina, which is situated on the territories of the nay-hi-yuh-wuk (Cree; nêhiyawak), uh-nish-i-naa-payk (Salteaux; Anihšināpēk), Dakota, Lakota, and Nakoda, and the homeland of the Métis/Michif Nation. The fossil pigment data I will talk about later were collected by my colleague at the University of Regina, Dr. Peter Leavitt, at Lake 227 in the Experimental Lakes Area in northern Ontario, which is on Treaty 3 lands, the territory of the Anishinaabe Nation. --- class: inverse middle center big-subsection # Resilience --- # Alternative stable states .row[ .col-9[ Under range of conditions multiple ecosystem states may exist Stability landscapes depict equilibria and basins of attraction Valleys are stable states External conditions alter the stability landscape Disturbances can kick a system between states Variance used as EWS of impending state change ] .col-3[ .center[ <!-- --> ] .smaller[Scheffer et al Nature 2001] ] ] --- # Variance as resilience Intuitive & key descriptor of ecosystem state & community dynamics .row[ .col-6[ .center[ <!-- --> ] ] .col-6[ .center[ <!-- --> ] ] .col-12[ .small[Source: Scheffer *et al* Science (2012)] ] ] ??? The variance is an important measure of ecosystem state & community dynamics Ecologists have linked variance with coexistence of populations and variance can be interpreted in terms of resilience theory Here I'm showing two cartoons; A: in the high resilience state the ecosystem returns rapidly to the equilibrium following perturbation and hence has low variance B: in the low resilience state, where the basin of attraction has become shallower, the ecosystem takes longer to return to equilibrium following perturbation and hence exhibits high variance Variance, however, is much harder to estimate from data --- # Resilience Resilience has (too) many definitions -- > as the time required for an ecosystem to return to an equilibrium or steady-state following a perturbation -- Some ecologists would use that definition for *stability* Buzz Holling referred to this as *engineering resilience* -- If you don't buy any of that: Phenomenologically, community variance is a useful thing to know… -- …? ??? We could (and many people have) spent decades quibbling about what "resilience" is and how to define it From a phenomenological viewpoint, I hope we can agree that the variance of a community tells us something about the dynamics of that community and change in variance tells us something about changes in those dynamics over time — and that is something useful --- # How do ecologists compute variance? Using metrics -- .bigger[ 🤬 ] -- Ecologists get a number and do things with it Typically brute force inference -- Statisticians want to know the properties of this novel estimator Typically brute forcing inference is fraught with problems --- # Moving windows .row[ .col-6[ Moving window approach: 1. detrend 2. choose window size 3. estimate variance 4. move window 1 time step 5. repeat 3. & 4. until done 6. look for trend ] .col-6[ .center[ <!-- --> ] .small[Modified from Dakos *et al* (2012) PLOS One] ] ] ??? One way to estimate the variance of a series is by using a moving window Here the analyst first detrends the time series Choose an appropriate window size Starting from the beginning of the series calculate the variance of the observations in the window Move the window along one time step and repeat 3 and 4 until you get to the end of the series A trend in the resulting variance time series is estimated using Kendall's rank correlation tau In this example the interest was in the end of the series so they right-aligned the window meaning they had to observe half the series before they could calculate the variance --- # Problems *Ad hoc* * How to detrend (method, complexity of trend, ...) * What window width? Ideally regularly spaced data * What to do about gaps, missing data, irregularly spaced data? Statistical testing hard * Kendall's `\(\tau\)` assumes independence * Surrogate time series assumes regular sampling, sensitive to choice of ARMA ??? The whole approach is *ad hoc* with many knobs to twiddle — how do you detrend the series? what width of window should be used? Suited to regularly spaced data so you have the same number of observations in each window, but what about series with data gaps, missing observations, or data that are irregularly spaced? Statistical inference is very hard; Kendall's tau assumes the data are independence but they can't be because of the moving window. Sometime surrogate time series are used to assess significance of the trend in variance; Surrogate time series are series generated with known properties that don't have a change in variance, but these approaches rely on classical techniques like ARMA models which only work for regularly spaced data and the test is sensitive to choice of orders in the ARMA --- class: inverse middle center big-subsection # Distributional Models --- class: inverse middle center big-subsection # Mean ??? The mean When we analyze data we mostly focus on this property of our data With summary statistics we often talk about the average of a set of values Or in a statistical model we might estimate expected change in the response for some change in one or more covariates Or we may be interested in estimating the trend in a data series where the trend described how the expected value of y changes over time --- class: inverse middle center big-subsection # Variance ??? There are other important properties of data however and in the short case study I want to talk about the variance & how we can use statistical models to investigate changes in the variance of ecosystems over time --- # Parameters beyond the mean <!-- --> ??? To do this we'll need models for the variance of a data set If we think of the Gaussian distribution that distribution has two parameters, the mean and the variance In linear regression we model the mean of the response at different values of the covariates, and assume the variance is constant at a value estimated from the residuals In the left panel I'm showing how the Gaussian distribution changes as we alter the mean while keeping the variance fixed, while in the right panel I keep the mean fixed but vary the variance — the parameters are independent --- # Distributional models .medium[ `$$y_{i} | \boldsymbol{x}_i \boldsymbol{z}_i \sim \mathcal{D}(\vartheta_{1}(\boldsymbol{x}_i), \ldots, \vartheta_{K}(\boldsymbol{x}_i))$$` ] For the Gaussian distribution * `\(\vartheta_{1}(\boldsymbol{x}_i) = \mu(\boldsymbol{x}_i)\)` * `\(\vartheta_{2}(\boldsymbol{z}_i) = \sigma(\boldsymbol{z}_i)\)` Provide a list of formulas, one per parameter ```r m <- gam( list( y ~ s(time), # <- formula for mean ~ s(time) ), # <- formula for variance data = df, method = "REML", family = gaulss()) ``` ??? Instead of treating the variance as a nuisance parameter, we could model both the variance **and** the mean as functions of the covariates This is done using what is called a *distributional model* In this model we say that the response values y_i, given the values of one or more covariates x_i follow some distribution with parameters theta, which are themselves functions of one or more covariates For the Gaussian distribution theta 1 would be the mean and theta 2 the variance (or standard deviation) We do not need to restrict ourselves to the Gaussian distribution however --- # The Gamma distribution <!-- --> ??? The gamma distribution would be more appropriate for describing the pigment data as it allows for positive continuous values, where those values are skewed, with some large values Here I'm showing different types of gamma distribution that vary in the mean and their variance and we should note that unlike the Gaussian distribution, we cannot change the mean without also changing the variance of the gamma distribution --- class: inverse middle center big-subsection # Simulated data --- # Simulated data Well-studied model of harvesting and overexploitation Resource biomass `\(x\)` grows logistically and is harvested according to `$$dx = \left(r x \left(1 - \frac{x}{K} \right) - c \frac{x^2}{x^2 + h^2} \right) dt + \sigma x dW$$` .row[ .col-6[ * `\(r\)` is growth rate * `\(K\)` carrying capacity * `\(h\)` is the half-saturation constant ] .col-6[ * `\(c\)` grazing rate * `\(dW\)` is a white noise process with intensity `\((\sigma x)^2 / dt\)` ] ] Deterministic case as `\(c\)` crosses a threshold `\(c \approx 2.604\)` we have a critical transition --- # Simulated data <!-- --> --- # Distributional GAMs Fit three distributional GAMs 1. A basic GAM with constant variance 2. A Gaussian distributional GAM with `$$\vartheta_{\mu} = \beta_0 + f(\mathtt{time})$$` `$$\vartheta_{\sigma} = \beta_0 + f(\mathtt{time})$$` 3. A Gamma distributional GAM with `$$\vartheta_{\mu} = \beta_0 + f(\mathtt{time})$$` `$$\vartheta_{\alpha} = \beta_0 + f(\mathtt{time})$$` --- # Generalized Additive Models <br />  .references[Source: [GAMs in R by Noam Ross](https://noamross.github.io/gams-in-r-course/)] ??? GAMs are an intermediate-complexity model * can learn from data without needing to be informed by the user * remain interpretable because we can visualize the fitted features --- class: inverse middle center large-subsection # GAMs fit wiggly functions --- class: inverse background-image: url('resources/wiggly-things.png') background-size: contain ??? --- # Wiggly things .center[] ??? GAMs use splines to represent the non-linear relationships between covariates, here `x`, and the response variable on the `y` axis. --- class: inverse background-image: url('./resources/rob-potter-398564.jpg') background-size: contain # GAMs are not magical .footnote[ <a style="background-color:black;color:white;text-decoration:none;padding:4px 6px;font-family:-apple-system, BlinkMacSystemFont, "San Francisco", "Helvetica Neue", Helvetica, Ubuntu, Roboto, Noto, "Segoe UI", Arial, sans-serif;font-size:12px;font-weight:bold;line-height:1.2;display:inline-block;border-radius:3px;" href="https://unsplash.com/@robpotter?utm_medium=referral&utm_campaign=photographer-credit&utm_content=creditBadge" target="_blank" rel="noopener noreferrer" title="Download free do whatever you want high-resolution photos from Rob Potter"><span style="display:inline-block;padding:2px 3px;"><svg xmlns="http://www.w3.org/2000/svg" style="height:12px;width:auto;position:relative;vertical-align:middle;top:-1px;fill:white;" viewBox="0 0 32 32"><title></title><path d="M20.8 18.1c0 2.7-2.2 4.8-4.8 4.8s-4.8-2.1-4.8-4.8c0-2.7 2.2-4.8 4.8-4.8 2.7.1 4.8 2.2 4.8 4.8zm11.2-7.4v14.9c0 2.3-1.9 4.3-4.3 4.3h-23.4c-2.4 0-4.3-1.9-4.3-4.3v-15c0-2.3 1.9-4.3 4.3-4.3h3.7l.8-2.3c.4-1.1 1.7-2 2.9-2h8.6c1.2 0 2.5.9 2.9 2l.8 2.4h3.7c2.4 0 4.3 1.9 4.3 4.3zm-8.6 7.5c0-4.1-3.3-7.5-7.5-7.5-4.1 0-7.5 3.4-7.5 7.5s3.3 7.5 7.5 7.5c4.2-.1 7.5-3.4 7.5-7.5z"></path></svg></span><span style="display:inline-block;padding:2px 3px;">Rob Potter</span></a> ] --- class: inverse middle center large-subsection # Basis Expansions --- # Splines formed from basis functions <!-- --> ??? Splines are built up from basis functions Here I'm showing a cubic regression spline basis with 10 knots/functions We weight each basis function to get a spline. Here all the basisi functions have the same weight so they would fit a horizontal line --- # Weight basis functions ⇨ spline .center[] ??? But if we choose different weights we get more wiggly spline Each of the splines I showed you earlier are all generated from the same basis functions but using different weights --- # How do GAMs learn from data? <!-- --> ??? How does this help us learn from data? Here I'm showing a simulated data set, where the data are drawn from the orange functions, with noise. We want to learn the orange function from the data --- # Maximise penalised log-likelihood ⇨ β .center[] ??? Fitting a GAM involves finding the weights for the basis functions that produce a spline that fits the data best, subject to some constraints --- class: inverse middle center subsection # Use a wiggliness penalty — avoid over fitting --- # Recap Fit three distributional GAMs 1. A basic GAM with constant variance 2. A Gaussian distributional GAM with `$$\vartheta_{\mu} = \beta_0 + f(\mathtt{time})$$` `$$\vartheta_{\sigma} = \beta_0 + f(\mathtt{time})$$` 3. A Gamma distributional GAM with `$$\vartheta_{\mu} = \beta_0 + f(\mathtt{time})$$` `$$\vartheta_{\alpha} = \beta_0 + f(\mathtt{time})$$` --- # Results Gaussian model had lowest AIC .center[ <!-- --> ] --- class: inverse middle center big-subsection # It works --- class: inverse middle center big-subsection # 😅 --- class: inverse middle center big-subsection # Real Data --- class: inverse middle center big-subsection # Lake 227 ??? Today I want to illustrate how we can use modern statistical models to continuously estimate the variance of multivariate time series using data from Lake 227 in the Experimental Lakes Area, Canada --- # Lake 227 .row[ .col-6[ * Experimentally manipulated * Annual sediment samples 1943–1990 * Analyzed for fossil pigments ] ] Cottingham *et al* (2000) **Ecology Letters** showed via a Levene's test that **variances** pre- & post-intervention were different ??? The lake was experimentally manipulated to investigate responses to increased nutrients The lake is annually laminated and Peter Leavitt, URegina, measured the sub-fossil pigment concentrations for each year between 1943 and 1990 Kathy Cottingham, Jim Rusak, and Peter Leavitt previously analyzed these data by separating them into control and treated sections and compared the variances of the two periods with a Levene's test, equivalent of a t-test but for differences of variances not means They showed that the algal community was more variable in the treated period than in the pre-manipulation period --- # Lake 227 <!-- --> ??? Today I will focus only on the main algal groups: * diatoms & chrysophytes through Fucoxanthin, * cryptophytes through Alloxanthin, * cyanobacteria and chlorophytes through Lutein-Zeaxanthin & pheophytin b, and * total algae through beta carotene --- # Lake 227 sedimentary pigments Fitted gamma distributional model using {brms} 📦 and **Stan** `\begin{align*} y_t & \sim \mathcal{G}(\mu = \eta_{1,t}, \alpha = \eta_{2,t}) \\ \eta_{1,t} & = \exp(f_{1,\text{pigment}}(\text{Year}_t)) \\ \eta_{2,t} & = \exp(f_{2,\text{pigment}}(\text{Year}_t)) \end{align*}` Mean `\(\mu\)` and shape `\(\alpha\)` each modelled with "random effect" spline using `bs = "fs"` Could also fit this using {mgcv} 📦 and `gammals()` or `twlss()` family if no censored pigments ??? To estimate how the mean and the variance of the pigment data changed during the experimental manipulation, I fitted a gamma distributional model using the brms package and the Stan Bayesian software We assume the responses are conditionally distributed gamma, and we model the mean and the shape parameter of the distribution with linear predictors Each linear predictor includes a smooth function of Year for each pigment After fitting we calculate the variance at each time point using the estimated values of the mean and the shape parameters using the posterior distribution of the response to generate credible intervals --- # Results — pigment means <!-- --> ??? This plot shows the estimated mean concentration for the five pigments estimated from the model, and which shows the increase in algal abundance during the experimental manipulation --- # Results — pigment variances <!-- --> ??? In this plot are showing the estimated variance throughout the time series We clearly see the increased variance in the algal communities that Cottingham et al observed, but now we can provide a continuous estimate of the variance over time rather than having to split the data into two and compare the variance of the two time periods --- # Summary * Variance is a useful measure of community dynamics — maybe resilience too * Avoid *ad hoc* approaches if we can — use a statistical model * Distributional models model all parameters of a conditional distribution * We can use distributional GAMs to model the variance of time series * We can do useful inference with these models 1. Have the full conditional distribution — skewness etc 2. Have the posterior distribution — makes inference easy --- # Where next Need to combine variance estimates for individual taxa into a community measure of variance Can we use these kinds of model to estimate values that are required in theoretical approaches to ecological stability etc? --- # But… 1 Most palaeo data aren't so nicely behaved Each slice of mud contains a different number of "lake years" The greater the number of "lake years" the lower the variance Implies a trend of increasing variance Model this *effect* — include amount of time in each slice (`lake_years`) in the `\(\vartheta_{\sigma}\)` or `\(\vartheta_{\alpha}\)` See Simpson (2018) for a discussion on this --- # But… 2 Each eDNA sample represents a different amount effort spent counting things — number of reads This too(!) implies heterogeneity in the data Don't rarefy your eDNA counts (e.g. McMurdie & Holmes 2014) Use an offset (term with constant `\(\beta_{\text{offset}} = 1\)`) — `offset(log(n_reads))` Can do better — estimate size factors `\(s\)` from all samples c.f. *DESeq2* Extra-poisson variance is the norm — use a negative binomial or **Tweedie** response --- class: inverse middle center big-subsection # Multivariate data --- ## Tradition * Typically proportional composition * Dimension reductions via ordination * Clustering * Dissimilarity * Approximations --- class: inverse background-image: url('resources/anne-nygard-yYRAvcPrWms-unsplash.jpg') background-size: cover # Maslow's hammer .footnote[source: Anne Nygard] --- # It's OK to follow tradition The traditional methods haven't stopped working -- But at some point they may hold us back --- class: inverse middle center big-subsection # Topic models --- # Complex multivariate species data .center[  ] .footnote[ Data provided by Jeffery Stone (Indiana State University) ] --- # Dimension reduction Typically we can't model all 100+ taxa in data sets like this - (M)ARSS-like models don't like the large `\(n\)` Seek a reduced dimensionality of the data that preserves the signal Existing dimension reduction methods aren't appropriate for questions we want to ask - Interpretation of latent factors is complex (PCA, CA, Principal Curves) - Linear combination of all taxa Can we group species into `\(J\)` *associations* and soft cluster samples as compositions of these associations? Twinspan? --- # Foy Lake — Montana .center[  ] --- # Foy Lake — Montana .center[  ] --- # Topic models Machine learning approach for organizing text documents - **Latent Dirichlet Allocation (LDA)** — (Blei, Ng, & Jordan, *J. Mach. Learn. Res.* 2003) *Generative* model for word occurences in documents - Valle, Baiser, Woodall, & Chazdon, R. (2014) *Ecology Letters* **17** - Christensen, Harris, & Ernest. (2018) *Ecology* doi:10.1002/ecy.2373 --- # Community of Skittles .center[  ] --- # Individual skittles from one of four flavoured packs .center[  ] What are the proportion of flavours in each pack? How many of each pack comprise the skittle community? --- # Latent Dirichlet Allocation Aim is to infer the - distribution of skittle **flavours** within each pack `\(\beta_j\)`, and - distribution of skittle **packs** within each community (*sample*) Achieve a soft clustering of samples — mixed membership model Achieve a soft clustering of **species** (flavours) into **associations** of taxa (packs) User supplies `\(J\)` — the number of associations *a priori* — `\(j = \{1, 2, \dots, J\}\)` `\(J\)` can be chosen using AIC, *perplexity*, CV, ... --- # Intuition behind LDA Latent Dirichlet Allocation represents a trade-off between two goals 1. for each sample, allocate its individuals to a **few associations of species** 2. in each association, assign high probability to a **few species** --- # Intuition behind LDA These are in opposition * assigning a sample to a single association makes *2* **hard** — all its species must have high probability under that one topic * putting very few species in each association makes *1* **hard** — to cover all individuals in a sample must assign sample to many associations Trading off these two goals therefore results in LDA finding tightly co-occurring species --- # Latent Dirichlet Allocation 1. Flavour distribution for `\(j\)`th type of Skittle `\(\beta_j \sim \text{Dirichlet}(\delta)\)` 2. Proportions of each type in the Skittle community `\(\theta \sim \text{Dirichlet}(\alpha)\)` 3. For each skittle `\(s_i\)` - Choose a pack in proportion `$$z_i \sim \text{Multinomial}(\theta)$$` - Choose a flavour from chosen pack with probability `$$p(s_i | z_i, \beta_j) \sim \text{Multinomial}(\delta)$$` --- # Correlated Topic Model LDA assumes associations of species are **uncorrelated** Potentially more *parsimonious* & *realistic* if associations were correlated Proportions of each type in the Skittle community — draw `$$\eta \sim N(\mu, \Sigma)$$` Then transform `\(\eta_J\)` to proportional scale `\(\Sigma\)` controls the correlation between *associations* Blei & Lafferty (2007) *Ann. Appl. Stat.* **1**, 17–35. --- # Correlated Topic Model .center[  ] --- # Displaying results Traditionally large corpus — *interactive* exploration of results - plot topic proportions over time - ordinatate topic proportions - `\(\text{PCA}_{\text{Hellinger}}\)`, CA, etc --- # Latent Dirichlet Allocation — Association 1 .center[  ] --- # Latent Dirichlet Allocation — Association 2 .center[  ] --- # Latent Dirichlet Allocation — Association 3 .center[  ] --- # Latent Dirichlet Allocation — Association 4 .center[  ] --- # Latent Dirichlet Allocation — Trend estimation LDA knows nothing about the temporal ordering of the samples A generative model for the data Treat the topic proportions as data into some other model — *explanation* Estimate trends in proportions of species associations using Dirichlet regression --- # Dirichlet regression Dirichlet distribution describes multivariate proportions on a *simplex* % sand, silt, clay Only need to know two of these for complete information — total is 100% Closed compositional data Multinomial is for *counts* from a total count Dirichlet is for **true** proportions --- # Dirichlet regression Model for the proportions `\(\alpha\)` (or other parametrisation) .center[  ] --- # CTM — Trend (Dirichlet Regression) .center[  ] --- # Summary LDA & CTM proved well-capable of summarizing the complex community dynamics of Foy Lake - Reduced 113 taxa to 10 associations of species - Species associations match closely the expert interpretation of the record - make autecological sense also - The CTM was more parsimonious — removed one rare association - Estimated trends in proportions of species associations capture mixture of - smooth, slowly varying trends, and - rapid (regime shift?) state change ~ 1.3 ky BP --- # Future directions Choosing `\(J\)` is inconvenient Address this via **Hierarchical Dirichlet Processes** - assume `\(J\)` is infinite & put a prior distribution over `\(J\)` Associations in LDA & CTM are static — distributions are fixed for all samples - dynamic & structural topic models allow distributions to vary smoothly with time Many developments in this field: - *Chinese Restaurant Process*, - *Indian Buffet Process*, & - ... --- # Related topics - *Structural topic model* — include covariates in the topic model - proportions of each species in the data set (corpus) - proportions of each association in samples - allow the proportions to vary over time - *dynamic topic model* is a special case - Relationships with ecological theory - Harris *et al* (2015). Linking Statistical and Ecological Theory: Hubbell’s Unified Neutral Theory of Biodiversity as a Hierarchical Dirichlet Process. *Proc. IEEE PP*, 1–14. <https://doi.org/10.1109/JPROC.2015.2428213> - In large samples Hubbell's UNTB `\(\rightarrow\)` HDP --- class: inverse middle center big-subsection # Thank you --- # Selected references [McMurdie, P.J., Holmes, S., 2014. PLoS Comput. Biol. 10, e1003531.](https://doi.org/10.1371/journal.pcbi.1003531) [Simpson, G.L., 2018. Frontiers in Ecology and Evolution 6, 149.](https://doi.org/10.3389/fevo.2018.00149)