class: inverse, middle, left, my-title-slide, title-slide .title[ # Quantifying trends in ecological data using GAMs ] .author[ ### Gavin Simpson ] .institute[ ### Department of Animal and Veterinary Sciences ] .date[ ### March 8, 2023 ] --- # Two case studies 1. Trends in biodiversity data, and 2. Shifting seasonality in cases of Lyme borreliosis --- class: inverse middle center subsection # Biodiversity trends --- # Biodiversity losses .center[![:scale 75%](./resources/living-planet-index-by-region.svg) ] ??? Many of us will be familiar with trends like this Global biodiversity is in trouble Wherever we look we see declining populations Living Plant Index is an average measure of change (so some time series in a region could be increasing even if overall we have a decline in LPI) --- # Biodiversity losses..? .center[![:scale 100%](./resources/vellend-et-al-2013.png) ] Vellend *et al* (2013) *PNAS* .center[![:scale 100%](./resources/dornelas-et-al-2014.png) ] Dornelas *et al* 2014 *Science* ??? But studies showing declining biodiversity are not without issue There has been a decade or more long discussion about the extent to which diversity is declining --- # Biodiversity losses..? .center[![:scale 100%](./resources/gonzalez-et-al-2016.png) ] Gonzalez *et al* (2016) *Ecology* .center[![:scale 100%](./resources/vellend-et-al-2017.png) ] Vellend *et al* 2017 *Ecology* --- # Biodiversity gains & losses .center[![:scale 75%](./resources/dornelas-et-al-2019.png) ] Dornelas *et al* (2019) *Ecology Letters* In part disagreement relates to *population* versus *assemblage* level metrics Partly due to how the data are analysed --- # Linear trends Majority of these large-scale syntheses use linear trend models .center[![:scale 75%](./resources/dornelas-et-al-2014-fig-2.jpeg) ] Fig 2 from Dornelas *et al* (2014) *Science* `\(y_i\)` sometimes transformed --- # Seibold et al 2019 Analysed 150 grassland and 140 forest sites across 3 regions of Germany 2008–2017 .center[![:scale 75%](./resources/seibold-et-al-2019.png) ] Seibold *et al* (2019) *Nature* ??? As an example of why using linear trends might be problematic, I will use this recent example from Seibold et al --- # Declining arthopod biomass, abundance, and richness .center[![:scale 60%](./resources/seibold-et-al-fig-1.svg) ] Seibold *et al* (2019) *Nature* --- # Declining arthopod biomass, abundance, and richness * Grasslands * Biomass ↓ 67% * Abundance ↓ 78% * Richness ↓ 36% * Forests (30 site subset) * Biomass ↓ 41% * Richness ↓ 36% --- # Good and bad years .center[ ![:scale 60%](./resources/fournier-et-al-2019.svg) ] Fournier *et al* (2019) *Conservation Biology* [doi: 10.1111/cobi.13371](https://doi.org/10.1111/cobi.13371) --- # Good and bad years Daskalova *et al* (2021) attempted to account for this effect using random effects of `Year` .center[ ![:scale 75%](./resources/daskalova-et-al-2021-fig-1.svg) ] Daskalova *et al* (2021) *Insect Conservation & Diversity* --- # Generalized Additive Models Random effects of `Year` are one form of common trend Year-to-year variation about some overall mean -- Random effects are a special case of a penalized spline -- GAMs --- # Generalized Additive Models <br /> ![](resources/tradeoff-slider.png) .references[Source: [GAMs in R by Noam Ross](https://noamross.github.io/gams-in-r-course/)] GAMs are an intermediate-complexity model * can learn from data without needing to be informed by the user * remain interpretable because we can visualize the fitted features * fit smooth functions using penalized splines --- # Splines .center[![](resources/spline-anim.gif)] --- # Splines formed from basis functions ![](index_files/figure-html/basis-functions-1.svg)<!-- --> ??? Splines are built up from basis functions Here I'm showing a cubic regression spline basis with 10 knots/functions We weight each basis function to get a spline. Here all the basis functions have the same weight so they would fit a horizontal line --- # Weight basis functions ⇨ spline .center[![](resources/basis-fun-anim.gif)] ??? But if we choose different weights we get more wiggly spline --- # Maximize penalized log-likelihood ⇨ β .center[![](resources/gam-crs-animation.gif)] ??? Fitting a GAM involves finding the weights for the basis functions that produce a spline that fits the data best, subject to some constraints --- # GAM summary 1. GAMs give us a framework to model flexible nonlinear relationships 2. Use little functions (**basis functions**) to make big functions (**smooths**) 3. Use a **penalty** to trade off wiggliness/generality 4. Need to make sure your smooths are **wiggly enough** --- # Arthropod abundance .center[![:scale 90%](./resources/arthropod-abundance.svg) ] --- # Arthropod abundance Negative binomial GAM fitted to single site (AEG1) .center[![:scale 70%](./resources/arthropod-abundance-site-aeg1-negbin-gam.svg) ] --- # Grassland arthropod abundance Our example: * Abundance of all identified Arthropods * Grassland sites only * Models * Linear global and site-specific trends (random slopes & intercepts) * Smooth trend of `year` plus `site` random intercepts * Regional smooths of `year` plus `site` random intercepts * Regional smooths of `year`, `site` specific *random* smooths * Regional smooths of `year`, year-to-year effects, `site` specific *random* smooths --- # Random smooths? * Random intercepts * Random slopes * Random smooths? Random smooths yield a separate smooth for each subject Share a wiggliness penalty (plus `\(\lambda\)` for intercepts & slopes) But individual trends can have different shape --- # Random smooth model `site` specific *random* smooths .center[![:scale 70%](./resources/arthropod-abundance-m3.svg) ] Pedersen _et al_ (2019) [PeerJ **7**:e6876](https://doi.org/10.7717/peerj.6876) --- # Model A Regional smooths of `year`, `site` specific *random* smooths .center[![:scale 80%](./resources/arthropod-abundance-m4.svg) ] --- # Model B Regional smooths of `year`, year-to-year effects, `site` specific *random* smooths .center[![:scale 80%](./resources/arthropod-abundance-m5.svg) ] --- # Results * Model A (without Y2Y effects): AIC 15122 (255.7 EDF) * Model B (with Y2Y effects): AIC 15123 (254.2 EDF) * Equivalent mixed linear trend model (with Y2Y effects): AIC 15468 (111.8 EDF) Use year-to-year effects or not? -- (For these data) **A** and **B** are different ways to decompose the temporal effects * Without Y2Y → wigglier regional smooths * With Y2Y → less wiggly regional smooths ??? Even with relatively simple data, GAMs yield much better fit than linear equiv. --- # Nothing new under the sun Fewster, Buckland, *et al* (2000) *Ecology* (except Simon Wood's excellent {mgcv} software & ability to fit random smooths) Jonas Knape (2016, *J. App. Ecol.*) --- # Summary How we model trends in biodiversity data can change our view of losses and gains Using "smooth" trends gives much better fit for the arthropod data here Y2Y effects and / or smooth trends represent different decompositions of the *temporal* effect Non-linear methods needed if we want to assess changes in rates of change --- class: inverse subsection center middle # Lyme borreliosis --- # Lyme borreliosis .row[ .col-9[ Lyme disease is a zoonotic infection Caused by bacteria (spirochetes) in the *Borrelia burgdorferi* sensu lato complex Transmitted by infected tick larvae & nymphs in the genus *Ixodes* during blood meals Common vector-borne disease in N hemisphere Reported cases of Lyme borreliosis have increased in recent decades ] .col-3[ .center[ <img src="resources/Adult_deer_tick.jpg" width="1763" /> ] .smaller[Adult deer tick, *Ixodes scapularis* (Source: Scott Bauer) ] ] ] ??? Increased in both number and geographical range --- # Tick-borne disease & climate .center[ <img src="resources/gilbert-tick-zoonoses-schematic.png" width="90%" /> ] .smaller[source: Gilbert (2021) *Annu. Rev. Entomol.* [10.1146/annurev-ento-052720-094533](https://doi.org/10.1146/annurev-ento-052720-094533) ] ??? Figure caption from Gilbert (2021) Schematic diagram showing how climate change can affect ticks directly (by changing oviposition, development, mortality rates, and activity) and indirectly (by changing habitat and host species and abundance). Climate change can, in turn, affect tick-borne pathogen infection risk in humans by directly affecting human behavior (e.g., outdoor recreation) and indirectly affecting pathogen transmission rates and prevalence via hosts and ticks. The relative importance of each pathway is challenging to ascertain. --- # Tick-borne disease & climate Aim of Goren *et al* (2023) was to study 1. the incidence, and 2. the seasonal timing of Lyme borreliosis at the northern limit of the range in Europe --- # Cases of Lyme borreliosis in Norway <img src="index_files/figure-html/lyme-cases-plot-1.svg" width="95%" style="display: block; margin: auto;" /> .small[ Data: Goren *et al* (2023) *Proc. Biol. Sci.* [10.1098/rspb.2022.2420](https://doi.org/10.1098/rspb.2022.2420) ] --- # Climate change in Norway Compared to 1979–2008 reference period Annual mean temperature increased by ~ 0.5℃ Winter temperatures increase by ~1℃ Growing season increased by 1–2 weeks nationally Annual precipitation increased by ~3% per decade --- # GAM for seasonal data Assume weekly cases are distributed Negative binomial $$ \texttt{cases}_i \sim \mathcal{NB}(\mu_i, \theta) $$ Lots of ways to decompose these data; settled on .small[ $$ \log (\mathbb{E}(\texttt{cases}_i) ) = \beta_0 + f(\texttt{week}_i , \texttt{year}_i) + \log(\texttt{population}_i) $$ ] where $$ f(\texttt{week}_i , \texttt{year}_i) $$ is a tensor product smooth (main effects plus interaction) --- # GAM for seasonal data Model code looks like: ```r m <- bam(cases ~ s(week, bs = "cc", k = 20) + s(year, k = 25) + ti(week, year, bs = c("cc", "tp"), k = c(20, 20)) + offset(log(pop)), data = lyme, family = nb(), method = "fREML", knots = list(week = c(0.5, 52.5)), discrete = TRUE, nthreads = 4) ``` Using lots of *mgcv* tricks to fit the model *fast* as some decompositions take a long time to fit with `gam()` Decomposed tensor product into main smooth effects plus a pure smooth interaction --- # Estimated smooth functions <img src="index_files/figure-html/lyme-draw-1.svg" width="95%" style="display: block; margin: auto;" /> --- # Estimated seasonal curves <img src="index_files/figure-html/lyme-plot-fitted-curves-1.svg" width="95%" style="display: block; margin: auto;" /> --- # How has timing of peak changed? Easy — using data in previous figure, find week in each year where `\(\widehat{\texttt{cases}}_{ij}\)` is maximal -- Harder — account for the uncertainty in the model & put an uncertainty band on `\(\widehat{\texttt{peak}}_{ij}\)` -- GAMs are an *empirical Bayesian* model so we can simulate from the posterior distribution of the model to get answers to questions like this --- # How has timing of peak changed? *gratia* 📦 has functions to make doing this easy(-ish) ```r ds2 <- data_slice(m, week = unique(week), year = unique(year), pop = log(100000)) ds2 <- ds2 |> add_column(row = seq_len(nrow(ds2)), .before = 1L) fs2 <- fitted_samples(m, data = ds2, n = 10000, seed = 42, scale = "response") fs2 <- fs2 |> left_join(ds2, by = join_by(row)) |> select(-pop) peak_week <- fs2 |> group_by(year, draw) |> slice_max(order_by = fitted, n = 1) |> group_by(year) |> summarise(peak = quantile(week, prob = 0.5), lower_ci = quantile(week, prob = 0.055), upper_ci = quantile(week, prob = 0.945)) ``` --- # How has timing of peak changed? <img src="index_files/figure-html/lyme-plot-peak-week-1.svg" width="95%" style="display: block; margin: auto;" /> --- # Summary GAMs offer a flexible way to model (decompose) temporal variation in ecological time series Can be adapted in many ways * space * space & time * covariate effects * model distributions or quantiles * handle very large data Posterior inference allows us to do many useful things (easily &mdash, more easily) --- # Slides * © Simpson (2022-2023) [![Creative Commons Licence](https://i.creativecommons.org/l/by/4.0/88x31.png)](http://creativecommons.org/licenses/by/4.0/) * HTML Slide deck [gavinsimpson.github.io/au-ecoscience-2023/](https://gavinsimpson.github.io/au-ecoscience-2023/) * RMarkdown [https://github.com/gavinsimpson/au-ecoscience-2023](https://github.com/gavinsimpson/au-ecoscience-2023) * DOI: 10.5281/zenodo.7706659 [![DOI](https://zenodo.org/badge/610948193.svg)](https://zenodo.org/badge/latestdoi/610948193) * @ucfagls on Twitter --- # Representing trends With the penalty idea we can imagine several ways to account for temporal effects in GAMs 1. Fixed year effects 2. Random year effects 3. MRF equivalent of AR 4. Smooth 5. Linear 6. ... 1 & 5 typically not penalized (but could be via `paraPen`) Could combine these hierarchically