class: inverse, middle, left, my-title-slide, title-slide

.title[
# Detecting change in a dynamic world
]

.author[
### Gavin Simpson
]

.institute[
### Department of Animal and Veterinary Sciences · Aarhus University
]

.date[
### November 1, 2022
]

---

### Land acknowledgment

* This work was done at URegina on Treaty 4 lands

### Funding

.row[
.col-6[
.center[![:scale 70%](./resources/NSERC_C.svg)]
]
.col-6[
.center[![:scale 70%](./resources/fgsr-logo.jpg)]
]
]

### Data

* Arctic Sea Ice from [National Snow & Ice Data Center](https://nsidc.org/arcticseaicenews/)
* Lake ice data from [Global Lake and River Ice Phenology Database](http://nsidc.org/data/G01377.html)
* Lake ice study joint work with Stefano Mezzini
* bit.ly/ou-simpson-2022

<!--
### Slides

* HTML Slide deck [bit.ly/hifmb-talk](http://bit.ly/hifmb-talk) © Simpson (2020) [![Creative Commons Licence](https://i.creativecommons.org/l/by/4.0/88x31.png)](http://creativecommons.org/licenses/by/4.0/)
-->

???

This research was conducted while I was at the University of Regina, which is situated on the territories of the nay-hi-yuh-wuk (Cree; nêhiyawak), uh-nish-i-naa-payk (Salteaux; Anihšināpēk), Dakota, Lakota, and Nakoda, and the homeland of the Métis/Michif Nation.

---
class: inverse
background-image: url('./resources/franki-chamaki-z4H9MYmWIMA-unsplash.jpg')
background-size: cover

# 

.footnote[
<a style="background-color:black;color:white;text-decoration:none;padding:4px 6px;font-family:-apple-system, BlinkMacSystemFont, "San Francisco", "Helvetica Neue", Helvetica, Ubuntu, Roboto, Noto, "Segoe UI", Arial, sans-serif;font-size:12px;font-weight:bold;line-height:1.2;display:inline-block;border-radius:3px" href="https://unsplash.com/@franki?utm_medium=referral&utm_campaign=photographer-credit&utm_content=creditBadge" target="_blank" rel="noopener noreferrer" title="Download free do whatever you want high-resolution photos from Franki Chamaki"><span style="display:inline-block;padding:2px 3px"><svg xmlns="http://www.w3.org/2000/svg" style="height:12px;width:auto;position:relative;vertical-align:middle;top:-2px;fill:white" viewBox="0 0 32 32"><title>unsplash-logo</title><path d="M10 9V0h12v9H10zm12 5h10v18H0V14h10v9h12v-9z"></path></svg></span><span style="display:inline-block;padding:2px 3px">Franki Chamaki</span></a>
]

???

We learn from data because it can highlight our preconceptions and biases

---

# Learning from data

![](index_files/figure-html/lm-plot-1.svg)<!-- -->

???

Learning from data could be as simple as fitting a linear regression model...

---
class: inverse
background-image: url('./resources/deep-learning-turned-up-to-11.jpg')
background-size: contain

# 

???

Or as complex as fitting a sophisticated multi-layered neural network trained on huge datasets or corpora

---

# Learning involves trade-offs

.row[
.col-6[
.center[![:scale 75%](resources/low-bias-low-variance.jpg)]
]
.col-6[
.center[![:scale 75%](resources/interpretable-black-box.jpg)]
]
]

???

Learning from data involves trade-offs

We can have models that fit our data well — low bias — but which are highly variable, or

We can fit models that have lower variance, but these tend to have higher bias, i.e. fit the data less well
A linear regression model is very interpretable, but unless the underlying relationship is linear it will have poor fit

Deep learning may fit data incredibly well, but the model is very difficult to interpret and understand

---

# Motivating example

![](index_files/figure-html/load-sea-ice-1.svg)<!-- -->

---

# Generalized Additive Models

<br />

![](resources/tradeoff-slider.png)

.references[Source: [GAMs in R by Noam Ross](https://noamross.github.io/gams-in-r-course/)]

???

GAMs are an intermediate-complexity model

* can learn from data without needing to be informed by the user
* remain interpretable because we can visualize the fitted features

---

# HadCRUT4 time series

![](index_files/figure-html/hadcrut-temp-example-1.svg)<!-- -->

---

# Polynomials

![](index_files/figure-html/hadcrut-temp-polynomial-1.svg)<!-- -->

---

# How is a GAM different?

In a GLM we model the mean of the data as a sum of linear terms:

`$$g(\mu_i) = \beta_0 + \sum_j \color{red}{ \beta_j x_{ji}}$$`

A GAM is a sum of _smooth functions_ or _smooths_

`$$g(\mu_i) = \beta_0 + \sum_j \color{red}{s_j(x_{ji})}$$`

where `\(y_i \sim \mathcal{D}(\mu_i, \theta)\)`

---
class: inverse middle center subsection

# GAMs use splines

---
class: inverse
background-image: url('resources/wiggly-things.png')
background-size: contain

???

---

# Splines formed from basis functions

![](index_files/figure-html/basis-functions-1.svg)<!-- -->

???

Splines are built up from basis functions

Here I'm showing a cubic regression spline basis with 10 knots/functions

We weight each basis function to get a spline. Here all the basis functions have the same weight so they would fit a horizontal line

---

# Weight basis functions ⇨ spline

.center[![](resources/basis-fun-anim.gif)]

???

But if we choose different weights we get a more wiggly spline

Each of the splines I showed you earlier was generated from the same basis functions, just using different weights

---

# How do GAMs learn from data?

![](index_files/figure-html/example-data-figure-1.svg)<!-- -->

???

How does this help us learn from data?

Here I'm showing a simulated data set, where the data are drawn from the orange function, with noise. We want to learn the orange function from the data

---

# Maximise penalised log-likelihood ⇨ β

.center[![](resources/gam-crs-animation.gif)]

???

Fitting a GAM involves finding the weights for the basis functions that produce a spline that fits the data best, subject to some constraints

---
class: inverse middle center subsection

# Avoid overfitting our sample

---
class: inverse middle center subsection

# Use a wiggliness penalty — avoid fitting too wiggly models

---

# Fitted GAM

![](index_files/figure-html/hadcrutemp-fitted-gam-1.svg)<!-- -->

---
class: inverse middle center subsection

# Why GAMs?

---

# Why GAMs?

Originally motivated by messy, irregularly-spaced time series data

--

.center[
<img src="resources/frontiers-paper-title.png" width="993" />
]

Simpson (2018, Frontiers in Ecology & Evolution)

---

# Why GAMs?

Originally motivated by messy, irregularly-spaced time series data

.row[
.col-6[

* sediment cores
* samples irregularly spaced in time

]
.col-6[

<img src="resources/tom_core.jpg" width="75%" style="display: block; margin: auto;" />

]
]

---

# Why GAMs?

Originally motivated by messy, irregularly-spaced time series data

Interest in the "trends" themselves, not (necessarily) the temporal dependence

(Relatively) easy to fit — user-friendly tools

Familiar

---

# GAM summary

1. GAMs give us a framework to model flexible nonlinear relationships
2. GAMs are familiar as regression models
3. Use little functions (**basis functions**) to make big functions (**smooths**)
4. Use a **penalty** to trade off wiggliness/generality
5. Need to make sure your smooths are **wiggly enough**

(a minimal {mgcv} sketch of these ideas follows on the next slide)
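---

# From GLM to GAM in {mgcv}

A minimal sketch of the summary on simulated data — illustrative only, not the code behind the figures in this deck; the data and variable names are made up for the example


```r
library(mgcv)

## simulate a nonlinear relationship (made-up data for illustration)
set.seed(1)
df <- data.frame(x = runif(200))
df$y <- sin(2 * pi * df$x) + rnorm(200, sd = 0.3)

## GLM: the mean is a sum of linear terms
m_glm <- glm(y ~ x, data = df, family = gaussian())

## GAM: replace the linear term with a smooth s(x)
##  * k sets the basis size — the maximum allowed wiggliness
##  * the penalty, with lambda chosen by REML, decides how much of it is used
m_gam <- gam(y ~ s(x, k = 10, bs = "cr"), data = df, method = "REML")

summary(m_gam) # EDF of s(x) will be well below k
plot(m_gam)    # visualise the fitted smooth
```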
---
class: inverse middle center subsection

# Arctic sea-ice

---

<img src="index_files/figure-html/replot-sea-ice-1.svg" width="100%" />

---

# Arctic sea-ice

`$$\texttt{extent}_i \sim \mathcal{G}(\mu_i, \sigma^2)$$`

`$$\mu_i = f(\texttt{year}_i, \texttt{day}_i) + \gamma_{j(\texttt{year}_i)}$$`


```r
m <- bam(extent ~ s(year, k = 20, bs = "cr") +
           s(day_of_year, bs = "cc", k = 30) +
           ti(year, day_of_year, bs = c("cr", "cc"), k = c(6, 6)) +
           s(yearf, bs = "re"),
         data = sea_ice, method = "fREML", knots = knots,
         discrete = TRUE, nthreads = 4, control = ctrl_bam)
```

---

# Long-term trend

![](index_files/figure-html/unnamed-chunk-2-1.svg)<!-- -->

---

# Estimated rate of change

![](index_files/figure-html/unnamed-chunk-3-1.svg)<!-- -->

---

# Estimated extent on day 250

![](index_files/figure-html/unnamed-chunk-4-1.svg)<!-- -->

---
class: inverse middle center subsection

# Insectogeddon

---

# Insectogeddon

.row[
.col-6[

Insect populations are in decline

All populations? Where? By how much?

]
.col-6[
.center[![:scale 60%](./resources/un-biodiversity-insect-tweeted-img.png) ]
]
]

.small[
Thomas, C.D., Jones, T.H., Hartley, S.E., 2019. "Insectageddon": A call for more robust data and rigorous analyses. Glob. Chang. Biol. 25, 1891–1892. https://doi.org/10.1111/gcb.14608
]

---

# Seibold et al 2019

Analysed 150 grassland and 140 forest sites across 3 regions of Germany, 2008–2017

.center[![:scale 75%](./resources/seibold-et-al-2019.png) ]

Seibold *et al* (2019) *Nature*

---

# Declining arthropod biomass, abundance, and richness

.center[![:scale 60%](./resources/seibold-et-al-fig-1.svg) ]

Seibold *et al* (2019) *Nature*

---

# Declining arthropod biomass, abundance, and richness

* Grasslands
    * Biomass ↓ 67%
    * Abundance ↓ 78%
    * Richness ↓ 36%
* Forests (30 site subset)
    * Biomass ↓ 41%
    * Richness ↓ 36%

---

# Good and bad years

.center[
![:scale 60%](./resources/fournier-et-al-2019.svg)
]

Fournier *et al* (2019) *Conservation Biology* [doi: 10.1111/cobi.13371](https://doi.org/10.1111/cobi.13371)

---

# Good and bad years

Daskalova *et al* (2021) attempted to account for this effect using random effects of `Year`

.center[
![:scale 75%](./resources/daskalova-et-al-2021-fig-1.svg)
]

Daskalova *et al* (2021) *Insect Conservation & Diversity*

---

# Generalized Additive Models

Random effects of `Year` are one form of common trend

Random effects are a special case of a penalized spline

---

# Arthropod abundance

Negative binomial GAM fitted to a single site (AEG1)

.center[![:scale 70%](./resources/arthropod-abundance-site-aeg1-negbin-gam.svg) ]

---

# Grassland arthropod abundance

Our example:

* Abundance of all identified Arthropods
* Grassland sites only
* Models
    * Linear global and site-specific trends (random slopes & intercepts)
    * Smooth trend of `year` plus `site` random intercepts
    * Regional smooths of `year` plus `site` random intercepts
    * Regional smooths of `year`, `site`-specific *random* smooths
    * Regional smooths of `year`, year-to-year effects, `site`-specific *random* smooths

---

# Random smooths?

* Random intercepts
* Random slopes
* Random smooths?

Random smooths yield a separate smooth for each subject

They share a wiggliness penalty (plus `\(\lambda\)` for intercepts & slopes)

But individual trends can have different shapes
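---

# Random smooths in {mgcv}

One way to write such a model is with the factor–smooth (`bs = "fs"`) basis — a hypothetical sketch; the data frame `arth` and its columns are placeholders, not the models fitted on the following slides


```r
library(mgcv)

## hypothetical data: one row per site-year, with columns
##   count (abundance), year (numeric), region (factor), site (factor)
m_a <- gam(count ~ region +                    # regional means
             s(year, by = region, k = 8) +     # a separate smooth trend per region
             s(year, site, bs = "fs", k = 5),  # site-level random smooths: one wiggly
                                               # deviation per site, all sharing a
                                               # single wiggliness penalty
           data = arth, family = nb(), method = "REML")
```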
---

# Random smooth model

`site`-specific *random* smooths

.center[![:scale 70%](./resources/arthropod-abundance-m3.svg) ]

Pedersen _et al_ (2019) [PeerJ **7**:e6876](https://doi.org/10.7717/peerj.6876)

---

# Model A

Regional smooths of `year`, `site`-specific *random* smooths

.center[![:scale 80%](./resources/arthropod-abundance-m4.svg) ]

---

# Model B

Regional smooths of `year`, year-to-year effects, `site`-specific *random* smooths

.center[![:scale 80%](./resources/arthropod-abundance-m5.svg) ]

---

# Results

* Model A (without Y2Y effects): AIC 15122 (255.7 EDF)
* Model B (with Y2Y effects): AIC 15123 (254.2 EDF)
* Equivalent mixed linear trend model (with Y2Y effects): AIC 15468 (111.8 EDF)

Use year-to-year effects or not?

--

(For these data) **A** and **B** are different ways to decompose the temporal effects

* Without Y2Y → wigglier regional smooths
* With Y2Y → less wiggly regional smooths

???

Even with relatively simple data, GAMs yield a much better fit than the linear equivalent

---

# Nothing new under the sun

Fewster, Buckland, *et al* (2000) *Ecology*

(except Simon Wood's excellent {mgcv} software & the ability to fit random smooths)

---

# Summary

How we model trends in biodiversity data can change our view of losses and gains

Using "smooth" trends gives a much better fit for the arthropod data here

Y2Y effects and/or smooth trends represent different decompositions of the *temporal* effect

Non/linear methods are needed if we want to assess changes in rates of change

---
class: inverse middle center subsection

# Lake ice phenology

---

# Lake ice phenology

.center[
<img src="resources/animated-map.gif" width="550" height="550" />
]

---

# Problems

.center[
<img src="resources/dates.svg" width="550" height="550" />
]

---

# Time-to-event data — Event history models

Follow-up times `->` survival analysis models

The piece-wise exponential model is essentially a Poisson GLM on transformed data — the hazard rate is piece-wise constant within time intervals

--

Piece-wise exponential additive mixed model — PAMM

Estimate the hazard rate as a smooth function rather than a step function

Can include all of the *mgcv* 📦 tools for splines & effects

*pammtools* 📦 (Bender, Groll, Scheipl, 2018 *Statistical Modelling*)
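---

# A PAMM sketch

How the recipe on the previous slide looks in code — a hypothetical sketch with *pammtools* + *mgcv*; the `lake_ice` data frame and its columns are invented for illustration and this is not the code behind the results that follow


```r
library(survival)
library(pammtools)
library(mgcv)

## hypothetical data: one row per lake-year, with columns
##   days_to_freeze (follow-up time), froze (1 = freeze-up observed),
##   year (numeric), lake (factor)

## 1. split follow-up times into intervals -> piece-wise exponential data
ped <- as_ped(lake_ice, Surv(days_to_freeze, froze) ~ year + lake,
              cut = seq(0, 200, by = 7))

## 2. the PAMM is then a Poisson GAM with the interval length as an offset;
##    s(tend) is the log baseline hazard, a smooth rather than a step function
pam <- gam(ped_status ~ s(tend) + s(year) + s(lake, bs = "re"),
           data = ped, family = poisson(), offset = offset, method = "REML")
```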
---

# PAMMs

.center[
![](resources/mendota-freeze-viz.svg)<!-- -->
]

---

# Results

.center[
<img src="resources/p-freeze-thaw.svg" width="700" />
]

---

# Results

.center[
<img src="resources/hpam-spatial-freeze.svg" width="550" height="550" />
]

---

# Results

.center[
<img src="resources/hpam-change-in-dates.svg" width="550" height="550" />
]

---

# Summary

GAMs & penalized splines are a good choice for modelling messy spatio-temporal data

Models are Bayesian (empirical or fully Bayes) — we get a posterior

Penalized splines are available in a wide range of software

* *mgcv*, *brms*, *bamlss*, *BayesX*, *JAGS*

Can handle relatively large data: `mgcv::bam()`

Tools for evaluating fitted models

* *mgcViz*, *gratia*

---
class: inverse middle center subsection

# Extra slides

---

# Wiggliness

`$$\int_{\mathbb{R}} [f^{\prime\prime}]^2 dx = \boldsymbol{\beta}^{\mathsf{T}}\mathbf{S}\boldsymbol{\beta} = \large{W}$$`

`\(\mathbf{S}\)` is a *penalty* matrix

`\(\boldsymbol{\beta}^{\mathsf{T}}\mathbf{S}\boldsymbol{\beta}\)` is a *quadratic* penalty

---

# Making wiggliness matter

`\(W\)` measures **wiggliness**

(log) likelihood measures closeness to the data

We use a **smoothing parameter** `\(\lambda\)` to define the trade-off, to find the spline coefficients `\(\boldsymbol{\beta}\)` (and `\(\boldsymbol{\gamma}\)`) that maximize the **penalized** log-likelihood

`$$\mathcal{L}_p = \log(\text{Likelihood}) - \lambda W$$`

---

# HadCRUT4 time series

![](index_files/figure-html/hadcrut-temp-penalty-1.svg)<!-- -->

---

# Picking the right wiggliness

.pull-left[

Two ways to think about how to optimize `\(\lambda\)`:

* Predictive: Minimize out-of-sample error
* Bayesian: Put priors on our basis coefficients

]

.pull-right[

Many methods: AIC, Mallows' `\(C_p\)`, GCV, ML, REML

* **Practically**, use **REML**, because of numerical stability
* Hence `gam(..., method = "REML")`

]

.center[
![Animation of derivatives](./resources/remlgcv.png)
]

---

# Maximum allowed wiggliness

We set the **basis complexity** or "size", `\(k\)`

This is the _maximum wiggliness_; it can be thought of as the number of small functions that make up a curve

Once smoothing is applied, curves have fewer **effective degrees of freedom (EDF)**

EDF < `\(k\)`

---

# Maximum allowed wiggliness

`\(k\)` must be *large enough*; the `\(\lambda\)` penalty does the rest

*Large enough* — the space of functions representable by the basis includes the true function or a close approximation to it

Bigger `\(k\)` increases computational cost

In **mgcv**, default `\(k\)` values are arbitrary — after choosing the model terms, this is the key user choice

**Must be checked!** — `gam.check()`

---

# Knots?

Can avoid the knot placement problem using low-rank thin plate regression splines

---

# Thin plate splines

Default spline in {mgcv} is `bs = "tp"`

Thin plate splines — knot at every unique data value

Penalty on wiggliness (default is the 2nd derivative)

As many basis functions as data

This is wasteful

---

# Low rank thin plate splines

Thin plate regression splines:

Form the full TPS basis

Take the wiggly basis functions and eigen-decompose them

This concentrates the *information* in the TPS basis into as few new basis functions as possible

Retain the `k` eigenvectors with the `k` largest eigenvalues as the new basis

Fit the GAM with those new basis functions

---

# TPRS basis & penalty

.center[
<img src="index_files/figure-html/unnamed-chunk-5-1.png" width="100%" />
]

---

# CRS basis & penalty

.center[
<img src="index_files/figure-html/unnamed-chunk-6-1.png" width="100%" />
]
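---

# Inspecting a basis & penalty

A sketch of one way to extract basis functions and their penalty matrix yourself, similar in spirit to the two previous figures (not the code used to draw them)


```r
library(mgcv)

x  <- seq(0, 1, length.out = 200)
df <- data.frame(x = x)

## low-rank thin plate regression spline vs cubic regression spline, k = 10 each
sm_tp <- smoothCon(s(x, k = 10, bs = "tp"), data = df, absorb.cons = FALSE)[[1]]
sm_cr <- smoothCon(s(x, k = 10, bs = "cr"), data = df, absorb.cons = FALSE)[[1]]

X <- sm_tp$X      # 200 x 10 matrix: basis functions evaluated at x
S <- sm_tp$S[[1]] # 10 x 10 wiggliness penalty matrix

matplot(x, X, type = "l", lty = 1, ylab = "TPRS basis functions")
image(S, main = "TPRS penalty matrix")  # swap in sm_cr for the CRS versions
```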