class: inverse, middle, left, my-title-slide, title-slide

.title[
# Introduction to Generalized Additive Models with R and mgcv
]

.author[
### Gavin Simpson
]

.date[
### June 7th, 2022
]

---
class: inverse middle center big-subsection

# Welcome

---
class: inverse middle center subsection

# Motivating example

---

# HadCRUT4 time series

<!-- -->

???

Hadley Centre NH temperature record ensemble

How would you model the trend in these data?

---

# Linear Models

`$$y_i \sim \mathcal{N}(\mu_i, \sigma^2)$$`

`$$\mu_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots + \beta_j x_{ji}$$`

Assumptions

1. linear effects of covariates are a good approximation of the true effects
2. conditional on the values of covariates, `\(y_i | \mathbf{X} \sim \mathcal{N}(\mu_i, \sigma^2)\)`
3. this implies all observations have the same *variance*
4. `\(y_i | \mathbf{X}\)` are *independent*

An **additive** model addresses the first of these

---
class: inverse center middle subsection

# Why bother with anything more complex?

---

# Is this linear?

<!-- -->

---

# Polynomials perhaps…

<!-- -->

---

# Polynomials perhaps…

We can keep on adding ever more powers of `\(\boldsymbol{x}\)` to the model — a model selection problem

**Runge phenomenon** — oscillations at the edges of an interval — means simply moving to higher-order polynomials doesn't always improve accuracy

---
class: inverse middle center subsection

# GAMs offer a solution

---

# HadCRUT data set

```r
library('readr')
library('dplyr')
URL <- "https://bit.ly/hadcrutv4"
gtemp <- read_table(URL, col_types = 'nnnnnnnnnnnn', col_names = FALSE) %>%
    select(num_range('X', 1:2)) %>%
    setNames(nm = c('Year', 'Temperature'))
```

[File format](https://www.metoffice.gov.uk/hadobs/hadcrut4/data/current/series_format.html)

---

# HadCRUT data set

```r
gtemp
```

```
## # A tibble: 172 × 2
##     Year Temperature
##    <dbl>       <dbl>
##  1  1850      -0.336
##  2  1851      -0.159
##  3  1852      -0.107
##  4  1853      -0.177
##  5  1854      -0.071
##  6  1855      -0.19
##  7  1856      -0.378
##  8  1857      -0.405
##  9  1858      -0.4
## 10  1859      -0.215
## # … with 162 more rows
```

---

# Fitting a GAM

```r
library('mgcv')
m <- gam(Temperature ~ s(Year), data = gtemp, method = 'REML')
summary(m)
```

.smaller[
```
## 
## Family: gaussian 
## Link function: identity 
## 
## Formula:
## Temperature ~ s(Year)
## 
## Parametric coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept) -0.020477   0.009731  -2.104   0.0369 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##           edf Ref.df     F p-value    
## s(Year) 7.837   8.65 145.1  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.88   Deviance explained = 88.6%
## -REML = -91.237  Scale est. = 0.016287  n = 172
```
]

---

# Fitted GAM

<!-- -->

---
class: inverse middle center big-subsection

# GAMs

---

# Generalized Additive Models

<br />

![]()

.references[Source: [GAMs in R by Noam Ross](https://noamross.github.io/gams-in-r-course/)]

???

GAMs are an intermediate-complexity model

* can learn from data without needing to be informed by the user
* remain interpretable because we can visualize the fitted features

---

# How is a GAM different?
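In R the difference is a single term in the model formula — a minimal, hypothetical sketch (a data frame `df` with response `y` and covariate `x` is assumed here, not part of the example data used elsewhere in these slides):

```r
library('mgcv')

## linear model: straight-line effect of x (df, y, x are placeholders)
m_lm  <- lm(y ~ x, data = df)

## GAM: replace the linear term with a smooth of x
m_gam <- gam(y ~ s(x), data = df, method = 'REML')
```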
In LM we model the mean of data as a sum of linear terms:

`$$y_i = \beta_0 +\sum_j \color{red}{ \beta_j x_{ji}} +\epsilon_i$$`

A GAM is a sum of _smooth functions_ or _smooths_

`$$y_i = \beta_0 + \sum_j \color{red}{s_j(x_{ji})} + \epsilon_i$$`

where `\(\epsilon_i \sim N(0, \sigma^2)\)`, `\(y_i \sim \text{Normal}\)` (for now)

Call the above equation the **linear predictor** in both cases

---

# Fitting a GAM in R

```r
model <- gam(y ~ s(x1) + s(x2) + te(x3, x4), # formula describing model
             data = my_data_frame,           # your data
             method = 'REML',                # or 'ML'
             family = gaussian)              # or something more exotic
```

`s()` terms are smooths of one or more variables

`te()` terms are the smooth equivalent of *main effects + interactions*

---

# How did `gam()` *know*?

<!-- -->

---
class: inverse
background-image: url('./resources/rob-potter-398564.jpg')
background-size: contain

# What magic is this?

.footnote[
<a style="background-color:black;color:white;text-decoration:none;padding:4px 6px;font-family:-apple-system, BlinkMacSystemFont, &quot;San Francisco&quot;, &quot;Helvetica Neue&quot;, Helvetica, Ubuntu, Roboto, Noto, &quot;Segoe UI&quot;, Arial, sans-serif;font-size:12px;font-weight:bold;line-height:1.2;display:inline-block;border-radius:3px;" href="https://unsplash.com/@robpotter?utm_medium=referral&utm_campaign=photographer-credit&utm_content=creditBadge" target="_blank" rel="noopener noreferrer" title="Download free do whatever you want high-resolution photos from Rob Potter"><span style="display:inline-block;padding:2px 3px;"><svg xmlns="http://www.w3.org/2000/svg" style="height:12px;width:auto;position:relative;vertical-align:middle;top:-1px;fill:white;" viewBox="0 0 32 32"><title></title><path d="M20.8 18.1c0 2.7-2.2 4.8-4.8 4.8s-4.8-2.1-4.8-4.8c0-2.7 2.2-4.8 4.8-4.8 2.7.1 4.8 2.2 4.8 4.8zm11.2-7.4v14.9c0 2.3-1.9 4.3-4.3 4.3h-23.4c-2.4 0-4.3-1.9-4.3-4.3v-15c0-2.3 1.9-4.3 4.3-4.3h3.7l.8-2.3c.4-1.1 1.7-2 2.9-2h8.6c1.2 0 2.5.9 2.9 2l.8 2.4h3.7c2.4 0 4.3 1.9 4.3 4.3zm-8.6 7.5c0-4.1-3.3-7.5-7.5-7.5-4.1 0-7.5 3.4-7.5 7.5s3.3 7.5 7.5 7.5c4.2-.1 7.5-3.4 7.5-7.5z"></path></svg></span><span style="display:inline-block;padding:2px 3px;">Rob Potter</span></a>
]

---
class: inverse
background-image: url('resources/wiggly-things.png')
background-size: contain

???

---

# Wiggly things

.center[![]()]

???

GAMs use splines to represent the non-linear relationships between covariates, here `x`, and the response variable on the `y` axis.

---

# Basis expansions

In the polynomial models we used a polynomial basis expansion of `\(\boldsymbol{x}\)`

* `\(\boldsymbol{x}^0 = \boldsymbol{1}\)` — the model constant term
* `\(\boldsymbol{x}^1 = \boldsymbol{x}\)` — linear term
* `\(\boldsymbol{x}^2\)`
* `\(\boldsymbol{x}^3\)`
* …

---

# Splines

Splines are *functions* composed of simpler functions

Simpler functions are *basis functions* & the set of basis functions is a *basis*

When we model using splines, each basis function `\(b_k\)` has a coefficient `\(\beta_k\)`

The resultant spline is the sum of these weighted basis functions, evaluated at the values of `\(x\)`

`$$s(x) = \sum_{k = 1}^K \beta_k b_k(x)$$`

---

# Splines formed from basis functions

<!-- -->

???

Splines are built up from basis functions

Here I'm showing a cubic regression spline basis with 10 knots/functions

We weight each basis function to get a spline. Here all the basis functions have the same weight so they would fit a horizontal line

---

# Weight basis functions ⇨ spline

.center[![]()]
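A minimal sketch of this weighting idea in code, using `smoothCon()` from *mgcv* to evaluate a cubic regression spline basis — the weights below are arbitrary and purely for illustration:

```r
library('mgcv')

## evaluate 10 cubic regression spline basis functions over x
x  <- seq(0, 1, length.out = 200)
sm <- smoothCon(s(x, k = 10, bs = 'cr'), data = data.frame(x = x))[[1]]
X  <- sm$X                       # 200 x 10 matrix of basis function values

## weight the basis functions & sum them to form the spline
beta   <- runif(ncol(X), -2, 2)  # arbitrary weights
spline <- X %*% beta

plot(x, spline, type = 'l')
```

???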
But if we choose different weights we get a more wiggly spline

Each of the splines I showed you earlier was generated from the same basis functions, just using different weights

---

# How do GAMs learn from data?

<!-- -->

???

How does this help us learn from data?

Here I'm showing a simulated data set, where the data are drawn from the orange functions, with noise. We want to learn the orange function from the data

---

# Maximise penalised log-likelihood ⇨ β

.center[![]()]

???

Fitting a GAM involves finding the weights for the basis functions that produce a spline that fits the data best, subject to some constraints

---
class: inverse middle center subsection

# Avoid overfitting our sample

---
class: inverse center middle large-subsection

# How wiggly?

$$
\int_{\mathbb{R}} [f^{\prime\prime}]^2 dx = \boldsymbol{\beta}^{\mathsf{T}}\mathbf{S}\boldsymbol{\beta}
$$

---
class: inverse center middle large-subsection

# Penalised fit

$$
\mathcal{L}_p(\boldsymbol{\beta}) = \mathcal{L}(\boldsymbol{\beta}) - \frac{1}{2} \lambda\boldsymbol{\beta}^{\mathsf{T}}\mathbf{S}\boldsymbol{\beta}
$$

---

# Wiggliness

`$$\int_{\mathbb{R}} [f^{\prime\prime}]^2 dx = \boldsymbol{\beta}^{\mathsf{T}}\mathbf{S}\boldsymbol{\beta} = \large{W}$$`

(Wiggliness is 100% the right mathy word)

We penalize wiggliness to avoid overfitting

---

# Making wiggliness matter

`\(W\)` measures **wiggliness**

(log) likelihood measures closeness to the data

We use a **smoothing parameter** `\(\lambda\)` to define the trade-off, to find the spline coefficients `\(\beta_k\)` that maximize the **penalized** log-likelihood

`$$\mathcal{L}_p = \log(\text{Likelihood}) - \lambda W$$`

---

# HadCRUT4 time series

<!-- -->

---

# Picking the right wiggliness

.pull-left[
Two ways to think about how to optimize `\(\lambda\)`:

* Predictive: Minimize out-of-sample error
* Bayesian: Put priors on our basis coefficients
]

.pull-right[
Many methods: AIC, Mallows' `\(C_p\)`, GCV, ML, REML

* **Practically**, use **REML**, because of numerical stability
* Hence `gam(..., method='REML')`
]

.center[
![Animation of hyperparameter selection]()
]

---

# Maximum allowed wiggliness

We set **basis complexity** or "size" `\(k\)`

This is the _maximum wiggliness_ — it can be thought of as the number of small functions that make up a curve

Once smoothing is applied, curves have fewer **effective degrees of freedom (EDF)**

EDF < `\(k\)`

---

# Maximum allowed wiggliness

`\(k\)` must be *large enough*, the `\(\lambda\)` penalty does the rest

*Large enough* — the space of functions representable by the basis includes the true function or a close approximation to the true function

Bigger `\(k\)` increases computational cost

In **mgcv**, default `\(k\)` values are arbitrary — after choosing the model terms, this is the key user choice

**Must be checked!** — `gam.check()`

---

# GAM summary so far

1. GAMs give us a framework to model flexible nonlinear relationships
2. Use little functions (**basis functions**) to make big functions (**smooths**)
3. Use a **penalty** to trade off wiggliness/generality — see the sketch below
4. Need to make sure your smooths are **wiggly enough**
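The penalty trade-off in point 3 is easy to see by fixing the smoothing parameter at extreme values — a sketch reusing the `gtemp` data from earlier (the `sp` values below are arbitrary extremes, not recommendations):

```r
## REML-selected lambda vs. two fixed, extreme smoothing parameters
m_reml  <- gam(Temperature ~ s(Year, k = 20), data = gtemp, method = 'REML')
m_rough <- gam(Temperature ~ s(Year, k = 20), data = gtemp, sp = 1e-8) # almost no penalty
m_flat  <- gam(Temperature ~ s(Year, k = 20), data = gtemp, sp = 1e8)  # heavy penalty

## the effective degrees of freedom shrink as lambda grows
sapply(list(REML = m_reml, rough = m_rough, flat = m_flat),
       function(m) sum(m$edf))
```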
---
class: middle center inverse subsection

# Example

---

# Portuguese larks

.row[
.col-7[

```r
library('gamair')
data(bird)

bird <- transform(bird,
                  crestlark = factor(crestlark),
                  linnet = factor(linnet),
                  e = x / 1000,
                  n = y / 1000)
head(bird)
```

```
##       QUADRICULA TET crestlark linnet      x       y   e    n
## 13705       NG56   E      <NA>   <NA> 551000 4669000 551 4669
## 13710       NG56   J      <NA>   <NA> 553000 4669000 553 4669
## 13715       NG56   P      <NA>   <NA> 555000 4669000 555 4669
## 13720       NG56   U      <NA>   <NA> 557000 4669000 557 4669
## 13725       NG56   Z      <NA>   <NA> 559000 4669000 559 4669
## 13880       NG66   E      <NA>   <NA> 561000 4669000 561 4669
```
]

.col-5[
<!-- -->
]
]

---

# Portuguese larks — binomial GAM

.row[
.col-6[

```r
crest <- gam(crestlark ~ s(e, n, k = 100),
             data = bird,
             family = binomial,
             method = 'REML')
```

A bivariate smooth of easting & northing, indicated by `s(e, n)` in the formula

Isotropic thin plate spline

`k` sets the size of the basis dimension — an upper limit on the EDF

Smoothness parameters estimated via REML
]

.col-6[
.smaller[

```r
summary(crest)
```

```
## 
## Family: binomial 
## Link function: logit 
## 
## Formula:
## crestlark ~ s(e, n, k = 100)
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.24184    0.07785   -28.8   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##          edf Ref.df Chi.sq p-value    
## s(e,n) 74.04  86.46  857.3  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.234   Deviance explained = 25.9%
## -REML = 2499.8  Scale est. = 1         n = 6457
```
]
]
]

---

# Portuguese larks — binomial GAM

Model checking with binary data is a pain — residuals look weird

Alternatively we can aggregate data at the `QUADRICULA` level & fit a binomial count model

.smaller[

```r
## convert back to numeric
bird <- transform(bird,
                  crestlark = as.numeric(as.character(crestlark)),
                  linnet = as.numeric(as.character(linnet)))
## some variables to help aggregation
bird <- transform(bird,
                  tet.n = rep(1, nrow(bird)),
                  N = rep(1, nrow(bird)),
                  stringsAsFactors = FALSE)
## set to NA if not surveyed
bird$N[is.na(as.vector(bird$crestlark))] <- NA
## aggregate
bird2 <- aggregate(data.matrix(bird), by = list(bird$QUADRICULA),
                   FUN = sum, na.rm = TRUE)
## scale by Quads aggregated
bird2 <- transform(bird2, e = e / tet.n, n = n / tet.n)

## fit binomial GAM
crest2 <- gam(cbind(crestlark, N - crestlark) ~ s(e, n, k = 100),
              data = bird2, family = binomial, method = 'REML')
```
]

---

# Model checking

.pull-left[
.smaller[

```r
crest3 <- gam(cbind(crestlark, N - crestlark) ~
                  s(e, n, k = 100),
              data = bird2, family = quasibinomial,
              method = 'REML')
```
]

Model residuals don't look too bad

Bands of points due to integers

Some overdispersion — φ = 1.73
]

.pull-right[

```r
ggplot(data.frame(Fitted = fitted(crest2),
                  Resid = resid(crest2)),
       aes(Fitted, Resid)) +
    geom_point()
```

<!-- -->
]

---
class: inverse middle center subsection

# A cornucopia of smooths

---

# A cornucopia of smooths

The type of smoother is controlled by the `bs` argument (think *basis*)

The default is a low-rank thin plate spline `bs = 'tp'`

Many others available — see the sketch after this list

.row[
.col-6[

* Cubic splines `bs = 'cr'`
* P splines `bs = 'ps'`
* Cyclic splines `bs = 'cc'` or `bs = 'cp'`
* Adaptive splines `bs = 'ad'`
* Random effect `bs = 're'`
* Factor smooths `bs = 'fs'`
]

.col-6[

* Duchon splines `bs = 'ds'`
* Spline on the sphere `bs = 'sos'`
* MRFs `bs = 'mrf'`
* Soap-film smooth `bs = 'so'`
* Gaussian process `bs = 'gp'`
]
]
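Choosing a basis is just a matter of the `bs` argument — a hypothetical sketch (the data frame `df` and variables `y`, `x`, `doy`, and `site` are placeholder names):

```r
## df, y, x, doy, site are placeholders, not data used elsewhere in these slides
gam(y ~ s(x, bs = 'cr') +      # cubic regression spline
        s(doy, bs = 'cc') +    # cyclic spline, e.g. for day of year
        s(site, bs = 're'),    # simple random intercept (site must be a factor)
    data = df, method = 'REML', family = gaussian)
```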
---

# A bestiary of conditional distributions

A GAM is just a fancy GLM

Simon Wood & colleagues (2016) have extended the *mgcv* methods to some non-exponential family distributions

.row[
.col-6[

* `binomial()`
* `poisson()`
* `Gamma()`
* `inverse.gaussian()`
* `nb()`
* `tw()`
* `mvn()`
* `multinom()`
]

.col-6[

* `betar()`
* `scat()`
* `gaulss()`
* `ziplss()`
* `twlss()`
* `cox.ph()`
* `gammals()`
* `ocat()`
]
]

---

# Smooth interactions

Two ways to fit smooth interactions

1. Bivariate (or higher order) thin plate splines
    * `s(x, z, bs = 'tp')`
    * Isotropic; single smoothness parameter for the smooth
    * Sensitive to scales of `x` and `z`
2. Tensor product smooths
    * Separate marginal basis for each smooth, separate smoothness parameters
    * Invariant to scales of `x` and `z`
    * Use for interactions when variables are in different units
    * `te(x, z)`

---

# Tensor product smooths

There are multiple ways to build tensor products in *mgcv*

1. `te(x, z)`
2. `t2(x, z)`
3. `s(x) + s(z) + ti(x, z)`

`te()` is the most general form but not usable in `gamm4::gamm4()` or *brms*

`t2()` is an alternative implementation that does work in `gamm4::gamm4()` or *brms*

`ti()` fits pure smooth interactions, where the main effects of `x` and `z` have been removed from the basis

---

# Factor smooth interactions

Two ways to fit factor smooth interactions

1. `by` variable smooths
    * entirely separate smooth function for each level of the factor
    * each has its own smoothness parameter
    * centred (no group means) so include factor as a fixed effect
    * `y ~ f + s(x, by = f)`
2. `bs = 'fs'` basis
    * smooth function for each level of the factor
    * share a common smoothness parameter
    * fully penalized; include group means
    * closer to random effects
    * `y ~ s(x, f, bs = 'fs')`

---

# Random effects

When fitted with REML or ML, smooths can be viewed as just fancy random effects

The inverse is true too; random effects can be viewed as smooths

If you have simple random effects you can fit those in `gam()` and `bam()` without needing the more complex GAMM functions `gamm()` or `gamm4::gamm4()`

These two models are equivalent

```r
m_nlme <- lme(travel ~ 1, data = Rail, ~ 1 | Rail, method = "REML")

m_gam  <- gam(travel ~ s(Rail, bs = "re"), data = Rail, method = "REML")
```

---

# Random effects

The random effect basis `bs = 're'` is not as computationally efficient as *nlme* or *lme4* for fitting

* complex random effects terms, or
* random effects with many levels

Instead see `gamm()` and `gamm4::gamm4()`

* `gamm()` fits using `lme()`
* `gamm4::gamm4()` fits using `lmer()` or `glmer()`

For non-Gaussian models use `gamm4::gamm4()`

---
class: inverse center middle subsection

# Model checking

---

# Model checking

So you have a GAM:

- How do you know you have the right degrees of freedom? `gam.check()`
- Diagnosing model issues: `gam.check()` part 2

---

# GAMs are models too

How accurate your predictions will be depends on how good the model is

<!-- -->

---
class: inverse center middle subsection

# How do we test how well our model fits?

---

# Simulated data

```r
set.seed(2)
n <- 400
x1 <- rnorm(n)
x2 <- rnorm(n)

y_val <- 1 + 2 * cos(pi * x1) + 2 / (1 + exp(-5 * (x2)))

y_norm <- y_val + rnorm(n, 0, 0.5)
y_negbinom <- rnbinom(n, mu = exp(y_val), size = 10)
y_binom <- rbinom(n, 1, prob = exp(y_val) / (1 + exp(y_val)))
```

---

# Simulated data

<!-- -->

---
class: inverse middle center subsection

# gam.check() part 1: do you have the right functional form?

---

# How well does the model fit?

- Many choices: k, family, type of smoother, …
- How do we assess how well our model fits? — see the sketch below
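The next slides work through this with `gam.check()` — a minimal sketch of the workflow, using the simulated Gaussian response from above:

```r
## a first model for the Gaussian simulated response
m <- gam(y_norm ~ s(x1) + s(x2), method = 'REML')

gam.check(m)   # basis-size (k) checks plus four residual plots
AIC(m)         # for comparing alternative model specifications
```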
--- # Basis size *k* - Set `k` per term - e.g. `s(x, k=10)` or `s(x, y, k=100)` - Penalty removes "extra" wigglyness - *up to a point!* - (But computation is slower with bigger `k`) --- # Checking basis size ```r norm_model_1 <- gam(y_norm~s(x1, k = 4) + s(x2, k = 4), method = 'REML') gam.check(norm_model_1) ``` ``` ## ## Method: REML Optimizer: outer newton ## full convergence after 8 iterations. ## Gradient range [-0.0003467788,0.0005154578] ## (score 736.9402 & scale 2.252304). ## Hessian positive definite, eigenvalue range [0.000346021,198.5041]. ## Model rank = 7 / 7 ## ## Basis dimension (k) checking results. Low p-value (k-index<1) may ## indicate that k is too low, especially if edf is close to k'. ## ## k' edf k-index p-value ## s(x1) 3.00 1.00 0.13 <2e-16 *** ## s(x2) 3.00 2.91 1.04 0.83 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Checking basis size ```r norm_model_2 <- gam(y_norm ~ s(x1, k = 12) + s(x2, k = 4), method = 'REML') gam.check(norm_model_2) ``` ``` ## ## Method: REML Optimizer: outer newton ## full convergence after 11 iterations. ## Gradient range [-5.658609e-06,5.392657e-06] ## (score 345.3111 & scale 0.2706205). ## Hessian positive definite, eigenvalue range [0.967727,198.6299]. ## Model rank = 15 / 15 ## ## Basis dimension (k) checking results. Low p-value (k-index<1) may ## indicate that k is too low, especially if edf is close to k'. ## ## k' edf k-index p-value ## s(x1) 11.00 10.84 0.99 0.38 ## s(x2) 3.00 2.98 0.86 0.01 ** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` --- # Checking basis size ```r norm_model_3 <- gam(y_norm ~ s(x1, k = 12) + s(x2, k = 12),method = 'REML') gam.check(norm_model_3) ``` ``` ## ## Method: REML Optimizer: outer newton ## full convergence after 8 iterations. ## Gradient range [-1.136192e-08,6.794565e-13] ## (score 334.2084 & scale 0.2485446). ## Hessian positive definite, eigenvalue range [2.812271,198.6868]. ## Model rank = 23 / 23 ## ## Basis dimension (k) checking results. Low p-value (k-index<1) may ## indicate that k is too low, especially if edf is close to k'. ## ## k' edf k-index p-value ## s(x1) 11.00 10.85 0.98 0.31 ## s(x2) 11.00 7.95 0.95 0.15 ``` --- # Checking basis size <!-- --> --- class: inverse middle center subsection # Model diagnostics --- class: inverse middle center subsection # Using gam.check() part 2: visual checks --- # gam.check() plots `gam.check()` creates 4 plots: 1. Quantile-quantile plots of residuals. If the model is right, should follow 1-1 line 2. Histogram of residuals 3. Residuals vs. linear predictor 4. Observed vs. fitted values `gam.check()` uses deviance residuals by default --- # Gaussian data, Gaussian model ```r norm_model <- gam(y_norm ~ s(x1, k=12) + s(x2, k=12), method = 'REML') gam.check(norm_model, rep = 500) ``` <!-- --> --- # Negative binomial data, Poisson model ```r pois_model <- gam(y_negbinom ~ s(x1, k=12) + s(x2, k=12), family=poisson, method= 'REML') gam.check(pois_model, rep = 500) ``` <!-- --> --- # NB data, NB model ```r negbin_model <- gam(y_negbinom ~ s(x1, k=12) + s(x2, k=12), family = nb, method = 'REML') gam.check(negbin_model, rep = 500) ``` <!-- --> --- # NB data, NB model ```r appraise(negbin_model, method = 'simulate') ``` <!-- --> --- class: inverse center middle subsection # *p* values for smooths --- # *p* values for smooths *p* values for smooths are approximate: 1. they don't account for the estimation of `\(\lambda_j\)` — treated as known, hence *p* values are biased low 2. 
rely on asymptotic behaviour — they tend towards being right as sample size tends to `\(\infty\)` --- # *p* values for smooths ...are a test of **zero-effect** of a smooth term Default *p* values rely on theory of Nychka (1988) and Marra & Wood (2012) for confidence interval coverage If the Bayesian CI have good across-the-function properties, Wood (2013a) showed that the *p* values have - almost the correct null distribution - reasonable power Test statistic is a form of `\(\chi^2\)` statistic, but with complicated degrees of freedom --- # *p* values for unpenalized smooths The results of Nychka (1988) and Marra & Wood (2012) break down if smooth terms are unpenalized This include i.i.d. Gaussian random effects, (e.g. `bs = "re"`) Wood (2013b) proposed instead a test based on a likelihood ratio statistic: - the reference distribution used is appropriate for testing a `\(\mathrm{H}_0\)` on the boundary of the allowed parameter space... - ...in other words, it corrects for a `\(\mathrm{H}_0\)` that a variance term is zero --- # *p* values for smooths Have the best behaviour when smoothness selection is done using **ML**, then **REML**. Neither of these are the default, so remember to use `method = "ML"` or `method = "REML"` as appropriate --- class: inverse middle center subsection # Example --- # Galveston Bay .row[ .col-6[ Cross Validated question > I have a dataset of water temperature measurements taken from a large waterbody at irregular intervals over a period of decades. (Galveston Bay, TX if you’re interested) <https://stats.stackexchange.com/q/244042/1390> ] .col-6[ .center[ <img src="resources/cross-validated.png" width="1223" /> ] ] ] --- # Galveston Bay ```r galveston <- read_csv(here('data', 'gbtemp.csv')) %>% mutate(datetime = as.POSIXct(paste(DATE, TIME), format = '%m/%d/%y %H:%M', tz = "CDT"), STATION_ID = factor(STATION_ID), DoY = as.numeric(format(datetime, format = '%j')), ToD = as.numeric(format(datetime, format = '%H')) + (as.numeric(format(datetime, format = '%M')) / 60)) galveston ``` ``` ## # A tibble: 15,276 × 13 ## STATION_ID DATE TIME LATITUDE LONGITUDE YEAR MONTH DAY SEASON ## <fct> <chr> <time> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> ## 1 13296 6/20/91 11:04 29.5 -94.8 1991 6 20 Summer ## 2 13296 3/17/92 09:30 29.5 -94.8 1992 3 17 Spring ## 3 13296 9/23/91 11:24 29.5 -94.8 1991 9 23 Fall ## 4 13296 9/23/91 11:24 29.5 -94.8 1991 9 23 Fall ## 5 13296 6/20/91 11:04 29.5 -94.8 1991 6 20 Summer ## 6 13296 12/17/91 10:15 29.5 -94.8 1991 12 17 Winter ## 7 13296 6/29/92 11:17 29.5 -94.8 1992 6 29 Summer ## 8 13305 3/24/87 11:53 29.6 -95.0 1987 3 24 Spring ## 9 13305 4/2/87 13:39 29.6 -95.0 1987 4 2 Spring ## 10 13305 8/11/87 15:25 29.6 -95.0 1987 8 11 Summer ## # … with 15,266 more rows, and 4 more variables: MEASUREMENT <dbl>, ## # datetime <dttm>, DoY <dbl>, ToD <dbl> ``` --- # Galveston Bay model description $$ `\begin{align} \begin{split} \mathrm{E}(y_i) & = \alpha + f_1(\text{ToD}_i) + f_2(\text{DoY}_i) + f_3(\text{Year}_i) + f_4(\text{x}_i, \text{y}_i) + \\ & \quad f_5(\text{DoY}_i, \text{Year}_i) + f_6(\text{x}_i, \text{y}_i, \text{ToD}_i) + \\ & \quad f_7(\text{x}_i, \text{y}_i, \text{DoY}_i) + f_8(\text{x}_i, \text{y}_i, \text{Year}_i) \end{split} \end{align}` $$ * `\(\alpha\)` is the model intercept, * `\(f_1(\text{ToD}_i)\)` is a smooth function of time of day, * `\(f_2(\text{DoY}_i)\)` is a smooth function of day of year , * `\(f_3(\text{Year}_i)\)` is a smooth function of year, * `\(f_4(\text{x}_i, \text{y}_i)\)` is a 2D smooth of longitude and latitude, --- # 
Galveston Bay model description

$$
`\begin{align}
\begin{split}
\mathrm{E}(y_i) & = \alpha + f_1(\text{ToD}_i) + f_2(\text{DoY}_i) + f_3(\text{Year}_i) + f_4(\text{x}_i, \text{y}_i) + \\
    & \quad f_5(\text{DoY}_i, \text{Year}_i) + f_6(\text{x}_i, \text{y}_i, \text{ToD}_i) + \\
    & \quad f_7(\text{x}_i, \text{y}_i, \text{DoY}_i) + f_8(\text{x}_i, \text{y}_i, \text{Year}_i)
\end{split}
\end{align}`
$$

* `\(f_5(\text{DoY}_i, \text{Year}_i)\)` is a tensor product smooth of day of year and year,
* `\(f_6(\text{x}_i, \text{y}_i, \text{ToD}_i)\)` is a tensor product smooth of location & time of day,
* `\(f_7(\text{x}_i, \text{y}_i, \text{DoY}_i)\)` is a tensor product smooth of location & day of year, and
* `\(f_8(\text{x}_i, \text{y}_i, \text{Year}_i)\)` is a tensor product smooth of location & year

---

# Galveston Bay model description

$$
`\begin{align}
\begin{split}
\mathrm{E}(y_i) & = \alpha + f_1(\text{ToD}_i) + f_2(\text{DoY}_i) + f_3(\text{Year}_i) + f_4(\text{x}_i, \text{y}_i) + \\
    & \quad f_5(\text{DoY}_i, \text{Year}_i) + f_6(\text{x}_i, \text{y}_i, \text{ToD}_i) + \\
    & \quad f_7(\text{x}_i, \text{y}_i, \text{DoY}_i) + f_8(\text{x}_i, \text{y}_i, \text{Year}_i)
\end{split}
\end{align}`
$$

Effectively, the first four smooths are the main effects of

1. time of day,
2. season,
3. long-term trend,
4. spatial variation

---

# Galveston Bay model description

$$
`\begin{align}
\begin{split}
\mathrm{E}(y_i) & = \alpha + f_1(\text{ToD}_i) + f_2(\text{DoY}_i) + f_3(\text{Year}_i) + f_4(\text{x}_i, \text{y}_i) + \\
    & \quad f_5(\text{DoY}_i, \text{Year}_i) + f_6(\text{x}_i, \text{y}_i, \text{ToD}_i) + \\
    & \quad f_7(\text{x}_i, \text{y}_i, \text{DoY}_i) + f_8(\text{x}_i, \text{y}_i, \text{Year}_i)
\end{split}
\end{align}`
$$

whilst the remaining tensor product smooths represent smooth interactions between the stated covariates, which model

5. how the seasonal pattern of temperature varies over time,
6. how the time of day effect varies spatially,
7. how the seasonal effect varies spatially, and
8. how the long-term trend varies spatially

---

# Galveston Bay — full model

```r
knots <- list(DoY = c(0.5, 366.5))
m <- bam(MEASUREMENT ~
             s(ToD, k = 10) +
             s(DoY, k = 12, bs = 'cc') +
             s(YEAR, k = 30) +
             s(LONGITUDE, LATITUDE, k = 100, bs = 'ds', m = c(1, 0.5)) +
             ti(DoY, YEAR, bs = c('cc', 'tp'), k = c(12, 15)) +
             ti(LONGITUDE, LATITUDE, ToD, d = c(2, 1), bs = c('ds', 'tp'),
                m = list(c(1, 0.5), NA), k = c(20, 10)) +
             ti(LONGITUDE, LATITUDE, DoY, d = c(2, 1), bs = c('ds', 'cc'),
                m = list(c(1, 0.5), NA), k = c(25, 12)) +
             ti(LONGITUDE, LATITUDE, YEAR, d = c(2, 1), bs = c('ds', 'tp'),
                m = list(c(1, 0.5), NA), k = c(25, 15)),
         data = galveston, method = 'fREML', knots = knots,
         nthreads = c(4, 1), discrete = TRUE)
```

---

# Galveston Bay — simpler model

```r
m.sub <- bam(MEASUREMENT ~
                 s(ToD, k = 10) +
                 s(DoY, k = 12, bs = 'cc') +
                 s(YEAR, k = 30) +
                 s(LONGITUDE, LATITUDE, k = 100, bs = 'ds', m = c(1, 0.5)) +
                 ti(DoY, YEAR, bs = c('cc', 'tp'), k = c(12, 15)),
             data = galveston, method = 'fREML', knots = knots,
             nthreads = c(4, 1), discrete = TRUE)
```

---

# Galveston Bay — simpler model?

```r
AIC(m, m.sub)
```

```
##             df      AIC
## m     442.5221 58590.61
## m.sub 238.2766 59193.50
```

---

# Galveston Bay — simpler model?
.smaller[ ```r anova(m, m.sub, test = 'F') ``` ``` ## Analysis of Deviance Table ## ## Model 1: MEASUREMENT ~ s(ToD, k = 10) + s(DoY, k = 12, bs = "cc") + s(YEAR, ## k = 30) + s(LONGITUDE, LATITUDE, k = 100, bs = "ds", m = c(1, ## 0.5)) + ti(DoY, YEAR, bs = c("cc", "tp"), k = c(12, 15)) + ## ti(LONGITUDE, LATITUDE, ToD, d = c(2, 1), bs = c("ds", "tp"), ## m = list(c(1, 0.5), NA), k = c(20, 10)) + ti(LONGITUDE, ## LATITUDE, DoY, d = c(2, 1), bs = c("ds", "cc"), m = list(c(1, ## 0.5), NA), k = c(25, 12)) + ti(LONGITUDE, LATITUDE, YEAR, ## d = c(2, 1), bs = c("ds", "tp"), m = list(c(1, 0.5), NA), ## k = c(25, 15)) ## Model 2: MEASUREMENT ~ s(ToD, k = 10) + s(DoY, k = 12, bs = "cc") + s(YEAR, ## k = 30) + s(LONGITUDE, LATITUDE, k = 100, bs = "ds", m = c(1, ## 0.5)) + ti(DoY, YEAR, bs = c("cc", "tp"), k = c(12, 15)) ## Resid. Df Resid. Dev Df Deviance F Pr(>F) ## 1 14744 39093 ## 2 15022 41769 -278.27 -2675.8 3.6508 < 2.2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ``` ] --- # Galveston Bay — full model summary .small[ ```r summary(m) ``` ``` ## ## Family: gaussian ## Link function: identity ## ## Formula: ## MEASUREMENT ~ s(ToD, k = 10) + s(DoY, k = 12, bs = "cc") + s(YEAR, ## k = 30) + s(LONGITUDE, LATITUDE, k = 100, bs = "ds", m = c(1, ## 0.5)) + ti(DoY, YEAR, bs = c("cc", "tp"), k = c(12, 15)) + ## ti(LONGITUDE, LATITUDE, ToD, d = c(2, 1), bs = c("ds", "tp"), ## m = list(c(1, 0.5), NA), k = c(20, 10)) + ti(LONGITUDE, ## LATITUDE, DoY, d = c(2, 1), bs = c("ds", "cc"), m = list(c(1, ## 0.5), NA), k = c(25, 12)) + ti(LONGITUDE, LATITUDE, YEAR, ## d = c(2, 1), bs = c("ds", "tp"), m = list(c(1, 0.5), NA), ## k = c(25, 15)) ## ## Parametric coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 21.83529 0.08755 249.4 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Approximate significance of smooth terms: ## edf Ref.df F p-value ## s(ToD) 3.357 4.05 5.626 0.000147 *** ## s(DoY) 9.551 10.00 3266.413 < 2e-16 *** ## s(YEAR) 27.987 28.74 54.565 < 2e-16 *** ## s(LONGITUDE,LATITUDE) 54.264 99.00 5.177 < 2e-16 *** ## ti(DoY,YEAR) 131.329 140.00 34.598 < 2e-16 *** ## ti(ToD,LONGITUDE,LATITUDE) 42.235 171.00 0.872 < 2e-16 *** ## ti(DoY,LONGITUDE,LATITUDE) 80.831 240.00 1.189 < 2e-16 *** ## ti(YEAR,LONGITUDE,LATITUDE) 83.543 329.00 1.080 < 2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## R-sq.(adj) = 0.94 Deviance explained = 94.2% ## fREML = 29810 Scale est. 
= 2.6339 n = 15276 ``` ] --- # Galveston Bay — full model plot ```r plot(m, pages = 1, scheme = 2, shade = TRUE) ``` <!-- --> --- # Galveston Bay — full model plot ```r draw(m, scales = 'free') ``` <!-- --> --- # Galveston Bay — predict ```r pdata <- with(galveston, expand.grid(ToD = 12, DoY = 180, YEAR = seq(min(YEAR), max(YEAR), by = 1), LONGITUDE = seq(min(LONGITUDE), max(LONGITUDE), length = 100), LATITUDE = seq(min(LATITUDE), max(LATITUDE), length = 100))) fit <- predict(m, pdata) ind <- exclude.too.far(pdata$LONGITUDE, pdata$LATITUDE, galveston$LONGITUDE, galveston$LATITUDE, dist = 0.1) fit[ind] <- NA pred <- cbind(pdata, Fitted = fit) ``` --- # Galveston Bay — plot ```r plt <- ggplot(pred, aes(x = LONGITUDE, y = LATITUDE)) + geom_raster(aes(fill = Fitted)) + facet_wrap(~ YEAR, ncol = 12) + scale_fill_viridis(name = expression(degree*C), option = 'plasma', na.value = 'transparent') + coord_quickmap() + theme(legend.position = 'right') plt ``` --- # Galveston Bay — plot <!-- --> --- # Galveston Bay — .center[] --- # Galveston Bay — plot trends ```r pdata <- with(galveston, expand.grid(ToD = 12, DoY = c(1, 90, 180, 270), YEAR = seq(min(YEAR), max(YEAR), length = 500), LONGITUDE = -94.8751, LATITUDE = 29.50866)) fit <- data.frame(predict(m, newdata = pdata, se.fit = TRUE)) fit <- transform(fit, upper = fit + (2 * se.fit), lower = fit - (2 * se.fit)) pred <- cbind(pdata, fit) plt2 <- ggplot(pred, aes(x = YEAR, y = fit, group = factor(DoY))) + geom_ribbon(aes(ymin = lower, ymax = upper), fill = 'grey', alpha = 0.5) + geom_line() + facet_wrap(~ DoY, scales = 'free_y') + labs(x = NULL, y = expression(Temperature ~ (degree * C))) plt2 ``` --- # Galveston Bay — plot trends <!-- --> --- # Next steps Read Simon Wood's book! Lots more material on our ESA GAM Workshop site [https://noamross.github.io/mgcv-esa-workshop/]() Noam Ross' free GAM Course <https://noamross.github.io/gams-in-r-course/> A couple of papers: .smaller[ 1. Simpson, G.L., 2018. Modelling Palaeoecological Time Series Using Generalised Additive Models. Frontiers in Ecology and Evolution 6, 149. https://doi.org/10.3389/fevo.2018.00149 2. Pedersen, E.J., Miller, D.L., Simpson, G.L., Ross, N., 2019. Hierarchical generalized additive models in ecology: an introduction with mgcv. PeerJ 7, e6876. https://doi.org/10.7717/peerj.6876 ] Also see my blog: [www.fromthebottomoftheheap.net](http://www.fromthebottomoftheheap.net) --- # Acknowledgments .row[ .col-6[ ### Funding .center[] ] .col-6[ ### Fellow GAM colleagues David Miller Eric Pedersen Noam Ross ] ] ### Slides * copy; Simpson (2020–2022) [](http://creativecommons.org/licenses/by/4.0/) --- # References - [Marra & Wood (2011) *Computational Statistics and Data Analysis* **55** 2372–2387.](http://doi.org/10.1016/j.csda.2011.02.004) - [Marra & Wood (2012) *Scandinavian Journal of Statistics, Theory and Applications* **39**(1), 53–74.](http://doi.org/10.1111/j.1467-9469.2011.00760.x.) - [Nychka (1988) *Journal of the American Statistical Association* **83**(404) 1134–1143.](http://doi.org/10.1080/01621459.1988.10478711) - Wood (2017) *Generalized Additive Models: An Introduction with R*. Chapman and Hall/CRC. (2nd Edition) - [Wood (2013a) *Biometrika* **100**(1) 221–228.](http://doi.org/10.1093/biomet/ass048) - [Wood (2013b) *Biometrika* **100**(4) 1005–1010.](http://doi.org/10.1093/biomet/ast038) - [Wood et al (2016) *JASA* **111** 1548–1563](https://doi.org/10.1080/01621459.2016.1180986)