Expectations (fitted values) of the response drawn from the posterior distribution of fitted model using a Gaussian approximation to the posterior or a simple Metropolis Hastings sampler.
Usage
fitted_samples(model, ...)
# S3 method for class 'gam'
fitted_samples(
model,
n = 1,
data = newdata,
seed = NULL,
scale = c("response", "linear_predictor"),
method = c("gaussian", "mh", "inla", "user"),
n_cores = 1,
burnin = 1000,
thin = 1,
t_df = 40,
rw_scale = 0.25,
freq = FALSE,
unconditional = FALSE,
draws = NULL,
mvn_method = c("mvnfast", "mgcv"),
...,
newdata = NULL,
ncores = NULL
)
# S3 method for class 'scam'
fitted_samples(
model,
n = 1,
data = NULL,
seed = NULL,
scale = c("response", "linear_predictor"),
method = c("gaussian", "mh", "inla", "user"),
n_cores = 1,
burnin = 1000,
thin = 1,
t_df = 40,
rw_scale = 0.25,
freq = FALSE,
unconditional = FALSE,
draws = NULL,
mvn_method = c("mvnfast", "mgcv"),
...
)Arguments
- model
a fitted model of the supported types
- ...
arguments passed to other methods. For
fitted_samples(), these are passed on tomgcv::predict.gam(). Forposterior_samples()these are passed on tofitted_samples(). Forpredicted_samples()these are passed on to the relevantsimulate()method.- n
numeric; the number of posterior samples to return.
- data
data frame; new observations at which the posterior draws from the model should be evaluated. If not supplied, the data used to fit the model will be used for
data, if available inmodel.- seed
numeric; a random seed for the simulations.
- scale
character; what scale should the fitted values be returned on?
"linear predictor"is a synonym for"link"if you prefer that terminology.- method
character; which method should be used to draw samples from the posterior distribution.
"gaussian"uses a Gaussian (Laplace) approximation to the posterior."mh"uses a Metropolis Hastings sampler that alternates t proposals with proposals based on a shrunken version of the posterior covariance matrix."inla"uses a variant of Integrated Nested Laplace Approximation due to Wood (2019), (currently not implemented)."user"allows for user-supplied posterior draws (currently not implemented).- n_cores
number of cores for generating random variables from a multivariate normal distribution. Passed to
mvnfast::rmvn(). Parallelization will take place only if OpenMP is supported (but appears to work on Windows with currentR).- burnin
numeric; number of samples to discard as the burnin draws. Only used with
method = "mh".- thin
numeric; the number of samples to skip when taking
ndraws. Results inthin * ndraws from the posterior being taken. Only used withmethod = "mh".- t_df
numeric; degrees of freedom for t distribution proposals. Only used with
method = "mh".- rw_scale
numeric; Factor by which to scale posterior covariance matrix when generating random walk proposals. Negative or non finite to skip the random walk step. Only used with
method = "mh".- freq
logical;
TRUEto use the frequentist covariance matrix of the parameter estimators,FALSEto use the Bayesian posterior covariance matrix of the parameters.- unconditional
logical; if
TRUE(andfreq == FALSE) then the Bayesian smoothing parameter uncertainty corrected covariance matrix is used, if available.- draws
matrix; user supplied posterior draws to be used when
method = "user".- mvn_method
character; one of
"mvnfast"or"mgcv". The default is usesmvnfast::rmvn(), which can be considerably faster at generate large numbers of MVN random values thanmgcv::rmvn(), but which might not work for some marginal fits, such as those where the covariance matrix is close to singular.- newdata
Deprecated: use
datainstead.- ncores
Deprecated; use
n_coresinstead. The number of cores for generating random variables from a multivariate normal distribution. Passed tomvnfast::rmvn(). Parallelization will take place only if OpenMP is supported (but appears to work on Windows with currentR).
Value
A tibble (data frame) with 3 columns containing the posterior predicted values in long format. The columns are
row(integer) the row ofdatathat each posterior draw relates to,draw(integer) an index, in range1:n, indicating which draw each row relates to,response(numeric) the predicted response for the indicated row ofdata.
Note
Models with offset terms supplied via the offset argument to
mgcv::gam() etc. are ignored by mgcv::predict.gam(). As such, this
kind of offset term is also ignored by posterior_samples(). Offset terms
that are included in the model formula supplied to mgcv::gam() etc are
not ignored and the posterior samples produced will reflect those offset
term values. This has the side effect of requiring any new data values
provided to posterior_samples() via the data argument must include the
offset variable.
References
Wood, S.N., (2020). Simplified integrated nested Laplace approximation. Biometrika 107, 223–230. doi:10.1093/biomet/asz044
Examples
load_mgcv()
dat <- data_sim("eg1", n = 1000, dist = "normal", scale = 2, seed = 2)
m1 <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat, method = "REML")
fs <- fitted_samples(m1, n = 5, seed = 42)
# \donttest{
fs
#> # A tibble: 5,000 x 4
#> .row .draw .parameter .fitted
#> <int> <int> <chr> <dbl>
#> 1 1 1 location 6.34
#> 2 2 1 location 5.08
#> 3 3 1 location 6.84
#> 4 4 1 location 7.71
#> 5 5 1 location 9.23
#> 6 6 1 location 8.03
#> 7 7 1 location 6.19
#> 8 8 1 location 7.28
#> 9 9 1 location 14.0
#> 10 10 1 location 12.7
#> # i 4,990 more rows
# }
# can generate own set of draws and use them
drws <- generate_draws(m1, n = 2, seed = 24)
fs2 <- fitted_samples(m1, method = "user", draws = drws)
# \donttest{
fs2
#> # A tibble: 2,000 x 4
#> .row .draw .parameter .fitted
#> <int> <int> <chr> <dbl>
#> 1 1 1 location 6.30
#> 2 2 1 location 5.12
#> 3 3 1 location 7.40
#> 4 4 1 location 7.42
#> 5 5 1 location 9.40
#> 6 6 1 location 8.04
#> 7 7 1 location 5.83
#> 8 8 1 location 7.30
#> 9 9 1 location 14.3
#> 10 10 1 location 13.1
#> # i 1,990 more rows
# }