Prepare a data slice through model covariates
Usage
data_slice(object, ...)
# Default S3 method
data_slice(object, ...)
# S3 method for class 'data.frame'
data_slice(object, ...)
# S3 method for class 'gam'
data_slice(object, ..., data = NULL, envir = NULL)
# S3 method for class 'gamm'
data_slice(object, ...)
# S3 method for class 'list'
data_slice(object, ...)
# S3 method for class 'scam'
data_slice(object, ...)
Arguments
- object
an R model object.
- ...
<dynamic-dots> User-supplied variables defining the data slice. Arguments passed via `...` must be named.
- data
an alternative data frame containing all the variables needed to fit the model. If `NULL` (the default), the data used to fit the model are recovered using `model.frame()`. User-supplied expressions passed in `...` are evaluated in `data`.
- envir
the environment within which to recreate the data used to fit `object`.
Details
A data slice is the data set that results when one or more covariates are varied systematically over some or all of their range, or over a specified subset of values of interest, while any remaining covariates in the model are held at fixed, representative values. This is known as a reference grid in the emmeans package and a data grid in the marginaleffects package.
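To make the definition concrete, the following base R sketch builds a data slice by hand for a hypothetical data frame with two numeric covariates, `x1` and `x2`: `x2` is varied evenly over its observed range while `x1` is held at the observed value closest to its median. The data are simulated purely for illustration; `data_slice()` automates this step for fitted models.

```r
# Simulated covariates (illustrative only)
set.seed(2)
df <- data.frame(x1 = runif(50), x2 = runif(50))

# Vary x2 over 50 evenly spaced values spanning its observed range
x2_grid <- seq(min(df$x2), max(df$x2), length.out = 50)

# Hold x1 at the observed value closest to its median
x1_ref <- df$x1[which.min(abs(df$x1 - median(df$x1)))]

# Every combination of the varied and fixed values forms the slice
ds <- expand.grid(x2 = x2_grid, x1 = x1_ref)
head(ds)
```

Each row of `ds` is one point at which a model could be evaluated; only `x2` changes from row to row.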
For GAMs, any covariates not specified via `...` take representative values determined from the data used to fit the model, as follows:
- for numeric covariates, the value in the fitting data that is closest to the median is used;
- for factor covariates, the modal (most frequently observed) level is used or, if several levels are observed the same number of times, the first of those levels in the order returned by `base::levels()`.
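The two rules can be sketched in base R as below. The helper names `closest_to_median()` and `modal_level()` are hypothetical, not part of gratia or mgcv. How ties between two numeric values equally close to the median are broken is not stated above, so the sketch simply assumes the first such observation wins.

```r
# Hypothetical helper: observed value closest to the median of a numeric vector.
# which.min() returns the first minimum, so ties go to the earlier observation
# (an assumption; gratia's own tie-breaking is not documented here).
closest_to_median <- function(x) {
  x <- x[!is.na(x)]
  x[which.min(abs(x - median(x)))]
}

# Hypothetical helper: most frequently observed level of a factor.
# table() counts in levels() order and which.max() returns the first maximum,
# so ties are resolved in favour of the level that comes first in levels().
modal_level <- function(f) {
  f <- as.factor(f)
  counts <- table(f)
  names(counts)[which.max(counts)]
}

closest_to_median(c(5, 1, 9, 2, 8))  # returns 5
modal_level(factor(c("b", "a", "b", "a", "c"), levels = c("c", "b", "a")))  # returns "b"
```

In the second call, `"b"` and `"a"` are both observed twice; `"b"` is returned because it precedes `"a"` in the factor's levels.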
The representative values are computed when the model is fitted (by `gam()` or `bam()`, for example) and are stored in the `var.summary` component of the fitted model. The function `typical_values()` extracts these values if you want to inspect them.
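A minimal sketch of where these values live in an mgcv fit, assuming simulated data with two covariates `x1` and `x2` (not the `data_sim()` data used in the examples). It only inspects the structure of `var.summary`; see the mgcv `gamObject` help page for the exact layout of each entry.

```r
library(mgcv)

# Simulated data, purely illustrative
set.seed(2)
n <- 100
df <- data.frame(x1 = runif(n), x2 = runif(n))
df$y <- sin(2 * pi * df$x2) + 0.5 * df$x1 + rnorm(n, sd = 0.3)

m <- gam(y ~ s(x2, k = 7) + x1, data = df, method = "REML")

# A named list with one entry per predictor variable, computed at fit time
names(m$var.summary)
str(m$var.summary)
```

With gratia loaded, `typical_values(m)` extracts this information for you.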
The convenience functions `evenly()`, `ref_level()`, and `level()` are provided to help users specify data slices. `ref_level()` and `level()` also ensure that factor covariates have the correct levels, as needed by `mgcv::predict.gam()`, for example.
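Why the levels matter can be seen in base R: a factor built from a single value knows only that value, whereas `mgcv::predict.gam()` needs factor covariates in new data to carry the levels used when fitting. The `evenly_sketch()` helper below is hypothetical and only mimics the basic idea of spreading `n` values evenly over a covariate's range; it is not gratia's implementation of `evenly()`.

```r
# Fitting data with a three-level factor
fac <- factor(c("a", "b", "c", "b"))

# A bare factor built from one value has only that level...
bare <- factor("b")
levels(bare)

# ...so rebuild it with the full set of levels from the fitting data
full <- factor("b", levels = levels(fac))
levels(full)

# Hypothetical stand-in for evenly(): n equally spaced values over the range
evenly_sketch <- function(x, n) {
  seq(min(x, na.rm = TRUE), max(x, na.rm = TRUE), length.out = n)
}
evenly_sketch(c(0, 4, 2, 10), n = 6)  # 0 2 4 6 8 10
```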
For an extended discussion of `data_slice()` and further examples, see `vignette("data-slices", package = "gratia")`.
See also
- The convenience functions `evenly()`, `ref_level()`, and `level()`.
- `typical_values()` for extracting the representative values used for covariates that are in the model but not named in the slice.
Examples
load_mgcv()
# simulate some Gaussian data
df <- data_sim("eg1", n = 50, seed = 2)
# fit a GAM with 1 smooth and 1 linear term
m <- gam(y ~ s(x2, k = 7) + x1, data = df, method = "REML")
# We want to predict over f(x2) while holding `x1` at some value.
# By default, unspecified variables are held at the observation closest
# to the median.
ds <- data_slice(m, x2 = evenly(x2, n = 50))
ds
#> # A tibble: 50 x 2
#> x2 x1
#> <dbl> <dbl>
#> 1 0.0228 0.403
#> 2 0.0424 0.403
#> 3 0.0619 0.403
#> 4 0.0815 0.403
#> 5 0.101 0.403
#> 6 0.121 0.403
#> 7 0.140 0.403
#> 8 0.160 0.403
#> 9 0.179 0.403
#> 10 0.199 0.403
#> # i 40 more rows
# for full control, specify the values you want
ds <- data_slice(m, x2 = evenly(x2, n = 50), x1 = 0.3)
# or provide an expression (function call) which will be evaluated in the
# data frame passed to `data` or `model.frame(object)`
ds <- data_slice(m, x2 = evenly(x2, n = 50), x1 = mean(x1))
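A typical next step is to predict from the model over the slice. The sketch below does this with mgcv alone: it builds by hand a slice analogous to the last call above (`x2` varied over 50 values, `x1` held at its mean) and passes it to `predict()`. Simulated data stand in for `data_sim("eg1")` as an assumption, so the block runs without gratia.

```r
library(mgcv)

# Simulated data standing in for data_sim("eg1") (illustrative only)
set.seed(2)
n <- 50
df <- data.frame(x1 = runif(n), x2 = runif(n))
df$y <- sin(2 * pi * df$x2) + 0.5 * df$x1 + rnorm(n, sd = 0.3)

m <- gam(y ~ s(x2, k = 7) + x1, data = df, method = "REML")

# Hand-built slice: x2 varies evenly, x1 is fixed at its mean
ds <- data.frame(
  x2 = seq(min(df$x2), max(df$x2), length.out = 50),
  x1 = mean(df$x1)
)

# Fitted values and standard errors on the link scale over the slice
fit <- predict(m, newdata = ds, se.fit = TRUE)
ds$fitted <- as.numeric(fit$fit)
ds$se <- as.numeric(fit$se.fit)
head(ds)
```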