Compute the predictive and empirical cross-validated squared error loss
Source: R/source_subsel.R

Use posterior predictive draws and a sampling-importance resampling (SIR) algorithm to approximate the cross-validated predictive squared error loss. The empirical squared error loss (i.e., the usual quantity in cross-validation) is also returned. Both losses are reported relative to the "best" subset, defined as the one with minimum empirical squared error loss. Specifically, these quantities are computed for a collection of linear models fit to the Bayesian model output, where each linear model features a different subset of predictors.
Usage
pp_loss(
  post_y_pred,
  post_lpd,
  XX,
  yy,
  indicators,
  post_y_hat = NULL,
  K = 10,
  sir_frac = 0.5
)

Arguments
- post_y_pred
  S x n matrix of posterior predictive draws at the given XX covariate values
- post_lpd
  S evaluations of the log-likelihood computed at each posterior draw of the parameters
- XX
  n x p matrix of covariates at which to evaluate
- yy
  n-dimensional vector of response variables
- indicators
  L x p matrix of inclusion indicators (booleans) where each row denotes a candidate subset
- post_y_hat
  S x n matrix of posterior fitted values at the given XX covariate values
- K
  number of cross-validation folds
- sir_frac
  fraction of the posterior samples to use for SIR
Value
a list with two elements: pred_loss and emp_loss
for the predictive and empirical loss, respectively, for each subset.
Details
The quantity post_y_hat is the conditional expectation of the
response at each covariate value (columns) under the parameters sampled
from the posterior (rows). For Bayesian linear regression, this term is
X %*% beta. If unspecified, the algorithm will instead use post_y_pred,
which is still correct but has lower Monte Carlo efficiency.
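For example, in a Bayesian linear regression this matrix can be assembled directly from the coefficient draws. A minimal sketch, assuming post_beta is an S x p matrix holding one posterior draw of beta per row (a hypothetical name; use whatever your sampler returns):

# post_beta: S x p matrix of posterior draws of beta (assumed available)
# XX:        n x p covariate matrix passed to pp_loss()
post_y_hat <- post_beta %*% t(XX)  # S x n matrix of conditional expectations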
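Examples

A minimal end-to-end sketch on simulated data. The "posterior" draws below are stand-ins generated from a normal approximation around the least-squares fit (a placeholder for real MCMC output), and post_lpd is taken to be the per-draw total log-likelihood, which is one reasonable reading of the input described above:

# Simulate a small regression problem
set.seed(123)
n <- 100; p <- 3; S <- 500
XX <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
yy <- as.numeric(XX %*% c(1, 2, 0) + rnorm(n))

# Placeholder "posterior" draws via a normal approximation (requires MASS)
fit <- lm(yy ~ XX - 1)
sigma_hat <- summary(fit)$sigma
post_beta <- MASS::mvrnorm(S, coef(fit), vcov(fit))      # S x p
post_y_hat <- post_beta %*% t(XX)                        # S x n fitted values
post_y_pred <- post_y_hat + sigma_hat * matrix(rnorm(S * n), S, n)
post_lpd <- sapply(seq_len(S), function(s)
  sum(dnorm(yy, post_y_hat[s, ], sigma_hat, log = TRUE)))

# Candidate subsets: each row flags which columns of XX to include
indicators <- rbind(c(TRUE, TRUE, TRUE),
                    c(TRUE, TRUE, FALSE),
                    c(TRUE, FALSE, FALSE))

out <- pp_loss(post_y_pred, post_lpd, XX, yy, indicators,
               post_y_hat = post_y_hat, K = 10, sir_frac = 0.5)
out$pred_loss  # predictive loss for each subset, relative to the best
out$emp_loss   # empirical cross-validated loss for each subset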