Compute the predictive and empirical cross-validated squared error loss
Source: R/source_subsel.R
Use posterior predictive draws and a sampling-importance resampling (SIR) algorithm to approximate the cross-validated predictive squared error loss. The empirical squared error loss (i.e., the usual cross-validation quantity) is also returned. Both losses are reported relative to the "best" subset, i.e., the one with minimum empirical squared error loss. Specifically, these quantities are computed for a collection of linear models fit to the Bayesian model output, where each linear model uses a different subset of predictors.
Usage
pp_loss(
  post_y_pred,
  post_lpd,
  XX,
  yy,
  indicators,
  post_y_hat = NULL,
  K = 10,
  sir_frac = 0.5
)
Arguments
- post_y_pred: S x n matrix of posterior predictive draws at the given XX covariate values
- post_lpd: S evaluations of the log-likelihood, computed at each posterior draw of the parameters
- XX: n x p matrix of covariates at which to evaluate
- yy: n-dimensional vector of response variables
- indicators: L x p matrix of inclusion indicators (booleans), where each row denotes a candidate subset of predictors
- post_y_hat: S x n matrix of posterior fitted values at the given XX covariate values
- K: number of cross-validation folds
- sir_frac: fraction of the posterior samples to use for SIR
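To make the expected dimensions concrete, the following minimal sketch simulates data and assembles inputs of the shapes described above. Everything here is illustrative: the toy Gaussian model, the stand-in posterior draws, and all object names other than the arguments themselves are assumptions, not part of the package.

# Illustrative sketch only; any real posterior sampler could supply these draws
set.seed(1)
S <- 1000; n <- 100; p <- 5            # posterior draws, observations, predictors
XX <- matrix(rnorm(n * p), n, p)
beta_true <- c(2, -1, 0, 0, 0)
yy <- as.numeric(XX %*% beta_true + rnorm(n))

# Stand-in posterior draws of beta (S x p) and sigma (length S)
post_beta  <- matrix(rnorm(S * p, rep(beta_true, each = S), 0.1), S, p)
post_sigma <- sqrt(1 / rgamma(S, shape = n / 2, rate = n / 2))

# S x n posterior predictive draws and S log-likelihood evaluations
post_mu     <- post_beta %*% t(XX)     # S x n matrix of fitted means
post_y_pred <- post_mu + post_sigma * matrix(rnorm(S * n), S, n)
post_lpd    <- sapply(1:S, function(s)
  sum(dnorm(yy, mean = post_mu[s, ], sd = post_sigma[s], log = TRUE)))

# L x p inclusion indicators: each row is a candidate subset of predictors
indicators <- rbind(c(TRUE, TRUE, FALSE, FALSE, FALSE),
                    c(TRUE, TRUE, TRUE,  FALSE, FALSE),
                    rep(TRUE, p))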
Value
A list with two elements, pred_loss and emp_loss, containing the predictive and empirical loss, respectively, for each candidate subset.
Details
The quantity post_y_hat is the conditional expectation of the response at each covariate value (columns) under each set of parameters sampled from the posterior (rows). For Bayesian linear regression, this term is X %*% beta. If unspecified, the algorithm will instead use post_y_pred, which is still correct but has lower Monte Carlo efficiency.
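Continuing the illustrative sketch from the Arguments section, post_y_hat can be computed directly as X %*% beta for each posterior draw (draws in rows), and then passed to pp_loss. The inputs below are hypothetical toy objects, though the call matches the Usage section, with the defaults K = 10 and sir_frac = 0.5 made explicit.

# Row s of post_y_hat is t(XX %*% post_beta[s, ]), i.e., X %*% beta for draw s
post_y_hat <- post_beta %*% t(XX)

out <- pp_loss(post_y_pred, post_lpd, XX, yy, indicators,
               post_y_hat = post_y_hat, K = 10, sir_frac = 0.5)
out$pred_loss   # predictive squared error loss per subset (relative to the best)
out$emp_loss    # empirical squared error loss per subset (relative to the best)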