Use posterior predictive draws and a sampling-importance resampling (SIR) algorithm to approximate the cross-validated predictive squared error loss. The empirical squared error loss (the usual quantity in cross-validation) is also returned. Both losses are reported relative to the "best" subset, i.e., the subset with minimum empirical squared error loss. These quantities are computed for a collection of linear models fit to the Bayesian model output, where each linear model uses a different subset of predictors.

Usage

pp_loss(
  post_y_pred,
  post_lpd,
  XX,
  yy,
  indicators,
  post_y_hat = NULL,
  K = 10,
  sir_frac = 0.5
)

Arguments

post_y_pred

S x n matrix of posterior predictive draws at the given XX covariate values

post_lpd

S-dimensional vector of log-likelihood values, one evaluated at each posterior draw of the parameters

XX

n x p matrix of covariates at which to evaluate the losses

yy

n-dimensional vector of response variables

indicators

L x p matrix of logical inclusion indicators, where each row denotes a candidate subset of predictors

post_y_hat

optional S x n matrix of posterior fitted values at the given XX covariate values; see Details

K

number of cross-validation folds

sir_frac

fraction of the posterior samples to use for SIR

Value

a list with two elements, pred_loss and emp_loss, containing the predictive and empirical squared error loss, respectively, for each candidate subset

Details

The quantity post_y_hat is the conditional expectation of the response at each covariate value (columns) under each posterior draw of the parameters (rows). For Bayesian linear regression, this term is X %*% beta. If unspecified, the algorithm uses post_y_pred instead, which is still correct but has lower Monte Carlo efficiency.
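
Examples

The following sketch illustrates how the inputs might be assembled for a simple linear regression. The simulated data and the crude "posterior" draws (OLS estimate plus Gaussian noise) are illustrative stand-ins for the output of a real Bayesian sampler, not a recommended fitting procedure.

# Simulate a small regression problem (illustrative only)
set.seed(1)
n <- 100; p <- 3; S <- 500
XX <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
beta_true <- c(1, 2, 0)
yy <- as.numeric(XX %*% beta_true + rnorm(n))

# Stand-in posterior draws of beta: OLS estimate plus Gaussian noise
fit <- lm(yy ~ XX - 1)
sigma_hat <- summary(fit)$sigma
post_beta <- t(replicate(S, coef(fit) + rnorm(p, sd = 0.1)))

# S x n fitted values (X %*% beta per draw) and posterior predictive draws
post_y_hat  <- post_beta %*% t(XX)
post_y_pred <- post_y_hat + matrix(rnorm(S * n, sd = sigma_hat), S, n)

# Log-likelihood evaluated at each posterior draw
post_lpd <- sapply(seq_len(S), function(s)
  sum(dnorm(yy, mean = post_y_hat[s, ], sd = sigma_hat, log = TRUE)))

# Candidate subsets: each row is a length-p logical inclusion vector
indicators <- rbind(c(TRUE, TRUE, FALSE),
                    c(TRUE, TRUE, TRUE))

loss <- pp_loss(post_y_pred, post_lpd, XX, yy, indicators,
                post_y_hat = post_y_hat)
loss$pred_loss
loss$emp_loss

Supplying post_y_hat, when available, reduces Monte Carlo error relative to relying on post_y_pred alone (see Details).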