Use posterior predictive draws and a sampling-importance resampling (SIR) algorithm to approximate the cross-validated predictive squared error loss. The empirical squared error loss (the usual quantity in cross-validation) is also returned. Both losses are reported relative to the "best" subset, i.e., the subset with minimum empirical squared error loss. These quantities are computed for a collection of linear models fit to the Bayesian model output, where each linear model uses a different subset of predictors.

Usage

pp_loss(
  post_y_pred,
  post_lpd,
  XX,
  yy,
  indicators,
  post_y_hat = NULL,
  K = 10,
  sir_frac = 0.5
)

Arguments

post_y_pred

S x n matrix of posterior predictive draws at the given XX covariate values

post_lpd

S-dimensional vector of log-likelihood values, one evaluated at each posterior draw of the parameters

XX

n x p matrix of covariates at which to evaluate the losses

yy

n-dimensional vector of response variables

indicators

L x p matrix of logical inclusion indicators, where each row denotes a candidate subset of predictors

post_y_hat

optional S x n matrix of posterior fitted values at the given XX covariate values; see Details

K

number of cross-validation folds

sir_frac

fraction of the posterior samples to use for SIR

Value

a list with two elements, pred_loss and emp_loss, containing the predictive and empirical squared error loss, respectively, for each candidate subset

Details

The quantity post_y_hat is the conditional expectation of the response at each covariate value (columns) under each posterior draw of the parameters (rows). For Bayesian linear regression, this term is X %*% beta. If unspecified, the algorithm uses post_y_pred instead, which is still correct but has lower Monte Carlo efficiency.
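
Examples

The following sketch illustrates how the inputs might be assembled for a simple linear regression. The simulated data and the crude "posterior" draws (OLS estimate plus Gaussian noise) are illustrative stand-ins for the output of a real Bayesian sampler, not a recommended fitting procedure.

# Simulate a small regression problem (illustrative only)
set.seed(1)
n <- 100; p <- 3; S <- 500
XX <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
beta_true <- c(1, 2, 0)
yy <- as.numeric(XX %*% beta_true + rnorm(n))

# Stand-in posterior draws of beta: OLS estimate plus Gaussian noise
fit <- lm(yy ~ XX - 1)
sigma_hat <- summary(fit)$sigma
post_beta <- t(replicate(S, coef(fit) + rnorm(p, sd = 0.1)))

# S x n fitted values (X %*% beta per draw) and posterior predictive draws
post_y_hat  <- post_beta %*% t(XX)
post_y_pred <- post_y_hat + matrix(rnorm(S * n, sd = sigma_hat), S, n)

# Log-likelihood evaluated at each posterior draw
post_lpd <- sapply(seq_len(S), function(s)
  sum(dnorm(yy, mean = post_y_hat[s, ], sd = sigma_hat, log = TRUE)))

# Candidate subsets: each row is a length-p logical inclusion vector
indicators <- rbind(c(TRUE, TRUE, FALSE),
                    c(TRUE, TRUE, TRUE))

loss <- pp_loss(post_y_pred, post_lpd, XX, yy, indicators,
                post_y_hat = post_y_hat)
loss$pred_loss
loss$emp_loss

Supplying post_y_hat, when available, reduces Monte Carlo error relative to relying on post_y_pred alone (see Details).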