Compute the predictive and empirical cross-validated loss for binary data.
Source: R/source_subsel.R
pp_loss_binary.Rd
Use posterior predictive draws and a sampling-importance resampling (SIR) algorithm to approximate the cross-validated predictive loss. The empirical loss (i.e., the usual quantity in cross-validation) is also returned. The values are computed relative to the "best" subset according to minimum empirical loss. Specifically, these quantities are computed for a collection of linear models that are fit to the Bayesian model output, where each linear model features a different subset of predictors. The loss function may be chosen as cross-entropy or misclassification rate.
Usage
pp_loss_binary(
  post_y_pred,
  post_lpd,
  XX,
  yy,
  indicators,
  loss_type = "cross-ent",
  post_y_hat = NULL,
  K = 10,
  sir_frac = 0.5
)
Arguments
- post_y_pred: S x n matrix of posterior predictive draws at the given XX covariate values
- post_lpd: S evaluations of the log-likelihood computed at each posterior draw of the parameters
- XX: n x p matrix of covariates at which to evaluate
- yy: n-dimensional vector of response variables
- indicators: L x p matrix of inclusion indicators (Booleans) where each row denotes a candidate subset
- loss_type: loss function to be used: "cross-ent" (cross-entropy) or "misclass" (misclassification rate)
- post_y_hat: S x n matrix of posterior fitted values at the given XX covariate values
- K: number of cross-validation folds
- sir_frac: fraction of the posterior samples to use for SIR
Value
a list with two elements, pred_loss and emp_loss, for the predictive and empirical loss, respectively, for each subset.
Details
The quantity post_y_hat is the conditional expectation of the response for each covariate value (columns) using the parameters sampled from the posterior (rows). For binary data, this is also the estimated probability of "success". If unspecified, the algorithm will instead use post_y_pred, which is still correct but has lower Monte Carlo efficiency.
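Examples

A minimal usage sketch with simulated data. The posterior quantities below are faked for illustration; in practice they would come from a fitted Bayesian binary-regression model, and the data-generating code here is an assumption, not part of the package.

```r
set.seed(123)
n <- 100; p <- 3; S <- 500

# Simulated design matrix (n x p, with intercept) and binary responses:
XX <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
beta <- c(-0.5, 1, 0)
yy <- rbinom(n, 1, plogis(XX %*% beta))

# Fake posterior draws of the coefficients (S x p), then the implied
# S x n fitted success probabilities and posterior predictive draws:
post_beta <- matrix(rep(beta, each = S), S, p) + 0.1 * matrix(rnorm(S * p), S, p)
post_y_hat <- plogis(post_beta %*% t(XX))                  # S x n fitted values
post_y_pred <- matrix(rbinom(S * n, 1, post_y_hat), S, n)  # S x n predictive draws

# Log-likelihood evaluated at each posterior draw (length S):
post_lpd <- rowSums(dbinom(matrix(yy, S, n, byrow = TRUE), 1, post_y_hat, log = TRUE))

# Candidate subsets (L x p), each row a different set of included predictors:
indicators <- rbind(c(TRUE, TRUE, TRUE),
                    c(TRUE, TRUE, FALSE),
                    c(TRUE, FALSE, FALSE))

fit <- pp_loss_binary(post_y_pred, post_lpd, XX, yy, indicators,
                      loss_type = "cross-ent", post_y_hat = post_y_hat,
                      K = 10, sir_frac = 0.5)
str(fit)  # list with pred_loss and emp_loss, one value per candidate subset
```

Supplying post_y_hat, as above, gives better Monte Carlo efficiency than letting the function fall back on post_y_pred.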