
Given output from a Bayesian model and a set of candidate subsets, compute the *acceptable family* of subsets that match or nearly match the predictive accuracy of the "best" subset. This function applies to binary data, such as outcomes modeled with logistic regression.

Usage

accept_family_binary(
  post_y_pred,
  post_lpd,
  XX,
  indicators,
  eps_level = 0.05,
  eta_level = 0,
  loss_type = "cross-ent",
  yy = NULL,
  post_y_hat = NULL,
  K = 10,
  sir_frac = 0.5,
  plot = TRUE
)

Arguments

post_y_pred

S x n matrix of posterior predictive draws at the given XX covariate values

post_lpd

S evaluations of the log-likelihood computed at each posterior draw of the parameters
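To make the shapes of post_y_pred and post_lpd concrete, here is a minimal sketch that builds both from posterior draws of logistic-regression coefficients. The draws (post_beta) are simulated rather than taken from a real sampler, and the sketch assumes post_lpd is the full-data Bernoulli log-likelihood summed over the n observations. Check the package source for the exact definition it expects.

```r
set.seed(1)
n <- 50; p <- 3; S <- 200
beta_true <- c(-0.5, 1, 0)

# n x p covariate matrix with an intercept column, plus simulated binary responses
XX <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
yy <- rbinom(n, 1, plogis(XX %*% beta_true))

# Hypothetical S x p posterior draws of the coefficients (simulated, not from MCMC)
post_beta <- matrix(rnorm(S * p, mean = rep(beta_true, each = S), sd = 0.2), S, p)

# S x n fitted success probabilities, one row per posterior draw
post_prob <- plogis(post_beta %*% t(XX))

# S x n posterior predictive draws of the binary response at XX
post_y_pred <- matrix(rbinom(S * n, 1, post_prob), S, n)

# Length-S vector: log-likelihood of the observed yy under each draw
loglik <- matrix(dbinom(rep(yy, each = S), 1, post_prob, log = TRUE), S, n)
post_lpd <- rowSums(loglik)
```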

XX

n x p matrix of covariates at which to evaluate

indicators

L x p matrix of inclusion indicators (booleans) where each row denotes a candidate subset
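As a small illustration, an indicators matrix for L = 3 candidate subsets over p = 3 columns of XX might look like the following. The column names are hypothetical, and keeping the intercept in every subset is an assumption made for this example only.

```r
# Each row is one candidate subset; TRUE means the covariate is included
indicators <- rbind(
  c(TRUE, TRUE,  TRUE),   # full model
  c(TRUE, TRUE,  FALSE),  # drop the last covariate
  c(TRUE, FALSE, FALSE)   # intercept only
)
colnames(indicators) <- c("(Intercept)", "x1", "x2")  # hypothetical names

# Subset sizes (number of included covariates per row)
rowSums(indicators)
```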

eps_level

probability required to match the predictive performance of the "best" model (up to eta_level)

eta_level

allowable margin (%) between each acceptable model and the "best" model
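The following sketch shows one way eps_level and eta_level could combine. It assumes a rule of the form "accept subset ℓ if the posterior probability that its loss is within eta_level percent of the best subset's loss is at least eps_level". The input post_loss (an S x L matrix of simulated posterior predictive losses), the function accept_sketch, and the choice of "best" as the subset with the smallest posterior mean loss are all hypothetical. The package's actual predictive and empirical comparisons are described in pp_loss_binary.

```r
set.seed(2)
S <- 1000
# Hypothetical S x L matrix of posterior predictive losses (one column per subset)
post_loss <- cbind(
  rnorm(S, mean = 0.50, sd = 0.02),  # subset 1
  rnorm(S, mean = 0.51, sd = 0.02),  # subset 2
  rnorm(S, mean = 0.70, sd = 0.02)   # subset 3
)

accept_sketch <- function(post_loss, eps_level = 0.05, eta_level = 0) {
  # Assumption: the "best" subset minimizes the posterior mean loss
  ell_min <- which.min(colMeans(post_loss))
  # Per-draw percent increase in loss relative to the best subset
  pct_diff <- 100 * (post_loss - post_loss[, ell_min]) / post_loss[, ell_min]
  # Posterior probability of matching the best subset up to eta_level percent
  prob_match <- colMeans(pct_diff <= eta_level)
  # Acceptable subsets: those whose probability reaches eps_level
  which(prob_match >= eps_level)
}

accept_sketch(post_loss)
```

Subset 2 is close enough to subset 1 that it matches it in a sizable fraction of draws, so it is accepted. Subset 3 essentially never matches and is excluded.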

loss_type

loss function to be used: "cross-ent" (cross-entropy) or "misclass" (misclassification rate)
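For reference, the two loss types can be written as follows for observed binary outcomes y and predicted probabilities prob. This is a minimal sketch, not the package's internal code. The helper names, the clipping tolerance, and the 0.5 classification threshold are assumptions.

```r
# Cross-entropy: negative Bernoulli log-likelihood, averaged over observations
cross_ent <- function(y, prob, tol = 1e-12) {
  prob <- pmin(pmax(prob, tol), 1 - tol)  # clip to avoid log(0)
  -mean(y * log(prob) + (1 - y) * log(1 - prob))
}

# Misclassification rate after thresholding probabilities at 0.5 (assumed cutoff)
misclass <- function(y, prob) mean(y != (prob > 0.5))

y <- c(1, 0, 1, 1)
prob <- c(0.9, 0.2, 0.4, 0.8)
misclass(y, prob)   # 0.25: only the third observation is misclassified
cross_ent(y, prob)
```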

yy

n-dimensional vector of response variables

post_y_hat

S x n matrix of posterior fitted values at the given XX covariate values (optional)

K

number of cross-validation folds
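As a generic illustration of K-fold cross-validation, observations can be assigned at random to K folds of nearly equal size. How accept_family_binary itself forms its folds is not stated here, so the helper make_folds is hypothetical.

```r
# Randomly assign n observations to K folds whose sizes differ by at most one
make_folds <- function(n, K = 10) sample(rep_len(seq_len(K), n))

set.seed(3)
folds <- make_folds(n = 53, K = 10)
table(folds)  # fold sizes
```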

sir_frac

fraction of the posterior samples to use for SIR
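Sampling-importance resampling (SIR) draws a subset of the posterior samples with probabilities proportional to importance weights. The generic sketch below keeps a sir_frac share of the S draws. The helper sir_resample is hypothetical, and so are the log weights. In importance-sampling cross-validation, a common choice is the negative log-likelihood of the held-out fold at each draw, but this page does not say how the package forms its weights.

```r
sir_resample <- function(log_w, sir_frac = 0.5) {
  S <- length(log_w)
  w <- exp(log_w - max(log_w))  # subtract the max for numerical stability
  w <- w / sum(w)               # normalize to probabilities
  # Resample draw indices with replacement, proportional to the weights
  sample.int(S, size = ceiling(sir_frac * S), replace = TRUE, prob = w)
}

set.seed(4)
log_w <- rnorm(200)  # hypothetical log importance weights for S = 200 draws
idx <- sir_resample(log_w, sir_frac = 0.5)
length(idx)  # 100 resampled draw indices
```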

plot

logical; if TRUE, include a plot to summarize the predictive performance across candidate subsets

Value

a list containing the following elements:

  • all_accept: indices (i.e., rows of indicators) that correspond to the acceptable subsets

  • beta_hat_small: linear coefficients for the smallest acceptable model

  • beta_hat_min: linear coefficients for the "best" acceptable model

  • ell_small: index (i.e., row of indicators) of the smallest acceptable model

  • ell_min: index (i.e., row of indicators) of the "best" acceptable model
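The relationship between all_accept, indicators, and ell_small can be sketched as follows: the smallest acceptable model is the acceptable row with the fewest included covariates. The inputs are hypothetical, and breaking ties by taking the first such row (the behavior of which.min) is an assumption. The package may break ties differently.

```r
# Hypothetical indicators for L = 4 subsets and a hypothetical acceptable set
indicators <- rbind(
  c(TRUE, TRUE,  TRUE,  TRUE),
  c(TRUE, TRUE,  TRUE,  FALSE),
  c(TRUE, TRUE,  FALSE, FALSE),
  c(TRUE, FALSE, FALSE, FALSE)
)
all_accept <- c(1, 2, 3)

# Acceptable row with the fewest included covariates
sizes <- rowSums(indicators[all_accept, , drop = FALSE])
ell_small <- all_accept[which.min(sizes)]
ell_small                        # row 3
which(indicators[ell_small, ])   # columns it includes: 1 and 2
```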

Details

See pp_loss_binary for additional details about the predictive and empirical comparisons.