Given output from a Bayesian model and a set of candidate subsets, compute the *acceptable family* of subsets that match or nearly match the predictive accuracy of the "best" subset. This function applies to binary data, as in logistic regression.
Usage
accept_family_binary(
  post_y_pred,
  post_lpd,
  XX,
  indicators,
  eps_level = 0.05,
  eta_level = 0,
  loss_type = "cross-ent",
  yy = NULL,
  post_y_hat = NULL,
  K = 10,
  sir_frac = 0.5,
  plot = TRUE
)
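The inputs in this signature have to agree in their dimensions, which is easiest to see with simulated data. The following is a minimal sketch in base R, not the package's own example: the "posterior" draws are a crude normal approximation around a `glm()` fit, used only as a stand-in for real MCMC output, and the sizes `n`, `p`, `S` and the helper names `beta_true`, `post_beta`, and `Y` are assumptions made for illustration.

```r
set.seed(1)
n <- 50; p <- 4; S <- 200   # sizes chosen only for illustration

# Simulated covariates (first column is an intercept) and binary response
XX <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
beta_true <- c(-0.5, 1, 0, 0)
yy <- rbinom(n, size = 1, prob = plogis(drop(XX %*% beta_true)))

# Stand-in for posterior draws of the coefficients (S x p):
# a normal approximation centered at the maximum likelihood estimate
fit <- glm(yy ~ XX - 1, family = binomial())
post_beta <- matrix(rnorm(S * p), S, p) %*% chol(vcov(fit)) +
  rep(coef(fit), each = S)

# S x n matrix of fitted probabilities, one row per draw
post_y_hat <- plogis(post_beta %*% t(XX))

# S x n matrix of posterior predictive draws (0/1)
post_y_pred <- matrix(rbinom(S * n, size = 1, prob = post_y_hat), S, n)

# Length-S vector: Bernoulli log-likelihood of yy at each draw
Y <- matrix(yy, S, n, byrow = TRUE)
post_lpd <- rowSums(Y * log(post_y_hat) + (1 - Y) * log(1 - post_y_hat))

# L x p logical matrix: each row is one candidate subset (nested here)
indicators <- rbind(
  c(TRUE, FALSE, FALSE, FALSE),
  c(TRUE, TRUE,  FALSE, FALSE),
  c(TRUE, TRUE,  TRUE,  FALSE),
  c(TRUE, TRUE,  TRUE,  TRUE)
)

# With the package loaded, the call would then look roughly like:
# out <- accept_family_binary(post_y_pred, post_lpd, XX, indicators,
#                             yy = yy, post_y_hat = post_y_hat)
```

In practice the draws would come from the fitted Bayesian model itself, and the candidate subsets would usually come from a subset search rather than being written out by hand.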
Arguments
- post_y_pred
S x n matrix of posterior predictive draws at the given XX covariate values
- post_lpd
S evaluations of the log-likelihood computed at each posterior draw of the parameters
- XX
n x p matrix of covariates at which to evaluate
- indicators
L x p matrix of inclusion indicators (booleans) where each row denotes a candidate subset
- eps_level
probability required to match the predictive performance of the "best" model (up to eta_level)
- eta_level
allowable margin (%) between each acceptable model and the "best" model
- loss_type
loss function to be used: "cross-ent" (cross-entropy) or "misclass" (misclassification rate)
- yy
n-dimensional vector of response variables
- post_y_hat
S x n matrix of posterior fitted values at the given XX covariate values (optional)
- K
number of cross-validation folds
- sir_frac
fraction of the posterior samples to use for SIR (sampling importance resampling)
- plot
logical; if TRUE, include a plot to summarize the predictive performance across candidate subsets
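To make the roles of `loss_type`, `eps_level`, and `eta_level` concrete, here is a small base-R sketch. It is an illustration under stated assumptions rather than the package's internal code: the helpers `cross_ent` and `misclass`, the 0.5 classification threshold, the simulated `loss_draws`, and the exact form of the acceptance rule (a subset is acceptable when the estimated probability that its percent increase in loss over the "best" subset is at most `eta_level` reaches `eps_level`) are all assumptions.

```r
# Cross-entropy loss for binary outcomes y and predicted probabilities prob
cross_ent <- function(y, prob, tol = 1e-12) {
  prob <- pmin(pmax(prob, tol), 1 - tol)   # guard against log(0)
  -mean(y * log(prob) + (1 - y) * log(1 - prob))
}

# Misclassification rate, classifying as 1 when prob > 0.5 (assumed threshold)
misclass <- function(y, prob) mean(y != as.numeric(prob > 0.5))

y    <- c(1, 0, 1, 1)
prob <- c(0.9, 0.2, 0.4, 0.7)
cross_ent(y, prob)
misclass(y, prob)   # one of the four observations is misclassified: 0.25

# Hypothetical predictive draws of the loss:
# rows = draws, columns = candidate subsets
set.seed(2)
loss_draws <- cbind(
  rnorm(500, mean = 0.60, sd = 0.02),   # subset 1: clearly worse
  rnorm(500, mean = 0.41, sd = 0.02),   # subset 2: close to the best
  rnorm(500, mean = 0.40, sd = 0.02)    # subset 3: smallest average loss
)
ell_min <- which.min(colMeans(loss_draws))

# Percent increase in loss relative to the best subset, draw by draw
pct_diff <- 100 * (loss_draws - loss_draws[, ell_min]) / loss_draws[, ell_min]

eps_level <- 0.05
eta_level <- 0

# Acceptable: the probability of matching the best subset
# (within eta_level percent) is at least eps_level
is_acceptable <- colMeans(pct_diff <= eta_level) >= eps_level
which(is_acceptable)
```

Under this rule, raising `eps_level` or lowering `eta_level` can only shrink the acceptable family, while the "best" subset is always included.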
Value
a list containing the following elements:
- all_accept: indices (i.e., rows of indicators) that correspond to the acceptable subsets
- beta_hat_small: linear coefficients for the smallest acceptable model
- beta_hat_min: linear coefficients for the "best" acceptable model
- ell_small: index (i.e., row of indicators) of the smallest acceptable model
- ell_min: index (i.e., row of indicators) of the "best" acceptable model
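Because the returned indices refer to rows of `indicators`, they can be mapped back to covariates directly. The sketch below uses a mock list with the documented element names; every value in it is made up, and the assumption that the coefficient vectors have length p with zeros for excluded covariates is not confirmed by the text above.

```r
# Candidate subsets (L = 3 rows, p = 3 covariates), hypothetical
indicators <- rbind(
  c(TRUE, FALSE, FALSE),
  c(TRUE, TRUE,  FALSE),
  c(TRUE, TRUE,  TRUE)
)

# Mock return value: element names follow the documentation, values are invented
out <- list(
  all_accept     = c(2, 3),
  beta_hat_small = c(-0.4, 1.1, 0.0),   # assumed: length p, zeros when excluded
  beta_hat_min   = c(-0.5, 1.0, 0.1),
  ell_small      = 2,
  ell_min        = 3
)

# Covariates (columns of XX) included in the smallest acceptable subset
vars_small <- which(indicators[out$ell_small, ])

# Size of each acceptable subset
sizes <- rowSums(indicators[out$all_accept, , drop = FALSE])

# Share of acceptable subsets that include each covariate
incl_freq <- colMeans(indicators[out$all_accept, , drop = FALSE])

vars_small
sizes
incl_freq
```

The last summary gives a rough sense of which covariates appear in most subsets of the acceptable family and which appear in only a few.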