Compute the acceptable family of linear subsets for the random intercept model

Given output from a Bayesian random intercept model and a candidate set of subsets, compute the *acceptable family* of subsets that match or nearly match the predictive accuracy of the "best" subset. The acceptable family may be computed for any set of covariate values XX. If XX = X are the in-sample points, then cross-validation is used to assess out-of-sample predictive performance.

Usage

accept_family_randint(
  post_y_pred,
  post_lpd,
  post_sigma_e,
  post_sigma_u,
  XX,
  YY,
  indicators,
  post_y_pred_sum = NULL,
  eps_level = 0.05,
  eta_level = 0,
  K = 10,
  sir_frac = 0.5,
  plot = TRUE
)

Arguments

post_y_pred: S x m x n array of posterior predictive draws at the given XX covariate values, with m replicates per subject
post_lpd: S evaluations of the log-likelihood computed at each posterior draw of the parameters
post_sigma_e: (nsave) draws from the posterior distribution of the observation error SD
post_sigma_u: (nsave) draws from the posterior distribution of the random intercept SD
XX: n x p matrix of covariates at which to evaluate
YY: m x n matrix of response variables (optional)
indicators: L x p matrix of inclusion indicators (booleans) where each row denotes a candidate subset
post_y_pred_sum: (nsave x n) matrix of the posterior predictive draws summed over the replicates within each subject (optional)
eps_level: probability required to match the predictive performance of the "best" model (up to eta_level)
eta_level: allowable margin (%) between each acceptable model and the "best" model
K: number of cross-validation folds (optional)
sir_frac: fraction of the posterior samples to use for SIR (optional)
plot: logical; if TRUE, include a plot to summarize the predictive performance across candidate subsets
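The roles of eps_level and eta_level can be illustrated with a small base-R sketch. It assumes an S x L matrix of posterior predictive loss draws, with one column per candidate subset (row of indicators). It treats a subset as acceptable when the posterior probability that its loss is within eta_level percent of the "best" subset's loss is at least eps_level. The helper name, the loss matrix, and the use of `<=` for the margin comparison are illustrative assumptions. This is not the package's internal code.

```r
# Hypothetical helper, NOT part of the package: sketch of the acceptable family,
# assuming an S x L matrix of posterior predictive loss draws
# (one column per candidate subset, i.e., per row of indicators).
acceptable_family_sketch <- function(post_loss, eps_level = 0.05, eta_level = 0) {
  # "Best" subset: smallest posterior expected loss
  ell_min <- which.min(colMeans(post_loss))
  # Percent increase in loss relative to the best subset, draw by draw;
  # the length-S vector post_loss[, ell_min] is recycled down each column
  post_dev <- 100 * (post_loss - post_loss[, ell_min]) / post_loss[, ell_min]
  # Posterior probability that each subset is within eta_level percent of the best
  # (assumption: "within" means <=, so the best subset always qualifies)
  prob_close <- colMeans(post_dev <= eta_level)
  list(all_accept = which(prob_close >= eps_level), ell_min = ell_min)
}

# Toy example with three hypothetical subsets
set.seed(1)
S <- 1000
post_loss <- cbind(rnorm(S, 10, 0.5), rnorm(S, 10.1, 0.5), rnorm(S, 13, 0.5))
acc <- acceptable_family_sketch(post_loss, eps_level = 0.05, eta_level = 0)
acc$all_accept
```

With eta_level = 0, a subset is acceptable if it beats or ties the best subset in at least a fraction eps_level of the posterior draws. Raising eta_level relaxes the margin and typically enlarges the family.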

Value

a list containing the following elements:

all_accept: indices (i.e., rows of indicators) that correspond to the acceptable subsets
beta_hat_small: linear coefficients for the smallest acceptable model
beta_hat_min: linear coefficients for the "best" acceptable model
ell_small: index (i.e., row of indicators) of the smallest acceptable model
ell_min: index (i.e., row of indicators) of the "best" acceptable model
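The following sketch shows one way ell_small could be recovered from all_accept. It picks the acceptable subset with the fewest included covariates. Ties in size are broken by posterior expected loss, which is an assumption about tie handling and may differ from the package. The helper name and the S x L loss matrix are hypothetical. In the same setting, ell_min would simply be the subset with the smallest posterior expected loss.

```r
# Hypothetical helper, NOT part of the package: choose the smallest acceptable
# subset, breaking ties in size by posterior expected loss (an assumption).
smallest_acceptable_sketch <- function(indicators, all_accept, post_loss) {
  # Number of included covariates in each acceptable subset
  sizes <- rowSums(indicators[all_accept, , drop = FALSE])
  # Posterior expected loss of each acceptable subset (S x L loss draws assumed)
  exp_loss <- colMeans(post_loss[, all_accept, drop = FALSE])
  # Sort by size first, then by expected loss, and keep the first index
  all_accept[order(sizes, exp_loss)[1]]
}

# Three hypothetical subsets of p = 3 covariates
indicators <- rbind(c(TRUE, TRUE, TRUE),
                    c(TRUE, FALSE, FALSE),
                    c(FALSE, TRUE, FALSE))
post_loss <- cbind(c(1.0, 1.2), c(2.0, 2.2), c(1.5, 1.7))
smallest_acceptable_sketch(indicators, all_accept = 1:3, post_loss = post_loss)
```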

Details

When XX = X are the observed covariate values, post_lpd and YY must be provided. These are used to compute the cross-validated predictive and empirical squared errors; the predictive version relies on a sampling importance-resampling (SIR) procedure.
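The sampling importance-resampling step can be sketched generically. The sketch approximates draws from the posterior that excludes one cross-validation fold by reweighting the full-data posterior draws in proportion to 1 / p(held-out data | parameters), then resampling a fraction sir_frac of them. It assumes the log-likelihood is available per subject as an S x n matrix, which may not match the exact format of post_lpd. It also assumes resampling with replacement. The helper names and the random fold assignment are illustrative, not the package's internals.

```r
# Hypothetical helpers, NOT part of the package.
# Assign the n subjects to K cross-validation folds at random.
make_folds_sketch <- function(n, K = 10) sample(rep_len(seq_len(K), n))

# SIR for one fold, assuming an S x n matrix of per-subject log-likelihoods.
sir_fold_sketch <- function(post_loglik, fold_idx, sir_frac = 0.5) {
  S <- nrow(post_loglik)
  # Log importance weights: minus the held-out log-likelihood for each draw
  log_w <- -rowSums(post_loglik[, fold_idx, drop = FALSE])
  # Subtract the maximum before exponentiating for numerical stability
  w <- exp(log_w - max(log_w))
  # Resample a fraction of the draws with probability proportional to w
  sample.int(S, size = ceiling(sir_frac * S), replace = TRUE, prob = w / sum(w))
}

# Toy example with simulated log-likelihood values
set.seed(2)
S <- 200; n <- 30
post_loglik <- matrix(rnorm(S * n, -1, 0.1), S, n)
folds <- make_folds_sketch(n, K = 5)
idx <- sir_fold_sketch(post_loglik, which(folds == 1), sir_frac = 0.5)
```

The resampled indices idx would then select the posterior draws used to predict the held-out fold. Smaller values of sir_frac reduce the cost but increase Monte Carlo error.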

When XX corresponds to a new set of covariate values, set post_lpd = NULL and YY = NULL.

Additional details on the predictive and empirical comparisons are in pp_loss_randint.