Monte Carlo sampling for Bayesian spline regression with an unknown (nonparametric) transformation. Cubic B-splines are used with a prior that penalizes roughness.
Usage
sbsm(
y,
x = NULL,
x_test = NULL,
psi = NULL,
nbasis = NULL,
fixedX = (length(y) >= 500),
approx_g = FALSE,
nsave = 1000,
ngrid = 100,
verbose = TRUE
)Arguments
- y
n x 1response vector- x
n x 1vector of observation points; if NULL, assume equally-spaced on [0,1]- x_test
n_test x 1vector of testing points; default isx- psi
prior variance (inverse smoothing parameter); if NULL, sample this parameter
- nbasis
number of spline basis functions; if NULL, use the default from
spikeSlabGAM::sm- fixedX
logical; if TRUE, treat the design as fixed (non-random) when sampling the transformation; otherwise treat covariates as random with an unknown distribution
- approx_g
logical; if TRUE, apply large-sample approximation for the transformation
- nsave
number of Monte Carlo simulations
- ngrid
number of grid points for inverse approximations
- verbose
logical; if TRUE, print time remaining
Value
a list with the following elements:
coefficientsthe posterior mean of the regression coefficientsfitted.valuesthe posterior predictive mean at the test pointsx_testpost_theta:nsave x psamples from the posterior distribution of the regression coefficientspost_ypred:nsave x n_testsamples from the posterior predictive distribution atx_testpost_g:nsaveposterior samples of the transformation evaluated at the uniqueyvaluesmodel: the model fit (here,sbsm)
as well as the arguments passed in.
Details
This function provides fully Bayesian inference for a
transformed spline regression model using Monte Carlo (not MCMC) sampling.
The transformation is modeled as unknown (unless approx_g = TRUE,
which then uses a point approximation) and learned jointly with the regression function.
This model applies for real-valued data, positive data, and compactly-supported data
(the support is automatically deduced from the observed y values).
By default, fixedX is set to FALSE for smaller datasets (n < 500)
and TRUE for larger datasets (n >= 500).
Examples
# \donttest{
# Simulate some data:
n = 200 # sample size
x = sort(runif(n)) # observation points
# Transform a noisy, periodic function:
y = g_inv_bc(
sin(2*pi*x) + sin(4*pi*x) + rnorm(n),
lambda = .5) # Signed square-root transformation
# Fit the semiparametric Bayesian spline model:
fit = sbsm(y = y, x = x)
#> [1] "3 sec remaining"
#> [1] "2 sec remaining"
#> [1] "Total time: 3 seconds"
names(fit) # what is returned
#> [1] "coefficients" "fitted.values" "post_theta" "post_ypred"
#> [5] "post_g" "post_psi" "model" "y"
#> [9] "X" "sample_psi" "fixedX" "approx_g"
# Note: this is Monte Carlo sampling...no need for MCMC diagnostics!
# Plot the model predictions (point and interval estimates):
pi_y = t(apply(fit$post_ypred, 2, quantile, c(0.05, .95))) # 90% PI
plot(x, y, type='n', ylim = range(pi_y,y),
xlab = 'x', ylab = 'y', main = paste('Fitted values and prediction intervals'))
polygon(c(x, rev(x)),c(pi_y[,2], rev(pi_y[,1])),col='gray', border=NA)
lines(x, y, type='p') # observed points
lines(x, fitted(fit), lwd = 3) # fitted curve
#> Error in xy.coords(x, y): 'x' and 'y' lengths differ
# }