Generate data from a (sparse) Gaussian linear model. The covariates are correlated Gaussian variables. The user may control the signal-to-noise ratio (SNR) and the number of nonzero coefficients.
Usage
simulate_lm(n, p, p_sig = min(5, p/2), SNR = 1)
Arguments
- n
number of observations
- p
number of covariates
- p_sig
number of true nonzero coefficients (signals)
- SNR
signal-to-noise ratio
Value
a list with the following elements:
- y: the response variable
- X: the matrix of covariates
- beta_true: the true regression coefficients (including an intercept)
- Ey_true: the true expectation of y (X %*% beta_true)
- sigma_true: the true error standard deviation
Details
The true regression coefficients include an intercept (-1). Of the remaining coefficients, p_sig are nonzero: half equal to 1 and half equal to -1.
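The coefficient structure described above can be illustrated with a minimal base R sketch. This is not the package's source code: the function name sketch_simulate_lm, the argument rho, the AR(1)-type correlation among covariates, the handling of an odd or non-integer p_sig, the placement of the nonzero coefficients first, the inclusion of an intercept column in X, and the definition SNR = var(Ey_true) / sigma_true^2 are all assumptions made for illustration.

```r
# Hypothetical sketch of the data-generating process; not the package's implementation.
sketch_simulate_lm <- function(n, p, p_sig = min(5, p / 2), SNR = 1, rho = 0.5) {
  # Assumption: a non-integer p_sig (e.g. from p / 2) is rounded down
  p_sig <- floor(p_sig)

  # Assumption: covariates have AR(1)-type correlation, cor(X_j, X_k) = rho^|j - k|
  Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
  X <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

  # Assumption: X carries a column of ones so that X %*% beta_true includes the intercept
  X <- cbind(1, X)

  # Intercept of -1, then p_sig signals split between +1 and -1, then zeros
  # Assumption: when p_sig is odd, the extra signal is assigned to +1
  n_pos <- ceiling(p_sig / 2)
  beta_true <- c(-1, rep(1, n_pos), rep(-1, p_sig - n_pos), rep(0, p - p_sig))

  Ey_true <- as.numeric(X %*% beta_true)

  # Assumption: SNR is the ratio of signal variance to noise variance
  sigma_true <- sd(Ey_true) / sqrt(SNR)
  y <- Ey_true + sigma_true * rnorm(n)

  list(y = y, X = X, beta_true = beta_true, Ey_true = Ey_true, sigma_true = sigma_true)
}

set.seed(1)
sim <- sketch_simulate_lm(n = 100, p = 10)
```

Under these assumptions, sim$beta_true has length p + 1, and the ratio var(sim$Ey_true) / sim$sigma_true^2 equals the requested SNR by construction.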
Examples
# Simulate data:
dat = simulate_lm(n = 100, p = 10)
names(dat) # what is returned
#> [1] "y" "X" "beta_true" "Ey_true" "sigma_true"