Generate data from a (sparse) Gaussian linear model. The covariates are correlated Gaussian variables. The user may control the signal-to-noise ratio and the number of nonzero coefficients.
Usage
simulate_lm(n, p, p_sig = min(5, p/2), SNR = 1)
Arguments
- n
number of observations
- p
number of covariates
- p_sig
number of true nonzero coefficients (signals)
- SNR
signal-to-noise ratio
Value
a list with the following elements:
- y: the response variable
- X: the matrix of covariates
- beta_true: the true regression coefficients (including an intercept)
- Ey_true: the true expectation of y (X %*% beta_true)
- sigma_true: the true error standard deviation
Details
The true regression coefficients include an intercept (equal to -1).
Of the p_sig nonzero covariate coefficients, half equal 1 and half
equal -1; the remaining coefficients are zero.
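The data-generating process described above can be illustrated with a minimal sketch. This is not the package's implementation: the function name simulate_lm_sketch, the rho argument, the AR(1)-type correlation between covariates, the definition of SNR as var(Ey_true) / sigma_true^2, the handling of an odd p_sig, and the choice to keep the intercept column out of X are all assumptions made for illustration.

```r
# Hypothetical sketch; NOT the package's implementation of simulate_lm().
simulate_lm_sketch = function(n, p, p_sig = min(5, p/2), SNR = 1, rho = 0.5) {
  p_sig = floor(p_sig)  # the default p/2 may be fractional

  # Assumed AR(1)-type covariance: Cov(X_j, X_k) = rho^|j - k|
  Sigma = rho^abs(outer(1:p, 1:p, "-"))

  # Correlated Gaussian covariates via the Cholesky factor (Sigma = R'R)
  X = matrix(rnorm(n * p), nrow = n, ncol = p) %*% chol(Sigma)

  # Intercept of -1, then p_sig signals split between +1 and -1, then zeros
  # (assumption: an odd p_sig gives the extra signal to the +1 group)
  n_pos = ceiling(p_sig / 2)
  beta_true = c(-1, rep(1, n_pos), rep(-1, p_sig - n_pos), rep(0, p - p_sig))

  # True mean; the column of ones carries the intercept
  Ey_true = as.numeric(cbind(1, X) %*% beta_true)

  # Assumed SNR definition: var(Ey_true) / sigma_true^2
  sigma_true = sd(Ey_true) / sqrt(SNR)
  y = Ey_true + sigma_true * rnorm(n)

  list(y = y, X = X, beta_true = beta_true,
       Ey_true = Ey_true, sigma_true = sigma_true)
}

set.seed(1)
dat = simulate_lm_sketch(n = 100, p = 10)
names(dat)
```

Under this assumed definition, a larger SNR shrinks sigma_true relative to the spread of Ey_true, so the nonzero coefficients become easier to recover from y and X.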
Examples
# Simulate data:
dat = simulate_lm(n = 100, p = 10)
names(dat) # what is returned
#> [1] "y" "X" "beta_true" "Ey_true" "sigma_true"