Generate data from a (sparse) Gaussian linear model. The covariates are correlated Gaussian variables. The user may control the signal-to-noise ratio (SNR) and the number of nonzero coefficients.

Usage

simulate_lm(n, p, p_sig = min(5, p/2), SNR = 1)

Arguments

n

number of observations

p

number of covariates

p_sig

number of true nonzero coefficients (signals)

SNR

signal-to-noise ratio

Value

a list with the following elements:

  • y: the response variable

  • X: the matrix of covariates

  • beta_true: the true regression coefficients (including an intercept)

  • Ey_true: the true expectation of y (X%*%beta_true)

  • sigma_true: the true error standard deviation

Details

The true regression coefficients comprise an intercept, equal to -1, and the covariate coefficients. Of the covariate coefficients, p_sig are nonzero (half equal to 1 and half equal to -1) and the rest are zero.
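
The description above can be made concrete with a minimal sketch of such a data-generating process. This is not the package source: the function name sketch_simulate_lm, the argument rho, the AR(1)-type correlation structure, the rounding of p_sig, the placement of the intercept first in beta_true, and the definition of SNR as the ratio of signal variance to error variance are all assumptions made for illustration.

```r
# Hypothetical sketch only; simulate_lm() itself may differ in every
# detail marked "assumed" below.
sketch_simulate_lm = function(n, p, p_sig = min(5, p/2), SNR = 1, rho = 0.5) {
  p_sig = floor(p_sig)  # assumed: non-integer defaults (odd p) are rounded down

  # Correlated Gaussian covariates with correlation rho^|j - k| (assumed)
  Sigma = rho^abs(outer(1:p, 1:p, "-"))
  X = matrix(rnorm(n * p), nrow = n, ncol = p) %*% chol(Sigma)

  # Intercept of -1, then p_sig signals split between +1 and -1, then zeros
  # (assumed: signals come first; an odd p_sig gets the extra +1)
  n_pos = ceiling(p_sig / 2)
  beta_true = c(-1, rep(1, n_pos), rep(-1, p_sig - n_pos), rep(0, p - p_sig))

  # True mean; here the intercept column is added explicitly because
  # this sketch stores X without it
  Ey_true = drop(cbind(1, X) %*% beta_true)

  # Error SD chosen so that var(Ey_true) / sigma_true^2 equals SNR (assumed)
  sigma_true = sd(Ey_true) / sqrt(SNR)
  y = Ey_true + sigma_true * rnorm(n)

  list(y = y, X = X, beta_true = beta_true,
       Ey_true = Ey_true, sigma_true = sigma_true)
}

set.seed(1)
sim = sketch_simulate_lm(n = 100, p = 10)
sim$beta_true   # intercept followed by the covariate coefficients
```

Under these assumptions, a larger SNR yields a smaller sigma_true for the same covariates, so the response tracks Ey_true more closely.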

Examples

# Simulate data:
dat = simulate_lm(n = 100, p = 10)
names(dat) # what is returned
#> [1] "y"          "X"          "beta_true"  "Ey_true"    "sigma_true"