REGULARIZING LASSO: A CONSISTENT VARIABLE SELECTION METHOD

Cited by: 9
Authors
Li, Quefeng [1]
Shao, Jun [1, 2]
Affiliations
[1] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
[2] E China Normal Univ, Sch Finance & Stat, Shanghai 200241, Peoples R China
Funding
National Science Foundation (USA)
Keywords
High-dimensional data; LASSO; regularization; selection consistency; sparsity; thresholding; COVARIANCE-MATRIX ESTIMATION; DIMENSIONAL FEATURE SPACE; REGRESSION;
DOI
10.5705/ss.2013.001
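Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract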
LASSO for variable selection in linear regression has been studied by many authors. To achieve asymptotic selection consistency, it is well known that the LASSO method requires a strong irrepresentable condition. Even adding a thresholding step after LASSO is still too conservative, especially when the number of explanatory variables p is much larger than the number of observations n. Another well-known method, the sure independence screening (SIS), applies thresholding to an estimator of marginal covariate effect vector and, therefore, is not selection consistent unless the zero components of the marginal covariate effect vector are asymptotically the same as the zero components of the regression effect vector. Since the weakness of LASSO is caused by the fact that it utilizes the covariate sample covariance matrix that is not well behaved when p is larger than n, we propose a regularized LASSO (RLASSO) method for replacing the covariate sample covariance matrix in LASSO by a regularized estimator of covariate covariance matrix and adding a thresholding step. Using a regularized estimator of covariate covariance matrix, we can consistently estimate the regression effects and, hence, our method also extends and improves the SIS method that estimates marginal covariate effects. We establish selection consistency of RLASSO under conditions that the regression effect vector is sparse and the covariate covariance matrix or its inverse is sparse. Some simulation results for comparing variable selection performances of RLASSO and various other methods are presented. A data example is also provided.
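The idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify which regularized covariance estimator is used, so the sketch assumes entrywise hard thresholding of the sample covariance, writes LASSO in its covariance form (minimize 0.5 b'Σb - c'b + λ‖b‖₁, with c the sample covariance between covariates and response), and solves it by plain coordinate descent. All function names and the tuning parameters `tau`, `lam`, and `cut` are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def threshold_covariance(S, tau):
    """Assumed regularizer: zero out off-diagonal entries with |S_ij| < tau."""
    R = np.where(np.abs(S) >= tau, S, 0.0)
    np.fill_diagonal(R, np.diag(S))  # always keep the variances
    return R

def covariance_form_lasso(Sigma, c, lam, n_iter=200):
    """Coordinate descent for min_b 0.5 b'Sigma b - c'b + lam * ||b||_1.

    Assumes Sigma has a strictly positive diagonal. A thresholded matrix is
    not guaranteed to be positive definite, so convergence is not guaranteed
    in general; this is a sketch only.
    """
    b = np.zeros(len(c))
    for _ in range(n_iter):
        for j in range(len(c)):
            # Partial residual covariance with coordinate j's own term removed.
            r = c[j] - Sigma[j] @ b + Sigma[j, j] * b[j]
            b[j] = soft_threshold(r, lam) / Sigma[j, j]
    return b

def rlasso_sketch(X, y, tau, lam, cut):
    """Regularize the covariance, solve the LASSO, then threshold the estimate."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    S = Xc.T @ Xc / n   # sample covariance of the covariates
    c = Xc.T @ yc / n   # sample covariance between covariates and response
    b = covariance_form_lasso(threshold_covariance(S, tau), c, lam)
    selected = np.flatnonzero(np.abs(b) > cut)  # final thresholding step
    return selected, b
```

Calling `rlasso_sketch(X, y, tau, lam, cut)` returns the indices of the selected covariates together with the penalized estimate. In practice the three tuning parameters would need to be chosen by the user, for example by cross-validation; the abstract does not say how the authors choose them.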
Pages: 975-992
Number of pages: 18