The sparsity and bias of the lasso selection in high-dimensional linear regression

Cited by: 501
Authors
Zhang, Cun-Hui [1]
Huang, Jian [2]
Affiliations
[1] Rutgers State Univ, Dept Stat, Hill Ctr, Piscataway, NJ 08854 USA
[2] Univ Iowa, Dept Stat & Actuarial Sci, Iowa City, IA 52242 USA
Source
ANNALS OF STATISTICS | 2008 / Vol. 36 / Issue 04
Keywords
penalized regression; high-dimensional data; variable selection; bias; rate consistency; spectral analysis; random matrices
DOI
10.1214/07-AOS520
Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436-1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent, even when the number of variables is of greater order than the sample size. Zhao and Yu [(2006) J. Machine Learning Research 7 2541-2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of error squares for the mean response and the ℓ_α-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.
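
The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' procedure or proof technique: it fits the LASSO by plain cyclic coordinate descent on a synthetic design with more variables than observations, where a few coefficients are large and several others are small but nonzero. The sample size, dimension, coefficient values, noise level, and the penalty level `lam` are all assumptions chosen for illustration, and the helper names `soft_threshold` and `lasso_cd` are hypothetical.

```python
import math
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become zero."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def lasso_cd(X, y, lam, max_sweeps=200, tol=1e-8):
    """Minimise (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) / n for c in cols]
    b = [0.0] * p
    r = list(y)  # running residual y - X b
    for _ in range(max_sweeps):
        biggest_change = 0.0
        for j in range(p):
            c = cols[j]
            # Correlation of column j with the residual that excludes its own fit.
            rho = sum(ci * ri for ci, ri in zip(c, r)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                r = [ri - delta * ci for ri, ci in zip(r, c)]
                b[j] = new
                biggest_change = max(biggest_change, abs(delta))
        if biggest_change < tol:
            break
    return b


random.seed(0)
n, p, sigma = 60, 120, 0.5          # assumed sizes: more variables than samples
large = [0, 1, 2]                   # the "ideal model": large coefficients
small = list(range(3, 13))          # small but nonzero coefficients outside it

beta = [0.0] * p
for j in large:
    beta[j] = 5.0
for j in small:
    beta[j] = 0.02

X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(x * bj for x, bj in zip(row, beta)) + random.gauss(0.0, sigma)
     for row in X]

# Assumed penalty level of order sigma * sqrt(log(p) / n).
lam = 2.0 * sigma * math.sqrt(2.0 * math.log(p) / n)

b_hat = lasso_cd(X, y, lam)
selected = [j for j, bj in enumerate(b_hat) if bj != 0.0]

print("number of selected variables:", len(selected))
print("large coefficients recovered:", all(j in selected for j in large))
print("shrinkage on large coefficients:",
      [round(beta[j] - b_hat[j], 3) for j in large])
```

The printed shrinkage on the large coefficients gives a rough view of the threshold bias mentioned in the abstract, while the variables with coefficient 0.02 would typically fall below the penalty level and be left out of the selected model.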
Pages: 1567-1594
Page count: 28