Sequential Lasso Cum EBIC for Feature Selection With Ultra-High Dimensional Feature Space

Cited by: 48
Authors
Luo, Shan [1 ]
Chen, Zehua [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Math, Shanghai 200030, Peoples R China
[2] Natl Univ Singapore, Dept Stat & Appl Probabil, Singapore 117548, Singapore
Keywords
Extended BIC; Oracle property; Selection consistency; Sparse high-dimensional linear models; NONCONCAVE PENALIZED LIKELIHOOD; ORTHOGONAL MATCHING PURSUIT; VARIABLE SELECTION; MODEL SELECTION; SIGNAL RECOVERY; ORACLE PROPERTIES; ADAPTIVE LASSO; LINEAR-MODELS; REGRESSION; SHRINKAGE;
DOI
10.1080/01621459.2013.877275
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification
020208; 070103; 0714
Abstract
In this article, we propose a method called sequential Lasso (SLasso) for feature selection in sparse high-dimensional linear models. SLasso selects features by sequentially solving partially penalized least squares problems in which the features selected in earlier steps are not penalized. SLasso uses the extended BIC (EBIC) as its stopping rule: the procedure stops when the EBIC reaches a minimum. The asymptotic properties of SLasso are considered when the dimension of the feature space is ultra high and the number of relevant features diverges. We show that, with probability converging to 1, SLasso selects all the relevant features before any irrelevant feature is selected, and that the EBIC decreases until it attains its minimum at the model consisting of exactly the relevant features and then begins to increase. These results establish the selection consistency of SLasso. The SLasso estimators of the final model are ordinary least squares estimators, and the selection consistency implies the oracle property of SLasso. The asymptotic distribution of the SLasso estimators with a diverging number of relevant features is provided. SLasso is compared with other methods in simulation studies, which demonstrate that it has an edge over the competing methods. SLasso, together with the other methods, is applied to microarray data for mapping disease genes. Supplementary materials for this article are available online.
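The procedure described in the abstract can be sketched compactly. At each step of the partially penalized Lasso, the first feature to enter as the penalty decreases is the one most correlated with the residual after projecting out the already-selected features, so a greedy loop with an EBIC stopping check captures the idea. The sketch below is illustrative, not the authors' implementation: the function names `slasso` and `ebic` are made up here, the EBIC used is the Chen and Chen (2008) form EBIC_γ(S) = n·log(RSS_S/n) + |S|·log n + 2γ|S|·log p, and the columns of `X` are assumed to be on a common scale.

```python
import numpy as np

def ebic(y, X, support, gamma=0.5):
    """Extended BIC of a linear model fit by OLS on `support`."""
    n, p = X.shape
    k = len(support)
    if k == 0:
        rss = float(y @ y)
    else:
        Xs = X[:, support]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n) + 2.0 * gamma * k * np.log(p)

def slasso(y, X, gamma=0.5, max_steps=None):
    """Sequential-Lasso sketch: each step adds the feature most correlated
    with the current residual (the first entrant of the partially penalized
    Lasso path when columns share a common norm) and stops when EBIC rises."""
    n, p = X.shape
    if max_steps is None:
        max_steps = min(n - 1, p)
    selected = []
    resid = y.copy()
    best = ebic(y, X, selected, gamma)
    for _ in range(max_steps):
        scores = np.abs(X.T @ resid)
        scores[selected] = -np.inf          # exclude already-selected features
        j = int(np.argmax(scores))
        crit = ebic(y, X, selected + [j], gamma)
        if crit >= best:                    # EBIC attained its minimum: stop
            break
        selected.append(j)
        best = crit
        Xs = X[:, selected]
        beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        resid = y - Xs @ beta               # project out the selected features
    # final estimates are the OLS estimates on the selected model,
    # matching the abstract's statement about the final estimators
    beta_hat, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
    return selected, beta_hat
```

In line with the paper's theory, on a well-separated sparse problem the loop picks up the relevant features first and the EBIC check then halts it before many irrelevant ones can enter.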
Pages: 1229 - 1240
Page count: 12
Related Papers
50 records
  • [1] Tournament screening cum EBIC for feature selection with high-dimensional feature spaces
    Chen, ZeHua
    Chen, JiaHua
    [J]. Science in China Series A: Mathematics, 2009, 52 (06): 1327 - 1341
  • [2] The EBIC and a sequential procedure for feature selection in interactive linear models with high-dimensional data
    He, Yawei
    Chen, Zehua
    [J]. Annals of the Institute of Statistical Mathematics, 2016, 68 (01): 155 - 180
  • [3] High-Dimensional Feature Selection by Feature-Wise Kernelized Lasso
    Yamada, Makoto
    Jitkrittum, Wittawat
    Sigal, Leonid
    Xing, Eric P.
    Sugiyama, Masashi
    [J]. Neural Computation, 2014, 26 (01): 185 - 207
  • [4] Sequential Feature Screening for Generalized Linear Models with Sparse Ultra-High Dimensional Data
    Zhang, Junying
    Wang, Hang
    Zhang, Riquan
    Zhang, Jiajia
    [J]. Journal of Systems Science & Complexity, 2020, 33 (02): 510 - 526
  • [5] BEAR: Sketching BFGS Algorithm for Ultra-High Dimensional Feature Selection in Sublinear Memory
    Aghazadeh, Amirali
    Gupta, Vipul
    DeWeese, Alex
    Koyluoglu, O. Ozan
    Ramchandran, Kannan
    [J]. Mathematical and Scientific Machine Learning, Vol 145, 2021, 145: 75 - 92