A robust variable screening procedure for ultra-high dimensional data

Cited by: 6
Authors
Ghosh, Abhik [1]
Thoresen, Magne [2]
Affiliations
[1] Indian Stat Inst, Interdisciplinary Stat Res Unit, Kolkata, India
[2] Univ Oslo, Dept Biostat, Oslo Ctr Biostat & Epidemiol, Oslo, Norway
Keywords
Variable selection; NP dimensionality; independence screening; minimum density power divergence estimator; influence function; gene selection; DENSITY POWER DIVERGENCE; SELECTION; MODELS; LASSO;
DOI
10.1177/09622802211017299
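Chinese Library Classification (CLC) number
R19 [Health organization and services (health services administration)];
Discipline classification code
Abstract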
Variable selection in ultra-high dimensional regression problems has become an important issue. In such situations, penalized regression models may face computational problems, and some pre-screening of the variables may be necessary. A number of procedures for such pre-screening have been developed; among them, Sure Independence Screening (SIS) enjoys some popularity. However, SIS is vulnerable to outliers in the data, and in small samples in particular this may lead to faulty inference. In this paper, we develop a new robust screening procedure. We build on the density power divergence (DPD) estimation approach and introduce DPD-SIS and its extension, iterative DPD-SIS. We illustrate the behavior of the methods through extensive simulation studies and show that they are superior to both the original SIS and other robust methods when there are outliers in the data. Finally, we illustrate their use in a study on regulation of lipid metabolism.
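For orientation, the classical (non-robust) SIS baseline that the abstract refers to can be sketched as follows. This is only an illustrative sketch of marginal-correlation screening, not the DPD-SIS procedure of the paper; the function name `sis_screen`, the default cutoff d = floor(n / log n), and the simulated data are assumptions made for this example.

```python
import numpy as np

def sis_screen(X, y, d=None):
    """Classical SIS sketch: rank predictors by absolute marginal
    correlation with y and keep the top d (hypothetical helper)."""
    n, p = X.shape
    if d is None:
        # A commonly used default cutoff; assumed here for illustration.
        d = int(n / np.log(n))
    # Standardize columns and response so X^T y / n is a correlation.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    omega = np.abs(Xs.T @ ys) / n
    # Indices of the d largest marginal correlations, sorted ascending.
    return np.sort(np.argsort(omega)[::-1][:d])

# Simulated example: p >> n, only the first two predictors are active.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(n)
selected = sis_screen(X, y)
```

Because the ranking relies on ordinary sample correlations, a few contaminated responses can shift the ordering, which is the vulnerability that motivates a robust alternative such as the one described in the abstract.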
Pages: 1816-1832
Number of pages: 17