Ultra-high dimensional variable screening via Gram–Schmidt orthogonalization

Cited by: 1
Authors
Huiwen Wang
Ruiping Liu
Shanshan Wang
Zhichao Wang
Gilbert Saporta
Affiliations
[1] Beihang University, School of Economics and Management
[2] Beijing Advanced Innovation Center for Big Data and Brain Computing, Cedric
[3] Beijing Key Laboratory of Emergence Support Simulation Technologies for City Operations
[4] Conservatoire National des Arts et Métiers
Source
Computational Statistics | 2020 / Vol. 35
Keywords
Variable selection; High correlation; High dimensionality; Screening procedure
DOI
Not available
Abstract
Independence screening procedures play a vital role in variable selection when the number of variables is massive. However, high dimensionality brings challenges such as multicollinearity or high (possibly spurious) correlation between the covariates, which makes marginal correlation unreliable as a measure of association between the covariates and the response. We propose a novel and simple screening procedure called Gram–Schmidt screening (GSS), which integrates classical Gram–Schmidt orthogonalization with the sure independence screening technique and accounts for high correlations between the covariates in a data-driven way. GSS can discriminate between relevant and irrelevant variables, achieving a high true positive rate without including many irrelevant and redundant variables, and offers a new perspective on screening when the covariates are highly correlated. The practical performance of GSS is demonstrated through comparative simulation studies and the analysis of two real datasets.
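
The abstract gives no algorithmic detail, so the following is only an illustrative sketch of the general idea, not the authors' GSS procedure. It assumes a simple greedy scheme: rank covariates by absolute marginal correlation (as in sure independence screening), pick the top one, then apply a Gram–Schmidt step that removes the picked direction from all remaining covariates before ranking again. The function names `marginal_scores` and `greedy_gs_screen`, the tolerance `tol`, and the simulated data are all hypothetical.

```python
import numpy as np


def marginal_scores(X, y):
    """Absolute marginal correlation of each column of X with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    norms = np.linalg.norm(Xc, axis=0)
    scores = np.zeros(X.shape[1])
    ok = norms > 0  # constant columns keep a score of 0
    scores[ok] = np.abs(Xc[:, ok].T @ yc) / (norms[ok] * np.linalg.norm(yc))
    return scores


def greedy_gs_screen(X, y, k, tol=1e-8):
    """Pick up to k columns; after each pick, orthogonalize the rest against it.

    Assumed illustrative scheme, not the published GSS algorithm.
    """
    Z = (X - X.mean(axis=0)).astype(float)
    yc = y - y.mean()
    norms0 = np.linalg.norm(Z, axis=0)
    selected = []
    for _ in range(k):
        norms = np.linalg.norm(Z, axis=0)
        # Ignore columns that are (numerically) spanned by earlier picks.
        ok = norms > tol * np.maximum(norms0, 1e-300)
        scores = np.zeros(Z.shape[1])
        scores[ok] = np.abs(Z[:, ok].T @ yc) / norms[ok]
        j = int(np.argmax(scores))
        if scores[j] == 0.0:
            break
        selected.append(j)
        q = Z[:, j] / norms[j]
        Z = Z - np.outer(q, q @ Z)  # Gram-Schmidt step on every column
        Z[:, j] = 0.0               # the picked column is now exhausted
    return selected


# Simulated example: column 1 is a near-copy of column 0 (redundant),
# column 2 is an independent relevant covariate, the rest are noise.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.05 * rng.standard_normal(n)
y = 3.0 * X[:, 0] + 2.0 * X[:, 2] + 0.5 * rng.standard_normal(n)

top_sis = set(np.argsort(marginal_scores(X, y))[-2:].tolist())
top_gs = set(greedy_gs_screen(X, y, k=2))
```

In this simulated setting, purely marginal ranking tends to spend its two slots on the correlated pair (columns 0 and 1), whereas the orthogonalized ranking keeps one of the pair and then recovers column 2. This mirrors the motivation stated in the abstract but says nothing about how the paper sets thresholds or stopping rules.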
Pages: 1153-1170
Number of pages: 17