Ultra-high dimensional variable screening via Gram-Schmidt orthogonalization

Cited by: 2
Authors
Wang, Huiwen [1, 2]
Liu, Ruiping [1 ]
Wang, Shanshan [1, 3]
Wang, Zhichao [1 ]
Saporta, Gilbert [4 ]
Affiliations
[1] Beihang Univ, Sch Econ & Management, Beijing, Peoples R China
[2] Beijing Adv Innovat Ctr Big Data & Brain Comp, Beijing, Peoples R China
[3] Beijing Key Lab Emergence Support Simulat Technol, Beijing, Peoples R China
[4] Cedr Conservatoire Natl Arts & Metiers, Paris, France
Funding
National Science Foundation (USA);
Keywords
Variable selection; High correlation; High dimensionality; Screening procedure; FEATURE-SELECTION; GENE-EXPRESSION; REGRESSION; SHRINKAGE; ALGORITHM;
DOI
10.1007/s00180-020-00963-7
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
Independence screening procedures play a vital role in variable selection when the number of variables is massive. However, high dimensionality may bring many challenges, such as multicollinearity or high (possibly spurious) correlation between the covariates, which makes marginal correlation an unreliable measure of association between the covariates and the response. We propose a novel and simple screening procedure called Gram-Schmidt screening (GSS), which integrates the classical Gram-Schmidt orthogonalization with the sure independence screening technique and accounts for high correlations between the covariates in a data-driven way. GSS can discriminate between relevant and irrelevant variables, achieving a high true positive rate without including many irrelevant and redundant variables, and thus offers a new perspective on screening when the covariates are highly correlated. The practical performance of GSS is demonstrated through comparative simulation studies and the analysis of two real datasets.
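The abstract names the two ingredients of GSS but does not give the algorithm, so the following is only a rough sketch of the general idea of combining marginal-correlation ranking with Gram-Schmidt orthogonalization. It should not be read as the authors' procedure. The function name `gram_schmidt_screen`, the parameter `n_keep`, and the greedy pick-then-orthogonalize loop are all assumptions made for illustration.

```python
import numpy as np

def gram_schmidt_screen(X, y, n_keep):
    """Hypothetical sketch: greedily pick the covariate most correlated
    with the response, then orthogonalize the rest against it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    R = X - X.mean(axis=0)             # centered working copy of the covariates
    yc = y - y.mean()                  # centered response
    p = R.shape[1]
    norms0 = np.linalg.norm(R, axis=0)  # original column norms
    available = np.ones(p, dtype=bool)  # columns not yet selected
    selected = []
    for _ in range(min(n_keep, p)):
        norms = np.linalg.norm(R, axis=0)
        # skip columns already chosen or numerically spanned by chosen ones
        valid = available & (norms > 1e-8 * np.maximum(norms0, 1e-12))
        if not valid.any():
            break
        score = np.full(p, -np.inf)
        # |correlation| up to the constant factor 1 / ||yc||
        score[valid] = np.abs(R[:, valid].T @ yc) / norms[valid]
        j = int(np.argmax(score))
        selected.append(j)
        available[j] = False
        q = R[:, j] / norms[j]         # unit vector for the chosen direction
        R = R - np.outer(q, q @ R)     # Gram-Schmidt step on every column
    return selected
```

In this sketch, a covariate that is nearly a copy of an already selected one has almost nothing left after the orthogonalization step, so it no longer ranks highly. That is one way to picture how a screening rule could avoid redundant variables when covariates are highly correlated.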
Pages: 1153-1170
Number of pages: 18