MODEL-FREE FEATURE SCREENING FOR ULTRAHIGH DIMENSIONAL DATA THROUGH A MODIFIED BLUM-KIEFER-ROSENBLATT CORRELATION

Cited by: 17
Authors
Zhou, Yeqing [1]
Zhu, Liping [2]
Affiliations
[1] Shanghai Univ Finance & Econ, Sch Stat & Management, Shanghai 200433, Peoples R China
[2] Renmin Univ China, Ctr Appl Stat, Inst Stat & Big Data, Beijing 100872, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Blum-Kiefer-Rosenblatt correlation; feature screening; independence test; ranking consistency property; sure screening property; VARIABLE SELECTION; KOLMOGOROV FILTER; ORACLE PROPERTIES; ADDITIVE-MODELS; ADAPTIVE LASSO; REGRESSION;
DOI
10.5705/ss.202016.0264
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes
020208 ; 070103 ; 0714 ;
Abstract
In this paper we introduce a modified Blum-Kiefer-Rosenblatt correlation (MBKR for short) to rank the relative importance of each predictor in ultrahigh-dimensional regressions. We advocate using the MBKR for two reasons. First, it is nonnegative and is zero if and only if two random variables are independent, indicating that the MBKR can detect nonlinear dependence. We illustrate that the sure independence screening procedure based on the MBKR (MBKR-SIS for short) is effective in detecting nonlinear effects, including interactions and heterogeneity, particularly when both continuous and discrete predictors are involved. Second, the MBKR is conceptually simple, easy to implement, and affine-invariant. It is free of tuning parameters and no iteration is required in estimation. It remains unchanged when order-preserving transformations are applied to the response or predictors, indicating that the MBKR-SIS is robust to the presence of extreme values and outliers in the observations. We show that, under mild conditions, the MBKR-SIS procedure has the sure screening and ranking consistency properties, guaranteeing that all important predictors can be retained after screening with probability approaching one. We also propose an iterative screening procedure to detect the important predictors that are marginally independent of the response variable. We demonstrate the merits of the MBKR-SIS procedure through simulations and an application to a dataset.
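The marginal screening idea described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: it uses the classical Blum-Kiefer-Rosenblatt statistic, the average squared gap between the empirical joint distribution function and the product of the empirical marginals, and not the modified correlation proposed in the paper, whose exact definition is not given in this record. The function names `bkr_statistic` and `screen` and the cutoff `d` are hypothetical.

```python
import numpy as np

def bkr_statistic(x, y):
    """Classical BKR-type statistic (an assumption, NOT the paper's MBKR):
    mean over sample points of (F_n(x_i, y_i) - F_n(x_i) * G_n(y_i))**2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # le_x[i, j] is True when x[j] <= x[i]; likewise for y.
    le_x = x[None, :] <= x[:, None]
    le_y = y[None, :] <= y[:, None]
    f_x = le_x.mean(axis=1)             # empirical marginal CDF of x at x_i
    f_y = le_y.mean(axis=1)             # empirical marginal CDF of y at y_i
    f_xy = (le_x & le_y).mean(axis=1)   # empirical joint CDF at (x_i, y_i)
    return float(np.mean((f_xy - f_x * f_y) ** 2))

def screen(X, y, d):
    """Score each column of X against y and keep the d highest-scoring
    columns. The cutoff d is a user choice in this sketch."""
    X = np.asarray(X, dtype=float)
    scores = np.array([bkr_statistic(X[:, k], y) for k in range(X.shape[1])])
    top = np.argsort(-scores)[:d]       # indices sorted by decreasing score
    return top, scores
```

Because the statistic depends on the data only through pairwise order comparisons, it is unchanged by strictly increasing transformations of either variable, which mirrors the invariance property the abstract attributes to the MBKR. The pairwise comparison matrices cost O(n^2) memory per predictor, so this form is suited to moderate sample sizes only.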
Pages: 1351-1370
Number of pages: 20
Related papers
50 in total
  • [21] ON MARGINAL SLICED INVERSE REGRESSION FOR ULTRAHIGH DIMENSIONAL MODEL-FREE FEATURE SELECTION
    Yu, Zhou
    Dong, Yuexiao
    Shao, Jun
    [J]. ANNALS OF STATISTICS, 2016, 44 (06): : 2594 - 2623
  • [22] Model-free feature screening for ultrahigh dimensional data via a Pearson chi-square based index
    Ma, Weidong
    Xiao, Jingsong
    Yang, Ying
    Ye, Fei
    [J]. JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2022, 92 (15) : 3222 - 3248
  • [23] Model-free feature screening for high-dimensional survival data
    Lin, Yuanyuan
    Liu, Xianhui
    Hao, Meiling
    [J]. SCIENCE CHINA-MATHEMATICS, 2018, 61 (09) : 1617 - 1636
  • [24] Model-free feature screening for high-dimensional survival data
    Yuanyuan Lin
    Xianhui Liu
    Meiling Hao
    [J]. Science China Mathematics, 2018, 61 : 1617 - 1636
  • [25] Model-free feature screening for high-dimensional survival data
    Yuanyuan Lin
    Xianhui Liu
    Meiling Hao
    [J]. Science China Mathematics, 2018, 61 (09) : 79 - 98
  • [26] A NEW MODEL-FREE FEATURE SCREENING PROCEDURE FOR ULTRAHIGH-DIMENSIONAL INTERVAL-CENSORED FAILURE TIME DATA
    Zhang, Jing
    Du, Mingyue
    Liu, Yanyan
    Sun, Jianguo
    [J]. STATISTICA SINICA, 2023, 33 (03) : 1809 - 1830
  • [27] Survival Impact Index and Ultrahigh-Dimensional Model-Free Screening with Survival Outcomes
    Li, Jialiang
    Zheng, Qi
    Peng, Limin
    Huang, Zhipeng
    [J]. BIOMETRICS, 2016, 72 (04) : 1145 - 1154
  • [28] Model-free survival conditional feature screening
    Chen, Xiaolin
    Liu, Wei
    Chen, Xiaojing
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2022, 51 (10) : 5690 - 5708
  • [29] Robust model-free feature screening based on modified Hoeffding measure for ultra-high dimensional data
    Yu, Yuan
    He, Di
    Zhou, Yong
    [J]. STATISTICS AND ITS INTERFACE, 2018, 11 (03) : 473 - 489
  • [30] Model free feature screening for ultrahigh dimensional data with responses missing at random
    Lai, Peng
    Liu, Yiming
    Liu, Zhi
    Wan, Yi
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2017, 105 : 201 - 216