MODEL-FREE FEATURE SCREENING FOR ULTRAHIGH DIMENSIONAL DATA THROUGH A MODIFIED BLUM-KIEFER-ROSENBLATT CORRELATION

Cited by: 17
Authors
Zhou, Yeqing [1]
Zhu, Liping [2]
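Affiliations
[1] Shanghai University of Finance and Economics, School of Statistics and Management, Shanghai 200433, People's Republic of China
[2] Renmin University of China, Center for Applied Statistics, Institute of Statistics and Big Data, Beijing 100872, People's Republic of China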
Funding
National Natural Science Foundation of China
Keywords
Blum-Kiefer-Rosenblatt correlation; feature screening; independence test; ranking consistency property; sure screening property; variable selection; Kolmogorov filter; oracle properties; additive models; adaptive lasso; regression
DOI
10.5705/ss.202016.0264
Chinese Library Classification number
O21 [Probability and Mathematical Statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
In this paper we introduce a modified Blum-Kiefer-Rosenblatt correlation (MBKR for short) to rank the relative importance of each predictor in ultrahigh-dimensional regressions. We advocate using the MBKR for two reasons. First, it is nonnegative and is zero if and only if two random variables are independent, indicating that the MBKR can detect nonlinear dependence. We illustrate that the sure independence screening procedure based on the MBKR (MBKR-SIS for short) is effective in detecting nonlinear effects, including interactions and heterogeneity, particularly when both continuous and discrete predictors are involved. Second, the MBKR is conceptually simple, easy to implement, and affine-invariant. It is free of tuning parameters and no iteration is required in estimation. It remains unchanged when order-preserving transformations are applied to the response or predictors, indicating that the MBKR-SIS is robust to the presence of extreme values and outliers in the observations. We show that, under mild conditions, the MBKR-SIS procedure has the sure screening and ranking consistency properties, guaranteeing that all important predictors can be retained after screening with probability approaching one. We also propose an iterative screening procedure to detect the important predictors that are marginally independent of the response variable. We demonstrate the merits of the MBKR-SIS procedure through simulations and an application to a dataset.
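The marginal screening idea in the abstract can be illustrated with a minimal sketch. It assumes a BKR-type index of the form ∫∫ {F(x,y) - F_X(x) F_Y(y)}^2 dF_X(x) dF_Y(y), estimated by plugging in empirical distribution functions; the exact definition and normalization of the MBKR should be taken from the paper itself. The function names `bkr_type_index` and `screen`, and the model size `d`, are hypothetical and chosen only for this illustration.

```python
import numpy as np


def bkr_type_index(x, y):
    """Plug-in estimate of a BKR-type dependence index (assumed form,
    not necessarily the paper's exact MBKR definition)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # ix[k, i] = 1 if x_k <= x_i; likewise iy[k, j] for y.
    ix = (x[:, None] <= x[None, :]).astype(float)
    iy = (y[:, None] <= y[None, :]).astype(float)
    fx = ix.mean(axis=0)          # empirical F_X evaluated at each x_i
    fy = iy.mean(axis=0)          # empirical F_Y evaluated at each y_j
    joint = ix.T @ iy / n         # empirical joint CDF at every pair (x_i, y_j)
    diff = joint - np.outer(fx, fy)
    # Average over all (i, j) pairs, i.e. integrate against dF_X dF_Y.
    return float(np.mean(diff ** 2))


def screen(X, y, d):
    """Rank predictors by the index and keep the d largest (hypothetical helper)."""
    scores = np.array([bkr_type_index(X[:, j], y) for j in range(X.shape[1])])
    keep = np.argsort(scores)[::-1][:d]
    return keep, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30))
    # Purely nonlinear signal in the first predictor (illustrative data only).
    y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)
    keep, _ = screen(X, y, d=5)
    print("retained predictor indices:", keep)
```

Because the estimate depends on the data only through comparisons of the form `x_k <= x_i`, it is unchanged by strictly increasing transformations of either variable, which is the invariance property the abstract refers to. No tuning parameter or iteration is involved, at the cost of O(n^2) memory per predictor in this naive form.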
Pages: 1351-1370
Number of pages: 20