Robust model-free feature screening based on modified Hoeffding measure for ultra-high dimensional data

Cited by: 1
Authors
Yu, Yuan [1, 2]
He, Di [2]
Zhou, Yong [3, 4]
Affiliations
[1] Shandong Univ Finance & Econ, Sch Stat, Jinan 250014, Shandong, Peoples R China
[2] Shanghai Univ Finance & Econ, Sch Stat & Management, Shanghai 200433, Peoples R China
[3] East China Normal Univ, Fac Econ & Management, Inst Stat & Interdisciplinary Sci, Shanghai 200241, Peoples R China
[4] East China Normal Univ, Fac Econ & Management, Sch Stat, Shanghai 200241, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Feature screening; Hoeffding measure; Ranking consistency property; Robustness; Sure screening property; Ultrahigh-dimensional data; VARYING COEFFICIENT MODELS; GENE-EXPRESSION; FEATURE-SELECTION; CLASSIFICATION; CANCER;
DOI
10.4310/SII.2018.v11.n3.a10
Chinese Library Classification
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Sure independence screening (SIS) has become a cutting-edge dimension reduction technique for extracting important features from ultrahigh-dimensional data in statistical learning. Many screening methods are developed for specific models that satisfy certain assumptions. With the availability of more data types and more complicated models, a robust model-free procedure with less restrictive conditions on the data is required. In this paper, we propose a modified Hoeffding measure that efficiently characterizes the dependence between two random variables. The modified Hoeffding measure lies between 0 and 1, and under some mild conditions it is zero if and only if the two variables are independent. This property enables us to propose a novel feature screening procedure based on the measure without specifying the regression structure. The proposed method is robust to heavy-tailed data and outliers in both the predictors and the response, and is suitable for complex data, including discrete and multivariate variables. In addition, it can extract important features even when the underlying model is complicated. We further establish the sure screening property and the ranking consistency property even when the dimensionality is of exponential order in the sample size, without assuming any moment condition on the predictors or the response. Simulations and an analysis of real data demonstrate the versatility and practicability of the proposed method in comparison with other state-of-the-art approaches.
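The general idea of marginal, model-free screening can be illustrated with a short sketch. This record does not give the definition of the paper's modified Hoeffding measure, so the code below substitutes a plug-in estimate of the classical Hoeffding-type quantity, the integral of (F_XY - F_X F_Y)^2 with respect to F_XY. That quantity is not normalized to [0, 1] and is not the authors' statistic. The function names `hoeffding_type_score` and `screen_features`, the `keep` parameter, and the simulated data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def hoeffding_type_score(x, y):
    """Plug-in estimate of integral (F_XY - F_X * F_Y)^2 dF_XY.

    Stand-in for illustration only: this is the classical Hoeffding-type
    quantity, NOT the modified measure proposed in the paper.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # le_x[i, j] is True when x[j] <= x[i]; likewise for y.
    le_x = x[None, :] <= x[:, None]
    le_y = y[None, :] <= y[:, None]
    f_x = le_x.mean(axis=1)            # empirical marginal CDF at x[i]
    f_y = le_y.mean(axis=1)            # empirical marginal CDF at y[i]
    f_xy = (le_x & le_y).mean(axis=1)  # empirical joint CDF at (x[i], y[i])
    return float(np.mean((f_xy - f_x * f_y) ** 2))


def screen_features(X, y, keep):
    """Score each column of X against y and return the top `keep` indices."""
    scores = np.array(
        [hoeffding_type_score(X[:, j], y) for j in range(X.shape[1])]
    )
    order = np.argsort(-scores)  # largest dependence score first
    return order[:keep], scores


# Hypothetical data: only feature 0 drives the response, through a
# nonlinear link and with heavy-tailed (Cauchy) noise.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = np.exp(X[:, 0]) + 0.1 * rng.standard_cauchy(200)

selected, scores = screen_features(X, y, keep=5)
print("selected feature indices:", selected)
```

Because the score depends on the data only through order comparisons, it is unchanged by strictly increasing transformations of either variable and is not dominated by a few extreme observations. This is the kind of robustness the abstract describes, although the paper's own measure and its theoretical guarantees may differ from this stand-in.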
Pages: 473-489
Number of pages: 17