Covariate Information Number for Feature Screening in Ultrahigh-Dimensional Supervised Problems

Cited by: 1
Authors
Nandy, Debmalya [1]
Chiaromonte, Francesca [2, 3, 4]
Li, Runze [2]
Affiliations
[1] Univ Colorado, Dept Biostat & Informat, Colorado Sch Publ Hlth, Anschutz Med Campus, Aurora, CO 80045 USA
[2] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[3] St Anna Sch Adv Studies, Inst Econ, Pisa, Italy
[4] St Anna Sch Adv Studies, EMbeDS, Pisa, Italy
Keywords
Affymetrix GeneChip Rat Genome 230 2.0 Array; Fisher information; Model-free; Supervised problems; Sure independence screening; Ultrahigh dimension;
DOI
10.1080/01621459.2020.1864380
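Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code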
020208 ; 070103 ; 0714 ;
Abstract
Contemporary high-throughput experimental and surveying techniques give rise to ultrahigh-dimensional supervised problems with sparse signals; that is, a limited number of observations (n), each with a very large number of covariates (p >> n), only a small share of which is truly associated with the response. In these settings, major concerns on computational burden, algorithmic stability, and statistical accuracy call for substantially reducing the feature space by eliminating redundant covariates before the use of any sophisticated statistical analysis. Along the lines of Pearson's correlation coefficient-based sure independence screening and other model- and correlation-based feature screening methods, we propose a model-free procedure called covariate information number-sure independence screening (CIS). CIS uses a marginal utility connected to the notion of the traditional Fisher information, possesses the sure screening property, and is applicable to any type of response (features) with continuous features (response). Simulations and an application to transcriptomic data on rats reveal the comparative strengths of CIS over some popular feature screening methods. Supplementary materials for this article are available online.
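For context, the Pearson-correlation-based sure independence screening that the abstract names as its starting point can be sketched roughly as below. This is not the CIS procedure itself, whose marginal utility is not spelled out in this record. The function name `sis_pearson`, the simulated data, and the cutoff d = n / log(n) (a commonly used convention) are illustrative assumptions.

```python
import numpy as np

def sis_pearson(X, y, d):
    """Keep the d covariates with the largest |Pearson correlation| with y.

    Hypothetical helper for illustration; assumes no column of X is constant.
    """
    Xc = X - X.mean(axis=0)          # center each covariate
    yc = y - y.mean()                # center the response
    # Marginal utility: absolute sample correlation of each column with y.
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    utility = np.abs(Xc.T @ yc) / denom
    # Indices of the d largest utilities, in decreasing order of utility.
    keep = np.argsort(utility)[::-1][:d]
    return keep, utility

# Simulated sparse setting with p >> n: only covariates 0 and 1 matter.
rng = np.random.default_rng(0)
n, p = 100, 2000
X = rng.standard_normal((n, p))
y = 3.0 * X[:, 0] - 3.0 * X[:, 1] + rng.standard_normal(n)

d = int(n / np.log(n))               # assumed cutoff, not taken from the paper
keep, utility = sis_pearson(X, y, d)
```

The screened index set `keep` would then be passed to a more refined method (for example a penalized regression) fitted on the reduced feature space.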
Pages: 1516-1529
Page count: 14
Related papers
50 in total
  • [21] Efficient feature screening for ultrahigh-dimensional varying coefficient models
    Chen, Xin
    Ma, Xuejun
    Wang, Xueqin
    Zhang, Jingxiao
    STATISTICS AND ITS INTERFACE, 2017, 10 (03) : 407 - 412
  • [22] Nonparametric independence feature screening for ultrahigh-dimensional missing data
    Fang, Jianglin
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2022, 51 (10) : 5670 - 5689
  • [23] Feature Screening for Nonparametric and Semiparametric Models with Ultrahigh-Dimensional Covariates
    Zhang, Junying
    Zhang, Riquan
    Zhang, Jiajia
    JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2018, 31 (05) : 1350 - 1361
  • [24] Feature screening for ultrahigh-dimensional binary classification via linear projection
    Lai, Peng
    Wang, Mingyue
    Song, Fengli
    Zhou, Yanqiu
    AIMS MATHEMATICS, 2023, 8 (06) : 14270 - 14287
  • [25] Feature screening in ultrahigh-dimensional varying-coefficient Cox model
    Yang, Guangren
    Zhang, Ling
    Li, Runze
    Huang, Yuan
    JOURNAL OF MULTIVARIATE ANALYSIS, 2019, 171 : 284 - 297
  • [26] Unified mean-variance feature screening for ultrahigh-dimensional regression
    Liming Wang
    Xingxiang Li
    Xiaoqing Wang
    Peng Lai
    Computational Statistics, 2022, 37 : 1887 - 1918
  • [27] FEATURE SCREENING IN ULTRAHIGH-DIMENSIONAL GENERALIZED VARYING-COEFFICIENT MODELS
    Yang, Guangren
    Yang, Songshan
    Li, Runze
    STATISTICA SINICA, 2020, 30 (02) : 1049 - 1067
  • [28] Unified mean-variance feature screening for ultrahigh-dimensional regression
    Wang, Liming
    Li, Xingxiang
    Wang, Xiaoqing
    Lai, Peng
    COMPUTATIONAL STATISTICS, 2022, 37 (04) : 1887 - 1918
  • [29] Fast robust feature screening for ultrahigh-dimensional varying coefficient models
    Ma, Xuejun
    Chen, Xin
    Zhang, Jingxiao
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2017, 87 (04) : 724 - 732
  • [30] Interaction Screening for Ultrahigh-Dimensional Data
    Hao, Ning
    Zhang, Hao Helen
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2014, 109 (507) : 1285 - 1301