Feature screening for ultrahigh dimensional categorical data with covariates missing at random

Times cited: 9
Authors
Ni, Lyu [1]
Fang, Fang [2]
Shao, Jun [2,3]
Affiliations
[1] East China Normal Univ, Sch Data Sci & Engn, Shanghai, Peoples R China
[2] East China Normal Univ, Sch Stat, Key Lab Adv Theory & Applicat Stat & Data Sci MOE, Shanghai, Peoples R China
[3] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
Funding
National Natural Science Foundation of China
Keywords
Feature screening; Missing at random; Missing covariate; Pearson chi-square statistic; Sure screening property; Variable selection; Kolmogorov filter; Model selection; Regression
DOI
10.1016/j.csda.2019.106824
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Most existing feature screening methods assume that data are fully observed. Developing screening methods for incomplete data is challenging because traditional missing data techniques cannot be applied directly in the ultrahigh dimensional setting. A two-step model-free feature screening procedure is developed for ultrahigh dimensional categorical data in which some covariate values are missing at random. For each covariate with missing data, the first step screens to identify the variables that enter the unspecified propensity function. In the second step, screening statistics such as the adjusted Pearson chi-square statistic are computed by leveraging the variables selected in the first step and the special structure of categorical data. Sure screening properties are established for the proposed method, and its finite sample performance is investigated through simulation studies and a real data example. (C) 2019 Elsevier B.V. All rights reserved.
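The abstract does not give the formula for the adjusted statistic, so the following is only a minimal sketch of the general two-step idea under stated assumptions, not the authors' procedure. It assumes (1) a Pearson chi-square-type dependence measure computed from weighted cell proportions of two categorical variables; (2) a hypothetical first step that ranks fully observed candidate covariates by their association with the missingness indicator and keeps the top `d1`; and (3) a hypothetical second step that estimates the propensity by observed fractions within the cells of the selected variables (together with the response) and reweights complete cases by inverse probability weighting. The function names `pearson_screen_stat`, `select_propensity_vars`, and `ipw_weights`, the use of inverse probability weighting, and the simulated data are all illustrative assumptions.

```python
import numpy as np


def pearson_screen_stat(x, y, w=None):
    """Weighted Pearson chi-square-type dependence measure between two
    categorical vectors: sum over cells of (p_jk - p_j. p_.k)^2 / (p_j. p_.k),
    where the p's are weighted cell proportions. Rows with weight 0 are
    ignored, so placeholder codes for missing entries never enter the table."""
    x, y = np.asarray(x), np.asarray(y)
    w = np.ones(len(x)) if w is None else np.asarray(w, dtype=float)
    keep = w > 0
    x, y, w = x[keep], y[keep], w[keep]
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    xi, yi = xi.ravel(), yi.ravel()
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), w)          # accumulate weighted cell counts
    p = table / table.sum()
    expected = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    return float(((p - expected) ** 2 / expected).sum())


def select_propensity_vars(delta, Z, d1):
    """Hypothetical step 1: rank fully observed candidate columns of Z by
    their dependence on the missingness indicator delta; keep the top d1."""
    scores = np.array([pearson_screen_stat(Z[:, j], delta) for j in range(Z.shape[1])])
    return np.argsort(scores)[::-1][:d1]


def ipw_weights(delta, cond):
    """Hypothetical step 2 helper: estimate P(delta = 1 | cond) by the observed
    fraction within each cell of the categorical columns in cond, then return
    1 / pi_hat for observed rows and 0 for rows with a missing value."""
    _, cell = np.unique(cond, axis=0, return_inverse=True)
    cell = cell.ravel()
    n_cell = np.bincount(cell)
    n_obs = np.bincount(cell, weights=delta)
    pi_hat = n_obs[cell] / n_cell[cell]
    w = np.zeros(len(delta))
    obs = delta == 1
    w[obs] = 1.0 / pi_hat[obs]             # observed rows never have pi_hat == 0
    return w


# Simulated illustration (all data here are synthetic assumptions).
rng = np.random.default_rng(0)
n, p = 2000, 20
Z = rng.integers(0, 3, size=(n, p))            # fully observed categorical covariates
y = (Z[:, 0] + rng.integers(0, 2, n)) % 3      # response depends on Z[:, 0]
x = (y + (rng.random(n) < 0.2)) % 3            # covariate x is strongly tied to y
# MAR: x is missing more often when Z[:, 5] == 2, which is always observed.
delta = (rng.random(n) < np.where(Z[:, 5] == 2, 0.3, 0.9)).astype(int)
x_obs = np.where(delta == 1, x, -1)            # -1 is a placeholder for "missing"

sel = select_propensity_vars(delta, Z, d1=1)   # step 1: variables driving missingness
w = ipw_weights(delta, np.column_stack([y, Z[:, sel]]))
stat_x = pearson_screen_stat(x_obs, y, w)      # step 2: reweighted screening statistic
print(sel, stat_x)
```

Estimating the propensity by cell frequencies needs no parametric model, which is one way the categorical structure can be exploited; the cost is that cells become sparse as more variables are selected, so in this sketch `d1` should stay small relative to the sample size.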
Pages: 15