Feature screening for ultrahigh dimensional categorical data with covariates missing at random

Cited by: 9
Authors
Ni, Lyu [1]
Fang, Fang [2]
Shao, Jun [2, 3]
Affiliations
[1] East China Normal Univ, Sch Data Sci & Engn, Shanghai, Peoples R China
[2] East China Normal Univ, Sch Stat, Key Lab Adv Theory & Applicat Stat & Data Sci MOE, Shanghai, Peoples R China
[3] Univ Wisconsin, Dept Stat, Madison, WI 53706 USA
Funding
National Natural Science Foundation of China;
Keywords
Feature screening; Missing at random; Missing covariate; Pearson Chi-Square statistic; Sure screening property; VARIABLE SELECTION; KOLMOGOROV FILTER; MODEL SELECTION; REGRESSION;
DOI
10.1016/j.csda.2019.106824
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Most existing feature screening methods assume that data are fully observed. Developing screening methods for incomplete data is challenging because traditional missing data analysis techniques cannot be directly applied to the ultrahigh dimensional case. A two-step model-free feature screening procedure is developed for ultrahigh dimensional categorical data when some covariate values are missing at random. For each covariate with missing data, the first step screens for the variables that enter the unspecified propensity function. In the second step, screening statistics such as the adjusted Pearson Chi-Square statistics are calculated by leveraging the variables obtained in the first step and the special structure of categorical data. Sure screening properties are established for the proposed method. Finite sample performance is investigated by simulation studies and a real data example. (C) 2019 Elsevier B.V. All rights reserved.
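
For orientation only, the following is a minimal sketch of marginal screening with the ordinary (unadjusted) Pearson Chi-Square statistic, computed on complete cases. It is not the adjusted statistic or the two-step procedure of the paper, whose details are not given in this record. The function names `pearson_chi2` and `screen`, the use of NaN to mark missing covariate values, and the cutoff `d` are all assumptions made for illustration.

```python
import numpy as np


def pearson_chi2(x, y):
    """Ordinary Pearson chi-square statistic between two categorical arrays.

    Assumption: missing covariate values are coded as NaN and are simply
    dropped (complete-case analysis). This is NOT the adjusted statistic
    described in the abstract.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    keep = ~np.isnan(x)
    x, y = x[keep], y[keep]
    n = x.size
    if n == 0:
        return 0.0
    # Map each observed level to an integer index.
    x_levels, xi = np.unique(x, return_inverse=True)
    y_levels, yi = np.unique(y, return_inverse=True)
    # Build the contingency table of observed counts.
    table = np.zeros((x_levels.size, y_levels.size))
    np.add.at(table, (xi, yi), 1)
    # Expected counts under independence; all are positive because every
    # level in the table was observed at least once.
    expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / n
    return float(((table - expected) ** 2 / expected).sum())


def screen(X, y, d):
    """Return the indices of the d covariates with the largest statistics."""
    stats = np.array([pearson_chi2(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(-stats)
    return order[:d], stats
```

Dropping incomplete cases is generally justified only under assumptions stronger than missing at random, which is the gap the adjusted statistics in the paper are meant to address.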
Pages: 15