Group feature screening for ultrahigh-dimensional data missing at random

Authors
He, Hanji [1]
Li, Meini [2]
Deng, Guangming [3, 4]
Affiliations
[1] South China Univ Technol, Sch Econ & Finance, Guangzhou 510006, Guangdong, Peoples R China
[2] Chongqing Sch Int Business & Econ, Sch Math & Comp Sci, Chongqing 401520, Peoples R China
[3] Guilin Univ Technol, Sch Math & Stat, Guangxi 541000, Peoples R China
[4] Guilin Univ Technol, Appl Stat Inst, Guangxi 541000, Peoples R China
Source
AIMS MATHEMATICS | 2024 / Vol. 9 / No. 2
Funding
National Natural Science Foundation of China
Keywords
group feature screening; sure screening property; chi-square statistic; missing data; SELECTION;
DOI
10.3934/math.2024197
Chinese Library Classification number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Missing data are common in data analysis and remain widespread in big data. Earlier work established the practicality of two-stage feature screening with categorical covariates missing at random (IMCSIS). Building on it, we propose group feature screening for ultrahigh-dimensional data with categorical covariates missing at random (GIMCSIS), which can effectively select important features. The proposed method extends the scope of IMCSIS and further improves classification performance when covariates are missing. The two-stage group feature screening method is built on adjusted Pearson chi-square statistics, and theoretical analysis proves that it has the sure screening property. In numerical simulations, GIMCSIS achieves better finite-sample performance under binary and multivariate response variables and multi-class covariates. An empirical analysis across multiple classification results shows that GIMCSIS outperforms IMCSIS in imbalanced data classification.
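To make the idea of chi-square-based group screening concrete, the following is a minimal sketch and not the authors' GIMCSIS procedure. It ranks predefined groups of categorical covariates by a plain Pearson chi-square statistic computed on complete cases only. The function names `pearson_chi2` and `screen_groups`, the rule of averaging statistics within a group, and the simulated data are all assumptions made for illustration; the adjusted statistic and the two-stage treatment of missingness described in the abstract are not reproduced here.

```python
import numpy as np


def pearson_chi2(x, y):
    """Pearson chi-square statistic (divided by n) for two categorical vectors."""
    x = np.asarray(x)
    y = np.asarray(y)
    n = x.size
    # Map category labels to 0..R-1 and 0..C-1.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    # Build the contingency table of joint counts.
    table = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(table, (xi, yi), 1)
    p = table / n
    # Product of marginals = cell probabilities under independence.
    expected = p.sum(axis=1, keepdims=True) * p.sum(axis=0, keepdims=True)
    return float(((p - expected) ** 2 / expected).sum())


def screen_groups(X, y, groups, top_k):
    """Rank groups by the mean complete-case statistic of their members.

    Averaging within a group is an illustrative assumption, not the
    paper's group statistic.
    """
    scores = {}
    for name, cols in groups.items():
        stats = []
        for j in cols:
            observed = ~np.isnan(X[:, j])  # complete cases for covariate j
            stats.append(pearson_chi2(X[observed, j], y[observed]))
        scores[name] = float(np.mean(stats))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_k], scores


# Simulated data (hypothetical): two informative and two noise covariates.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)
signal = [(y + (rng.random(n) < 0.2)) % 2 for _ in range(2)]  # y with 20% flips
noise = [rng.integers(0, 3, n) for _ in range(2)]
X = np.column_stack(signal + noise).astype(float)
X[rng.random(X.shape) < 0.1] = np.nan  # about 10% of entries missing at random

groups = {"signal": [0, 1], "noise": [2, 3]}
selected, scores = screen_groups(X, y, groups, top_k=1)
print(selected, scores)
```

Dropping incomplete cases is the simplest way to handle missing covariates and is used here only to keep the sketch short; it does not adjust for a missingness mechanism that depends on other covariates.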
Pages: 4032-4056
Number of pages: 25