Feature Screening for Ultrahigh Dimensional Categorical Data With Applications

Cited by: 59
Authors
Huang, Danyang [1]
Li, Runze [2, 3]
Wang, Hansheng [1]
Affiliations
[1] Peking Univ, Guanghua Sch Management, Dept Business Stat & Econometr, Beijing 100871, Peoples R China
[2] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[3] Penn State Univ, Methodol Ctr, University Pk, PA 16802 USA
Funding
National Natural Science Foundation of China
Keywords
Pearson's chi-square test; Screening consistency; Search engine marketing; Text classification; MODEL; REGRESSION;
DOI
10.1080/07350015.2013.863158
Chinese Library Classification number
F [Economics]
Discipline classification code
02
Abstract
Ultrahigh dimensional data with both categorical responses and categorical covariates are frequently encountered in the analysis of big data, for which feature screening has become an indispensable statistical tool. We propose a Pearson chi-square based feature screening procedure for a categorical response with ultrahigh dimensional categorical covariates. The proposed procedure can be applied directly to detect important interaction effects. We further show that the proposed procedure possesses the screening consistency property in the terminology of Fan and Lv (2008). We investigate the finite sample performance of the proposed procedure by Monte Carlo simulation studies and illustrate the proposed method with two empirical datasets.
Pages: 237-244
Number of pages: 8
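
The abstract describes ranking categorical covariates by a Pearson chi-square type measure of dependence with a categorical response and keeping the top-ranked ones. The following is a minimal sketch of that general idea, not the authors' implementation: the function names `chi_square_stat` and `screen_features`, the cutoff `d`, and the synthetic data are all assumptions made for illustration, and the exact statistic and threshold rule used in the paper may differ.

```python
import numpy as np


def chi_square_stat(x, y):
    """Pearson chi-square-type dependence measure between two categorical vectors.

    Computes sum over cells of (joint - row_marginal * col_marginal)^2
    / (row_marginal * col_marginal), using empirical proportions.
    (Illustrative helper; the name and scaling are assumptions.)
    """
    x_levels, x_idx = np.unique(np.asarray(x).ravel(), return_inverse=True)
    y_levels, y_idx = np.unique(np.asarray(y).ravel(), return_inverse=True)

    # Build the contingency table of counts.
    counts = np.zeros((x_levels.size, y_levels.size))
    np.add.at(counts, (x_idx, y_idx), 1.0)

    joint = counts / counts.sum()
    # Product of marginals: the cell proportions expected under independence.
    # Every observed level occurs at least once, so no entry is zero.
    expected = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return float(((joint - expected) ** 2 / expected).sum())


def screen_features(X, y, d):
    """Return the indices of the d columns of X with the largest statistic."""
    scores = np.array([chi_square_stat(X[:, j], y) for j in range(X.shape[1])])
    top = np.argsort(-scores, kind="stable")[:d]
    return top, scores


# Synthetic check: the response is column 3 with 10% of labels flipped,
# so column 3 should receive the largest score.
rng = np.random.default_rng(0)
n, p = 500, 200
X = rng.integers(0, 2, size=(n, p))
flip = rng.random(n) < 0.1
y = np.where(flip, 1 - X[:, 3], X[:, 3])

top, scores = screen_features(X, y, d=5)
print(top)
```

One way to apply the same sketch to interaction effects, again as an assumption rather than the paper's procedure, is to encode a pair of covariates as a single combined categorical variable (for example `X[:, j] * K + X[:, k]` for `K` levels) and pass it to `chi_square_stat`.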