Partition-based feature screening for categorical data via RKHS embeddings

Cited by: 2
Authors
Lu, Jun [1]
Lin, Lu [2, 3]
Wang, WenWu [3]
Affiliations
[1] Natl Univ Def Technol, Coll Liberal Arts & Sci, Changsha, Peoples R China
[2] Shandong Technol & Business Univ, Sch Stat, Yantai, Peoples R China
[3] Qufu Normal Univ, Sch Stat, Qufu, Shandong, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Feature screening; Partition-based; Categorical data; RKHS; VARYING COEFFICIENT MODELS;
DOI
10.1016/j.csda.2021.107176
Chinese Library Classification
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
This paper proposes a new screening procedure for the ultrahigh dimensional data with a categorical response. By exploiting the group structure among predictors, a new partition-based screening approach is developed via the reproducing kernel Hilbert space (RKHS) embeddings in the maximum mean discrepancy framework. Consequently, the new method is able to identify the influential group of predictors that may be overlooked by the marginal screening methods. Moreover, by using the RKHS embedding, the new ranking index has a very simple form, and thus can be evaluated easily. As a byproduct, the new method is model-free without specifying any relationship between the predictors and the response. The sure screening property of the proposed method is proved and the effectiveness of the new method is also illustrated via numerical studies and a real data analysis. © 2021 Elsevier B.V. All rights reserved.
Pages: 16
Related papers
50 in total
  • [1] A PARTITION-BASED FEATURE SELECTION METHOD FOR MIXED DATA: A FILTER APPROACH
    Dutt, Ashish
    Ismail, Maizatul Akmar
    [J]. MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2020, 33 (02) : 152 - 169
  • [2] Partition-based ultrahigh-dimensional variable screening
    Kang, Jian
    Hong, Hyokyoung G.
    Li, Yi
    [J]. BIOMETRIKA, 2017, 104 (04) : 785 - 800
  • [3] Dynamically adaptive partition-based data distribution management
    Kumova, BI
    [J]. Workshop on Principles of Advanced and Distributed Simulation, Proceedings, 2005, : 292 - 300
  • [4] Distributed partition-based optimization via dual decomposition
    Carli, Ruggero
    Notarstefano, Giuseppe
    [J]. 2013 IEEE 52ND ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2013, : 2979 - 2984
  • [5] Partition-Based Clustering with Sliding Windows for Data Streams
    Youn, Jonghem
    Choi, Jihun
    Shim, Junho
    Lee, Sang-goo
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2017), PT II, 2017, 10178 : 289 - 303
  • [6] PARTITION-BASED CLOUD DATA STORAGE AND PROCESSING MODEL
    Zhao, Yawei
    Wang, Yong
    [J]. 2012 IEEE 2ND INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENT SYSTEMS (CCIS) VOLS 1-3, 2012, : 218 - 223
  • [7] Redundant Via Allocation for Layer Partition-based Redundant Via Insertion
    Shen, Jian-Wei
    Chiang, Mei-Fang
    Chen, Song
    Guo, Wei
    Yoshimura, Takeshi
    [J]. 2009 IEEE 8TH INTERNATIONAL CONFERENCE ON ASIC, VOLS 1 AND 2, PROCEEDINGS, 2009, : 734 - 737
  • [8] Fuzzy rough dimensionality reduction: A feature set partition-based approach
    Wang, Zhihong
    Chen, Hongmei
    Yang, Xiaoling
    Wan, Jihong
    Li, Tianrui
    Luo, Chuan
    [J]. INFORMATION SCIENCES, 2023, 644
  • [9] Feature Screening for Ultrahigh Dimensional Categorical Data With Applications
    Huang, Danyang
    Li, Runze
    Wang, Hansheng
    [J]. JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 2014, 32 (02) : 237 - 244
  • [10] Partition-based workload scheduling in living data warehouse environments
    Thiele, Maik
    Fischer, Ulrike
    Lehner, Wolfgang
    [J]. INFORMATION SYSTEMS, 2009, 34 (4-5) : 382 - 399