Clustering-Guided Particle Swarm Feature Selection Algorithm for High-Dimensional Imbalanced Data With Missing Values

Cited by: 51
Authors
Zhang, Yong [1]
Wang, Yan-Hu [1]
Gong, Dun-Wei [1]
Sun, Xiao-Yan [1]
Affiliations
[1] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Class imbalance; feature selection (FS); fuzzy clustering; missing value; particle swarm optimization (PSO); SENSITIVE FEATURE-SELECTION; MUTUAL INFORMATION; DIFFERENTIAL EVOLUTION; GENETIC ALGORITHM; OPTIMIZATION; CLASSIFICATION; MACHINE;
DOI
10.1109/TEVC.2021.3106975
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection (FS) in data with class imbalance or missing values has received much attention from researchers because both conditions are common in real-world applications. However, FS algorithms designed for data that exhibit both characteristics at once are still lacking. Because missing data and class imbalance are coupled in complex ways, a better FS method is needed. To tackle high-dimensional imbalanced data with missing values, this article studies a new evolutionary FS method. First, an improved F-measure based on filling risk (RF-measure) is defined to evaluate the influence of missing data on FS performance under class imbalance. Then, taking the RF-measure as the objective function, a particle swarm optimization-based FS method with fuzzy clustering (PSOFS-FC) is proposed. Two new problem-specific strategies, namely a swarm initialization strategy guided by fuzzy clustering and a local pruning operator based on feature importance, are developed to improve the performance of PSOFS-FC. Compared with state-of-the-art FS algorithms on several public datasets, PSOFS-FC achieves excellent classification performance with relatively short running time, indicating its superiority in tackling high-dimensional imbalanced data with missing values.
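As a rough illustration of the general idea of PSO-based feature selection with an F-measure objective, the following is a minimal sketch and not the PSOFS-FC algorithm itself. It assumes a generic binary PSO with a sigmoid transfer function, the plain F-measure of the minority class in place of the RF-measure (whose definition is not given in the abstract), random swarm initialization instead of fuzzy-clustering guidance, no local pruning operator, and complete data. The nearest-centroid classifier, all function names, and all parameter values are hypothetical choices made only for this example.

```python
import numpy as np


def f_measure(y_true, y_pred, positive=1):
    """Plain F-measure (F1) of the positive class; NOT the paper's RF-measure."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assumed stand-in classifier: assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


def fitness(mask, X_train, y_train, X_val, y_val):
    """F-measure on validation data using only the features selected by mask."""
    if not mask.any():
        return 0.0  # an empty feature subset cannot classify anything
    pred = nearest_centroid_predict(X_train[:, mask], y_train, X_val[:, mask])
    return f_measure(y_val, pred)


def binary_pso_fs(X_train, y_train, X_val, y_val,
                  n_particles=10, n_iter=20, seed=0):
    """Generic binary PSO over 0/1 feature masks (hypothetical parameter values)."""
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    # Random initialization; the paper instead guides this step by fuzzy clustering.
    pos = (rng.random((n_particles, n_features)) < 0.5).astype(float)
    vel = rng.uniform(-1.0, 1.0, (n_particles, n_features))

    def evaluate(p):
        return fitness(p.astype(bool), X_train, y_train, X_val, y_val)

    pbest = pos.copy()
    pbest_fit = np.array([evaluate(p) for p in pos])
    g = int(np.argmax(pbest_fit))
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for _ in range(n_iter):
        r1 = rng.random(vel.shape)
        r2 = rng.random(vel.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        vel = np.clip(vel, -4.0, 4.0)
        # Sigmoid transfer: velocity becomes the probability of selecting a feature.
        prob = 1.0 / (1.0 + np.exp(-vel))
        pos = (rng.random(vel.shape) < prob).astype(float)
        for i in range(n_particles):
            fit = evaluate(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i].copy(), fit
        g = int(np.argmax(pbest_fit))
        if pbest_fit[g] > gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    return gbest.astype(bool), float(gbest_fit)
```

The function returns a boolean mask over the feature columns together with the best validation F-measure found. Handling missing values, as the article does through the filling-risk term, would require changes to both the fitness function and the classifier that this sketch does not attempt.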
Pages: 616-630
Page count: 15
Related papers
50 in total
  • [32] A Fast Hybrid Feature Selection Method Based on Dynamic Clustering and Improved Particle Swarm Optimization for High-Dimensional Health Care Data
    Kang, Yan
    Peng, Luhan
    Guo, Jing
    Lu, Yuhuan
    Yang, Yun
    Fan, Baochen
    Pu, Bin
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (01) : 2447 - 2459
  • [33] Implementation of FAST Clustering-Based Feature Subset Selection Algorithm for High-Dimensional Data
    Shilu, Smit
    Sheth, Kushal
    Mehul, Ekata
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON ICT FOR SUSTAINABLE DEVELOPMENT ICT4SD 2015, VOL 2, 2016, 409 : 203 - 213
  • [34] Incomplete high-dimensional data imputation algorithm using feature selection and clustering analysis on cloud
    Bu, Fanyu
    Chen, Zhikui
    Zhang, Qingchen
    Yang, Laurence T.
    JOURNAL OF SUPERCOMPUTING, 2016, 72 (08): : 2977 - 2990
  • [35] Variable-Size Cooperative Coevolutionary Particle Swarm Optimization for Feature Selection on High-Dimensional Data
    Song, Xian-Fang
    Zhang, Yong
    Guo, Yi-Nan
    Sun, Xiao-Yan
    Wang, Yong-Li
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2020, 24 (05) : 882 - 895
  • [36] A GA-based Feature Selection for High-dimensional Data Clustering
    Sun, Mei
    Xiong, Langhuan
    Sun, Haojun
    Jiang, Dazhi
    THIRD INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING, 2009, : 769 - 772
  • [37] Heterogeneous cognitive learning chameleon swarm algorithm for high-dimensional feature selection
    Malik Braik
    Mohammed A. Awadallah
    Hussein Alzoubi
    Heba Al-Hiary
    The Journal of Supercomputing, 81 (5)
  • [38] Feature selection for high-dimensional data
    Bolón-Canedo V.
    Sánchez-Maroño N.
    Alonso-Betanzos A.
    Progress in Artificial Intelligence, 2016, 5 (2) : 65 - 75
  • [39] Feature selection for high-dimensional data
    Destrero A.
    Mosci S.
    De Mol C.
    Verri A.
    Odone F.
    Computational Management Science, 2009, 6 (1) : 25 - 40
  • [40] Research of Medical High-dimensional Imbalanced Data Classification-Ensemble Feature Selection Algorithm with Random Forest
    Zhu, Min
    Su, Bo
    Ning, Gangmin
    2017 INTERNATIONAL CONFERENCE ON SMART GRID AND ELECTRICAL AUTOMATION (ICSGEA), 2017, : 273 - 277