Clustering-Guided Particle Swarm Feature Selection Algorithm for High-Dimensional Imbalanced Data With Missing Values

Cited by: 51
Authors
Zhang, Yong [1]
Wang, Yan-Hu [1]
Gong, Dun-Wei [1]
Sun, Xiao-Yan [1]
Affiliations
[1] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Class imbalance; feature selection (FS); fuzzy clustering; missing value; particle swarm optimization (PSO); SENSITIVE FEATURE-SELECTION; MUTUAL INFORMATION; DIFFERENTIAL EVOLUTION; GENETIC ALGORITHM; OPTIMIZATION; CLASSIFICATION; MACHINE;
DOI
10.1109/TEVC.2021.3106975
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection (FS) in data with class imbalance or missing values has received much attention from researchers because both characteristics are common in real-world applications. However, for data that exhibit both characteristics at once, a dedicated FS algorithm is still lacking. Because missing data and class imbalance are coupled in complex ways, a better FS method is needed. To tackle high-dimensional imbalanced data with missing values, this article studies a new evolutionary FS method. First, an improved F-measure based on filling risk (RF-measure) is defined to evaluate the influence of missing data on the performance of FS in the case of class imbalance. Then, taking the RF-measure as the objective function, a particle swarm optimization-based FS method with fuzzy clustering (PSOFS-FC) is proposed. Two new problem-specific strategies, namely a swarm initialization strategy guided by fuzzy clustering and a local pruning operator based on feature importance, are developed to improve the performance of PSOFS-FC. In comparisons with state-of-the-art FS algorithms on several public datasets, experimental results show that PSOFS-FC achieves excellent classification performance with relatively short running time, indicating its superiority in tackling high-dimensional imbalanced data with missing values.
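For orientation, the standard F-measure that the RF-measure is said to improve on can be sketched as follows. This is only the textbook F-beta score for one positive (minority) class; the filling-risk term of the RF-measure is not specified in the abstract and is deliberately not modeled here. The function name `f_measure` and its parameters are illustrative assumptions, not taken from the paper.

```python
def f_measure(y_true, y_pred, positive=1, beta=1.0):
    """Textbook F-beta score for the `positive` class (hypothetical helper).

    Not the paper's RF-measure: no filling-risk adjustment is applied.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so the score is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Imbalanced toy labels: 2 minority samples out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_majority = [0] * 10          # 80% accurate, but finds no minority sample
one_hit = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]

score_majority = f_measure(y_true, always_majority)
score_one_hit = f_measure(y_true, one_hit)
```

On the toy labels, a classifier that always predicts the majority class is 80% accurate yet scores zero, which is why an F-measure-style objective is a natural starting point for imbalanced data.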
Pages: 616 - 630
Page count: 15
Related papers
50 in total
  • [1] A Clustering-Guided Integer Brain Storm Optimizer for Feature Selection in High-Dimensional Data
    Jia Yun-Tao
    Zhang Wan-Qiu
    He Chun-Lin
    [J]. DISCRETE DYNAMICS IN NATURE AND SOCIETY, 2021, 2021
  • [2] Particle Swarm Optimisation for Feature Selection and Weighting in High-Dimensional Clustering
    O'Neill, Damien
    Lensen, Andrew
    Xue, Bing
    Zhang, Mengjie
    [J]. 2018 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2018, : 173 - 180
  • [3] A Fast Hybrid Feature Selection Based on Correlation-Guided Clustering and Particle Swarm Optimization for High-Dimensional Data
    Song, Xian-Fang
    Zhang, Yong
    Gong, Dun-Wei
    Gao, Xiao-Zhi
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (09) : 9573 - 9586
  • [4] Feature selection for high-dimensional imbalanced data
    Yin, Liuzhi
    Ge, Yong
    Xiao, Keli
    Wang, Xuehua
    Quan, Xiaojun
    [J]. NEUROCOMPUTING, 2013, 105 : 3 - 11
  • [5] Feature Selection with High-Dimensional Imbalanced Data
    Van Hulse, Jason
    Khoshgoftaar, Taghi M.
    Napolitano, Amri
    Wald, Randall
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2009), 2009, : 507 - 514
  • [6] CGUFS: A clustering-guided unsupervised feature selection algorithm for gene expression data
    Xu, Zhaozhao
    Yang, Fangyuan
    Wang, Hong
    Sun, Junding
    Zhu, Hengde
    Wang, Shuihua
    Zhang, Yudong
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2023, 35 (09)
  • [7] Revisiting the Problem of Missing Values in High-Dimensional Data and Feature Selection Effect
    Elia, Marina G.
    Duan, Wenting
    [J]. ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, PT I, AIAI 2024, 2024, 711 : 201 - 213
  • [8] Extended particle swarm optimization for feature selection of high-dimensional biomedical data
    Al-Shammary, Dhiah
    Albukhnefis, Adil L.
    Alsaeedi, Ali Hakem
    Al-Asfoor, Muntasir
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (10):
  • [9] A particle swarm optimization based multiobjective memetic algorithm for high-dimensional feature selection
    Luo, Juanjuan
    Zhou, Dongqing
    Jiang, Lingling
    Ma, Huadong
    [J]. MEMETIC COMPUTING, 2022, 14 (01) : 77 - 93