A New Representation in PSO for Discretization-Based Feature Selection

Cited by: 146
Authors
Tran, Binh [1]
Xue, Bing [1]
Zhang, Mengjie [1]
Affiliations
[1] Victoria Univ Wellington, Evolutionary Computat Res Grp, Wellington, New Zealand
Keywords
Classification; discretization; feature selection (FS); high-dimensional data; particle swarm optimization (PSO); algorithm
DOI
10.1109/TCYB.2017.2714145
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In machine learning, discretization and feature selection (FS) are important techniques for preprocessing data to improve the performance of an algorithm on high-dimensional data. Since many FS methods require discrete data, a common practice is to apply discretization before FS. In addition, for the sake of efficiency, features are usually discretized individually (univariate discretization). This scheme assumes that each feature influences the task independently, which may not hold when feature interactions exist. Therefore, univariate discretization may degrade the performance of the FS stage, since information about feature interactions may be lost during discretization. Initial results of our previously proposed method [evolve particle swarm optimization (EPSO)] showed that combining discretization and FS in a single stage using bare-bones particle swarm optimization (BBPSO) can lead to better performance than applying them in two separate stages. In this paper, we propose a new method called potential particle swarm optimization (PPSO), which employs a new representation that can reduce the search space of the problem and a new fitness function to better evaluate candidate solutions and guide the search. The results on ten high-dimensional datasets show that PPSO selects fewer than 5% of the features on all datasets. Compared with the two-stage approach, which uses BBPSO for FS on the discretized data, PPSO achieves significantly higher accuracy on seven datasets. In addition, PPSO obtains better (or similar) classification performance than EPSO on eight datasets, with a smaller number of selected features on six datasets. Furthermore, PPSO outperforms the three compared (traditional) methods and performs similarly to one method on most datasets in terms of both generalization ability and learning capacity.
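The single-stage idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The `bbpso_step` function follows the standard bare-bones PSO update, in which each dimension is sampled from a Gaussian centred between the personal best and the global best. The `decode` and `discretize` helpers are hypothetical: they assume, purely for illustration, that a particle holds one candidate cut-point per feature and that a feature is dropped when its cut-point falls outside the feature's observed range. The abstract does not specify this encoding, nor the PPSO representation or fitness function.

```python
import random


def bbpso_step(pbest, gbest, rng):
    """Standard bare-bones PSO update: sample each dimension from
    N((pbest + gbest) / 2, |pbest - gbest|)."""
    return [rng.gauss((p + g) / 2.0, abs(p - g)) for p, g in zip(pbest, gbest)]


def decode(position, feature_ranges):
    """Hypothetical decoding (an assumption, not from the abstract):
    position[i] is a candidate cut-point for feature i. The feature is
    kept only if the cut-point lies strictly inside its (low, high) range."""
    selected = {}
    for i, (cut, (low, high)) in enumerate(zip(position, feature_ranges)):
        if low < cut < high:
            selected[i] = cut
    return selected


def discretize(row, cuts):
    """Binarize the selected features of one sample using their cut-points.
    Unselected features are omitted, so selection and discretization
    happen in the same step."""
    return [int(row[i] > cut) for i, cut in sorted(cuts.items())]


if __name__ == "__main__":
    rng = random.Random(0)
    ranges = [(0.0, 1.0), (0.0, 2.0), (-2.0, 0.0)]
    position = bbpso_step([0.4, 2.5, -1.2], [0.6, 3.5, -0.8], rng)
    cuts = decode(position, ranges)
    print(cuts, discretize([0.7, 9.9, -1.5], cuts))
```

In a full search loop, the discretized and reduced data would be passed to a fitness function (for example, classification accuracy) to score each particle. That part is left out here because the abstract does not describe it.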
Pages: 1733-1746
Number of pages: 14