Discretization-Based Feature Selection as a Bilevel Optimization Problem

Cited by: 5
Authors
Said, Rihab [1]
Elarbi, Maha [1]
Bechikh, Slim [1]
Coello Coello, Carlos Artemio [2, 3, 4]
Said, Lamjed Ben [1]
Affiliations
[1] Univ Tunis, Strategies Modeling & Artificial Intelligence Lab, ISG, Tunis 2000, Tunisia
[2] CINVESTAV IPN, Dept Comp Sci, Evolutionary Computat Grp, Mexico City 07300, Mexico
[3] Basque Ctr Appl Math, Bilbao 48009, Spain
[4] Ikerbasque, Bilbao 48009, Spain
Keywords
Bilevel optimization; co-evolutionary algorithm; cut-points search; discretization-based feature selection (DBFS); features interactions; SEARCH;
DOI
10.1109/TEVC.2022.3192113
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Discretization-based feature selection (DBFS) approaches have shown interesting results when using several metaheuristic algorithms, such as particle swarm optimization (PSO), genetic algorithm (GA), ant colony optimization (ACO), etc. However, these methods share the same shortcoming which consists in encoding the problem solution as a sequence of cut-points. From this cut-points vector, the decision of deleting or selecting any feature is induced. Indeed, the number of generated cut-points varies from one feature to another. Thus, the higher the number of cut-points, the higher the probability of selecting the considered feature; and vice versa. This fact leads to the deletion of possibly important features having a single or a low number of cut-points, such as the infection rate, the glycemia level, and the blood pressure. In order to solve the issue of the dependency relation between the feature selection (or removal) event and the number of its generated potential cut-points, we propose to model the DBFS task as a bilevel optimization problem and then solve it using an improved version of an existing co-evolutionary algorithm, named I-CEMBA. The latter ensures the variation of the number of features during the migration process in order to deal with the multimodality aspect. The resulting algorithm, termed bilevel discretization-based feature selection (Bi-DFS), performs selection at the upper level while discretization is done at the lower level. The experimental results on several high-dimensional datasets show that Bi-DFS outperforms relevant state-of-the-art methods in terms of classification accuracy, generalization ability, and feature selection bias.
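The bilevel structure described in the abstract (feature selection at the upper level, discretization at the lower level) can be illustrated with a minimal Python sketch. This is not I-CEMBA or Bi-DFS: exhaustive enumeration of feature subsets and a one-pass greedy cut-point search stand in for the co-evolutionary algorithm, each feature gets a single cut-point, and the function names, the penalty term, and the toy data are all hypothetical choices made for illustration only.

```python
from itertools import product

def candidate_cuts(column):
    """Midpoints between consecutive distinct values: the potential cut-points."""
    vals = sorted(set(column))
    return [(a + b) / 2 for a, b in zip(vals, vals[1:])]

def accuracy(X, y, mask, cuts):
    """Training accuracy of a majority-vote table over the selected, binarized features."""
    keys = [tuple(row[j] > cuts[j] for j in range(len(mask)) if mask[j]) for row in X]
    votes = {}
    for key, label in zip(keys, y):
        votes.setdefault(key, []).append(label)
    majority = {k: max(sorted(set(v)), key=v.count) for k, v in votes.items()}
    return sum(majority[k] == label for k, label in zip(keys, y)) / len(y)

def lower_level(X, y, mask):
    """Lower level: for a fixed feature subset, search one cut-point per selected feature."""
    n = len(mask)
    cands = [candidate_cuts([row[j] for row in X]) for j in range(n)]
    cuts = [c[0] for c in cands]  # entries for unselected features are unused placeholders
    for j in range(n):
        if not mask[j]:
            continue
        best = accuracy(X, y, mask, cuts)
        for c in cands[j]:
            trial = cuts[:j] + [c] + cuts[j + 1:]
            acc = accuracy(X, y, mask, trial)
            if acc > best:
                best, cuts = acc, trial
    return cuts, accuracy(X, y, mask, cuts)

def upper_level(X, y, penalty=0.05):
    """Upper level: pick the feature subset, scoring each one by its lower-level response."""
    best = None
    for mask in product([False, True], repeat=len(X[0])):
        cuts, acc = lower_level(X, y, mask)
        score = acc - penalty * sum(mask)  # assumed trade-off: accuracy vs. subset size
        if best is None or score > best[0]:
            best = (score, mask, cuts)
    return best

# Toy data (invented): feature 0 separates the classes with one cut-point,
# features 1 and 2 are noise.
X = [[1.0, 5.0, 0.3], [2.0, 1.0, 0.9], [3.0, 4.0, 0.1], [4.0, 2.0, 0.7],
     [6.0, 5.5, 0.2], [7.0, 1.5, 0.8], [8.0, 4.5, 0.4], [9.0, 2.5, 0.6]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

score, mask, cuts = upper_level(X, y)
print(mask, cuts[0])
```

On this toy data the search keeps only the first feature, with cut-point 5.0. Because the selection decision is made by the upper-level mask rather than induced from the cut-point vector, a feature with a single useful cut-point is not disadvantaged relative to features with many candidate cut-points.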
Pages: 893-907
Page count: 15