An Efficient, Parallelized Algorithm for Optimal Conditional Entropy-Based Feature Selection

Cited by: 5
Authors
Estrela, Gustavo [1 ,2 ]
Gubitoso, Marco Dimas [2 ]
Ferreira, Carlos Eduardo [2 ]
Barrera, Junior [2 ]
Reis, Marcelo S. [1 ]
Affiliations
[1] Inst Butantan, Ctr Toxins Immune Response & Cell Signaling CeT, Lab Ciclo Celular, BR-05503900 Butanta, SP, Brazil
[2] Univ Sao Paulo, Inst Matemat & Estat, BR-05503900 Sao Paulo, SP, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
machine learning; supervised learning; information theory; mean conditional entropy; feature selection; classifier design; Support-Vector Machine; U-curve problem; Boolean lattice; MUTUAL INFORMATION; DESIGN;
DOI
10.3390/e22040492
Chinese Library Classification number
O4 [Physics];
Discipline classification code
0702;
Abstract
In Machine Learning, feature selection is an important step in classifier design. It consists of finding a subset of features that is optimal for a given cost function. One way to solve feature selection is to organize all possible feature subsets into a Boolean lattice and to exploit the fact that the costs along chains in that lattice describe U-shaped curves. Minimizing such a cost function is known as the U-curve problem. Recently, a study proposed U-Curve Search (UCS), an optimal algorithm for that problem, which was successfully used for feature selection. However, despite the algorithm's optimality, the time UCS required in computational assays was exponential in the number of features. Here, we report that this scalability issue arises because the U-curve problem is NP-hard. We then introduce the Parallel U-Curve Search (PUCS), a new algorithm for the U-curve problem. In PUCS, we present a novel way to partition the search space into smaller Boolean lattices, thus rendering the algorithm highly parallelizable. We also provide computational assays with both synthetic data and Machine Learning datasets, in which the performance of PUCS was assessed against UCS and other gold-standard feature selection algorithms.
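As a rough illustration of the setting the abstract describes, the following Python sketch estimates a mean conditional entropy cost for a feature subset and minimizes it by brute force over the whole Boolean lattice. It is not UCS or PUCS: the exhaustive search, the function names, and the penalty for feature patterns observed only once (treated here as maximally uncertain) are all assumptions made for illustration, and the paper's actual cost function may differ.

```python
from collections import Counter
from itertools import combinations
from math import log2


def penalized_mce(rows, labels, subset):
    """Estimate H(Y | X_subset) from discrete samples.

    Assumption (illustrative only): a feature pattern seen exactly once
    is charged the maximum entropy, log2(number of classes). Without a
    penalty of this kind the estimate never increases as features are
    added, so chains would not be U-shaped.
    """
    n = len(rows)
    max_entropy = log2(len(set(labels)))
    groups = {}
    for row, y in zip(rows, labels):
        key = tuple(row[i] for i in subset)
        groups.setdefault(key, Counter())[y] += 1
    cost = 0.0
    for counts in groups.values():
        m = sum(counts.values())
        if m == 1:
            h = max_entropy
        else:
            h = -sum((c / m) * log2(c / m) for c in counts.values())
        cost += (m / n) * h
    return cost


def exhaustive_search(rows, labels, n_features):
    """Visit every node of the Boolean lattice: 2**n_features subsets."""
    best = None
    for k in range(n_features + 1):
        for subset in combinations(range(n_features), k):
            c = penalized_mce(rows, labels, subset)
            if best is None or c < best[0]:
                best = (c, subset)
    return best


# Toy data: the label is the XOR of features 0 and 1; feature 2 is irrelevant.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
labels = [a ^ b for a, b, _ in rows]
best_cost, best_subset = exhaustive_search(rows, labels, 3)
```

On this toy data, the chain from the empty set through {0} and {0, 1} to {0, 1, 2} first decreases in cost and then increases again, which is the U shape that the algorithms in the abstract exploit to prune the lattice instead of visiting every subset.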
Pages: 20
Related papers
50 in total
  • [1] Genetic Algorithm for Entropy-based Feature Subset Selection
    Kromer, Pavel
    Platos, Jan
    [J]. 2016 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2016, : 4486 - 4493
  • [2] Conditional entropy-based feature selection for fault detection in analog circuits
    Long, Ting
    Jiang, Shiqi
    Luo, Hang
    Deng, Changjian
    [J]. DYNA, 2016, 91 (03): : 309 - 318
  • [3] Efficient Multi-Label Feature Selection Using Entropy-Based Label Selection
    Lee, Jaesung
    Kim, Dae-Won
    [J]. ENTROPY, 2016, 18 (11)
  • [4] Multiscale Fuzzy Entropy-Based Feature Selection
    Wang, Zhihong
    Chen, Hongmei
    Yuan, Zhong
    Wan, Jihong
    Li, Tianrui
    [J]. IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2023, 31 (09) : 3248 - 3262
  • [5] A Novel Entropy-Based Approach to Feature Selection
    Tu, Chia-Hao
    Li, Chunshien
    [J]. INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2017, PT I, 2017, 10191 : 445 - 454
  • [6] KERNEL ENTROPY-BASED UNSUPERVISED SPECTRAL FEATURE SELECTION
    Zhang, Zhihong
    Hancock, Edwin R.
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2012, 26 (05)
  • [7] A relative decision entropy-based feature selection approach
    Jiang, Feng
    Sui, Yuefei
    Zhou, Lin
    [J]. PATTERN RECOGNITION, 2015, 48 (07) : 2151 - 2163
  • [8] Rough entropy-based feature selection and its application
    Sun, Lin
    Xu, Jiucheng
    Xue, Zhan'ao
    Zhang, Lingjun
    [J]. Journal of Information and Computational Science, 2011, 8 (09): : 1525 - 1532
  • [9] Entropy-Based Feature Selection for Network Anomaly Detection
    Alabi, Ruth
    Yurtkan, Kamil
    [J]. 2018 2ND INTERNATIONAL SYMPOSIUM ON MULTIDISCIPLINARY STUDIES AND INNOVATIVE TECHNOLOGIES (ISMSIT), 2018, : 563 - 569
  • [10] A fast algorithm for feature selection in conditional maximum entropy modeling
    Zhou, YQ
    Weng, FL
    Wu, L
    Schmidt, H
    [J]. PROCEEDINGS OF THE 2003 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, 2003, : 153 - 159