Hybrid fast unsupervised feature selection for high-dimensional data

Cited by: 55
Authors
Manbari, Zhaleh [1]
AkhlaghianTab, Fardin [1]
Salavati, Chiman [1]
Affiliations
[1] University of Kurdistan, Department of Computer Engineering, Sanandaj, Iran
Keywords
Feature selection; High-dimensional data; Binary ant system; Clustering; Mutation; ANT COLONY OPTIMIZATION; FEATURE SUBSET-SELECTION; REDUNDANCY FEATURE-SELECTION; SUPERVISED FEATURE-SELECTION; EFFICIENT FEATURE-SELECTION; MUTUAL INFORMATION; GENETIC ALGORITHM; BOUND ALGORITHM; CLASSIFICATION; RELEVANCE;
DOI
10.1016/j.eswa.2019.01.016
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The "curse of dimensionality" that arises with high-dimensional datasets degrades the capability of learning algorithms and also demands high memory and computational costs. Feature selection, which discards redundant and irrelevant features, is a crucial machine learning technique for reducing the dimensionality of these datasets and thereby improving the performance of the learning algorithm. It has been applied extensively in areas relevant to expert and intelligent systems, such as data mining and machine learning. Although many algorithms have been developed so far, they remain unsatisfactory when confronted with high-dimensional data. This paper presented a new hybrid filter-based feature selection algorithm, called FSCBAS, based on a combination of clustering and a modified Binary Ant System (BAS), to overcome the search space and high-dimensional data processing challenges efficiently. The model provided both global and local search capabilities between and within clusters. Inspired by genetic algorithms and simulated annealing, the proposed method introduced a damped mutation strategy that avoided falling into local optima, and a new redundancy reduction policy, adopted to estimate the correlation between the selected features, further improved the algorithm. The proposed method can be applied in many expert system applications involving high-dimensional data, such as microarray data processing, text classification and image processing, to handle the high dimensionality of the feature space and improve classification performance simultaneously. The performance of the proposed algorithm was compared with that of state-of-the-art feature selection algorithms using different classifiers on real-world datasets. The experimental results confirmed that the proposed method reduced computational complexity significantly and achieved better performance than the other feature selection methods. (C) 2019 Elsevier Ltd. All rights reserved.
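The abstract does not say how the damped mutation is computed. Purely as an illustration of the general idea, the sketch below assumes a mutation probability that decays geometrically with the iteration count and is applied as independent bit flips on a binary feature mask. The function names, the decay schedule and the parameter values are all hypothetical and are not taken from the paper.

```python
import random


def damped_mutation_rate(initial_rate, damping, iteration):
    """Assumed schedule: the mutation probability shrinks geometrically per iteration."""
    return initial_rate * (damping ** iteration)


def mutate(mask, rate, rng):
    """Flip each bit of a binary feature mask independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in mask]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same sequence
    mask = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = feature selected, 0 = feature dropped
    for t in range(5):
        rate = damped_mutation_rate(initial_rate=0.5, damping=0.6, iteration=t)
        mask = mutate(mask, rate, rng)
        print(t, round(rate, 4), mask)
```

Under this assumed schedule, early iterations perturb the mask strongly, which encourages exploration, while later iterations change it less and less, loosely mirroring the cooling idea of simulated annealing that the abstract mentions.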
Pages: 97-118
Number of pages: 22