Hybrid feature selection based on enhanced genetic algorithm for text categorization

Cited by: 152
Authors
Ghareb, Abdullah Saeed [1]
Abu Bakar, Azuraliza [1]
Hamdan, Abdul Razak [1]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Ctr Artificial Intelligence Technol, Ukm Bangi 43600, Selangor, Malaysia
Keywords
Hybrid feature selection; Enhanced genetic algorithm; Filter feature selection; Text categorization; CLASSIFICATION; CATEGORY; METRICS;
DOI
10.1016/j.eswa.2015.12.004
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes hybrid feature selection approaches based on the Genetic Algorithm (GA). These approaches use a hybrid search technique that combines the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach, in order to handle the high dimensionality of the feature space and improve categorization performance simultaneously. First, we propose the EGA by improving the crossover and mutation operators. The crossover operation is performed based on chromosome (feature subset) partitioning with the term and document frequencies of chromosome entries (features), while the mutation is performed based on the classifier performance of the original parents and on feature importance. Thus, the crossover and mutation operations are driven by useful information instead of probability and random selection. Second, we incorporate six well-known filter feature selection methods with the EGA to create hybrid feature selection approaches. In the hybrid approach, the EGA is applied to several feature subsets of different sizes, which are ranked in decreasing order of importance, and dimension reduction is carried out. The EGA operations are applied to the most important, highest-ranked features. The effectiveness of the proposed approach is evaluated by using naive Bayes and associative classification on three different collections of Arabic text datasets. The experimental results show the superiority of the EGA over the GA: in direct comparisons, the EGA achieved better results in terms of dimensionality reduction, time, and categorization performance. Furthermore, six proposed hybrid feature selection approaches, each consisting of a filter method and the EGA, are applied to various feature subsets. The results show that these hybrid approaches are more effective than single filter methods for dimensionality reduction, because they were able to produce a higher reduction rate without loss of categorization precision in most situations. (C) 2015 Elsevier Ltd. All rights reserved.
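The two-stage idea in the abstract (a filter ranks features and keeps the top ones, then a GA wrapper searches subsets of those) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' EGA: the abstract does not specify the enhanced operators in detail, so the sketch swaps in plain one-point crossover and bit-flip mutation. The names `filter_rank`, `ga_select`, and `toy_fitness`, the document-frequency score, and the toy data are all hypothetical; in a real wrapper the fitness would be the performance of a classifier such as naive Bayes.

```python
import random

def filter_rank(doc_freq, top_k):
    """Filter stage (assumed score: document frequency): keep the top_k terms."""
    ranked = sorted(doc_freq, key=doc_freq.get, reverse=True)
    return ranked[:top_k]

def ga_select(features, fitness, pop_size=20, generations=30, p_mut=0.05, seed=0):
    """Wrapper stage: a standard GA over bit masks (NOT the paper's enhanced operators)."""
    rng = random.Random(seed)
    n = len(features)  # must be at least 2 for one-point crossover

    def decode(mask):
        return [f for f, bit in zip(features, mask) if bit]

    def score(mask):
        return fitness(decode(mask))

    # Random initial population of feature-subset masks.
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:max(2, pop_size // 2)]
        children = [pop[0][:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)        # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)           # bit-flip mutation applied above
        pop = children
    return decode(max(pop, key=score))

# Hypothetical toy data: document frequencies of eight terms.
doc_freq = {"goal": 40, "match": 35, "bank": 30, "loan": 28,
            "the": 25, "said": 22, "very": 5, "also": 3}
informative = {"goal", "match", "bank", "loan"}

def toy_fitness(subset):
    """Stand-in for classifier performance: reward informative terms, penalize size."""
    return len(informative & set(subset)) - 0.1 * len(subset)

candidates = filter_rank(doc_freq, top_k=6)
selected = ga_select(candidates, toy_fitness)
print(candidates, selected)
```

Restricting the GA to the filter's top-ranked features shrinks the search space before the more expensive wrapper evaluation begins, which is the general motivation the abstract gives for combining the two stages.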
Pages: 31-47
Number of pages: 17