Hybrid feature selection based on enhanced genetic algorithm for text categorization

Cited by: 152
Authors
Ghareb, Abdullah Saeed [1]
Abu Bakar, Azuraliza [1]
Hamdan, Abdul Razak [1]
Affiliation
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Ctr Artificial Intelligence Technol, Ukm Bangi 43600, Selangor, Malaysia
Keywords
Hybrid feature selection; Enhanced genetic algorithm; Filter feature selection; Text categorization; CLASSIFICATION; CATEGORY; METRICS;
DOI
10.1016/j.eswa.2015.12.004
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This paper proposes hybrid feature selection (FS) approaches based on the genetic algorithm (GA). These approaches use a hybrid search technique that combines the advantages of filter feature selection methods with an enhanced GA (EGA) in a wrapper approach, in order to handle the high dimensionality of the feature space and improve categorization performance simultaneously. First, we propose the EGA by improving the crossover and mutation operators. The crossover operation is performed by partitioning the chromosome (feature subset) according to the term and document frequencies of its entries (features), while the mutation is performed based on the classification performance of the original parents and on feature importance. Thus, both operations are guided by useful information rather than by probability and random selection. Second, we combine six well-known filter feature selection methods with the EGA to create hybrid feature selection approaches. In the hybrid approach, the EGA is applied to several feature subsets of different sizes, whose features are ranked in decreasing order of importance, and dimensionality reduction is carried out; the EGA operations are applied to the most important, highest-ranked features. The effectiveness of the proposed approach is evaluated using naive Bayes and associative classification on three different Arabic text datasets. The experimental results show the superiority of EGA over GA: compared with GA, EGA achieved better results in terms of dimensionality reduction, time, and categorization performance. Furthermore, the six proposed hybrid FS approaches, each consisting of a filter method and the EGA, are applied to various feature subsets.
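The abstract does not spell out the exact operator rules, so the following Python sketch only illustrates the general idea of information-guided (non-random) crossover and mutation on binary chromosomes. The partition rule (genes whose tf × df product is at or above the median form one part), the mutation rule (switch on the most important missing feature when the child scores below the fitter parent), all function names, and the toy fitness function are hypothetical assumptions, not the authors' method; in the paper, fitness comes from classifier performance.

```python
from statistics import median
from typing import Callable, List, Sequence, Tuple

Chromosome = List[int]  # one gene per feature: 1 = kept, 0 = dropped


def frequency_partition(tf: Sequence[float], df: Sequence[float]) -> List[bool]:
    """HYPOTHETICAL rule: a gene belongs to the 'frequent' part when the
    product of its term frequency and document frequency is at or above
    the median product over all genes."""
    products = [t * d for t, d in zip(tf, df)]
    cut = median(products)
    return [p >= cut for p in products]


def guided_crossover(p1: Chromosome, p2: Chromosome,
                     tf: Sequence[float], df: Sequence[float]) -> Tuple[Chromosome, Chromosome]:
    """Exchange genes by partition instead of at a random cut point:
    child1 takes the frequent part from p1 and the rest from p2;
    child2 does the reverse."""
    frequent = frequency_partition(tf, df)
    child1 = [a if f else b for a, b, f in zip(p1, p2, frequent)]
    child2 = [b if f else a for a, b, f in zip(p1, p2, frequent)]
    return child1, child2


def guided_mutation(child: Chromosome, p1: Chromosome, p2: Chromosome,
                    fitness: Callable[[Chromosome], float],
                    importance: Sequence[float]) -> Chromosome:
    """HYPOTHETICAL rule: keep the child if it scores at least as well as
    the fitter parent; otherwise switch on its most important missing
    feature."""
    if fitness(child) >= max(fitness(p1), fitness(p2)):
        return list(child)
    missing = [i for i, gene in enumerate(child) if gene == 0]
    mutated = list(child)
    if missing:
        mutated[max(missing, key=lambda i: importance[i])] = 1
    return mutated


if __name__ == "__main__":
    tf, df = [5, 1, 4, 2], [3, 1, 2, 1]
    importance = [0.9, 0.1, 0.7, 0.3]

    def toy_fitness(c: Chromosome) -> float:
        # Stand-in for classifier accuracy: reward important features
        # and penalize subset size (0.25 per kept feature).
        return sum(w for w, g in zip(importance, c) if g) - 0.25 * sum(c)

    p1, p2 = [1, 1, 0, 0], [0, 1, 1, 1]
    c1, c2 = guided_crossover(p1, p2, tf, df)
    print(c1, c2)  # [1, 1, 0, 1] [0, 1, 1, 0]
    print(guided_mutation(c1, p1, p2, toy_fitness, importance))  # [1, 1, 0, 1]
    print(guided_mutation(c2, p1, p2, toy_fitness, importance))  # [1, 1, 1, 0]
```

In this toy run the first child already beats both parents and is kept as-is, while the second child falls short and gains the most important feature it was missing, so neither operator draws a random number.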
The results showed that these hybrid approaches are more effective than single filter methods for dimensionality reduction because they were able to produce a higher reduction rate without loss of categorization precision in most situations. (C) 2015 Elsevier Ltd. All rights reserved.
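The abstract does not name the six filter methods, so this sketch uses the chi-square statistic, a widely used filter for text categorization, purely as an example of the filter stage. It scores each term against each category from a 2×2 table of document counts, keeps the maximum over categories, ranks terms in decreasing order of score, cuts the ranking into candidate subsets of several sizes, and reports a reduction rate, assumed here to be 1 − |selected| / |original|. The function names, the max-over-categories aggregation, and the reduction-rate definition are assumptions; in the hybrid approach, each candidate subset would then be passed to the EGA, which this sketch does not implement.

```python
from typing import Dict, Iterable, List, Sequence, Set


def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square for one term/category pair from document counts:
    a = in category, has term;    b = outside category, has term;
    c = in category, lacks term;  d = outside category, lacks term."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom


def chi_square_max(docs: Sequence[Set[str]], labels: Sequence[str]) -> Dict[str, float]:
    """Score every term by its maximum chi-square over all categories
    (assumed aggregation; a weighted average is another common choice)."""
    vocab = set().union(*docs)
    scores: Dict[str, float] = {}
    for term in vocab:
        best = 0.0
        for cat in set(labels):
            a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == cat)
            b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != cat)
            c = sum(1 for doc, y in zip(docs, labels) if term not in doc and y == cat)
            d = len(docs) - a - b - c
            best = max(best, chi_square(a, b, c, d))
        scores[term] = best
    return scores


def ranked_subsets(scores: Dict[str, float], sizes: Iterable[int]) -> Dict[int, List[str]]:
    """Rank terms by decreasing score (ties broken alphabetically) and
    return the top-k prefix of the ranking for each requested size k."""
    ranking = sorted(scores, key=lambda t: (-scores[t], t))
    return {k: ranking[:k] for k in sizes}


def reduction_rate(n_selected: int, n_original: int) -> float:
    """Fraction of the original features removed (assumed definition)."""
    return 1.0 - n_selected / n_original


if __name__ == "__main__":
    docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "goal"}]
    labels = ["sport", "sport", "politics", "politics"]
    scores = chi_square_max(docs, labels)
    subsets = ranked_subsets(scores, sizes=[2, 4])
    print(subsets[2])  # ['vote', 'goal']
    print(reduction_rate(len(subsets[2]), len(scores)))  # 0.6
```

Keeping the top-k prefixes of a single ranking means every smaller candidate subset is nested inside the larger ones, which matches the idea of applying the search to the highest-ranked features first.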
Pages: 31-47
Number of pages: 17