Hybrid feature selection for text classification

Cited by: 48
Author
Gunal, Serkan [1]
Affiliation
[1] Anadolu Univ, Dept Comp Engn, Eskisehir, Turkey
Keywords
Feature extraction; feature selection; pattern recognition; text classification; linear discriminant analysis; algorithms
DOI
10.3906/elk-1101-1064
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is vital in pattern classification because it affects both accuracy and processing time. Selecting proper features matters even more when the initial feature set is large. Text classification is a typical example, where the size of the initial feature set may reach hundreds or even thousands. Numerous studies in the literature offer different feature selection strategies for text classification, mostly focused on filters. Despite the large number of these studies, there is no significant work investigating the efficacy of combining features selected by different selection methods under different conditions. In this study, a hybrid feature selection strategy, consisting of both filter and wrapper feature selection steps, is proposed to comprehensively analyze the redundancy or relevancy of the text features selected by different methods for different feature set sizes, dataset characteristics, classifiers, and success measures. The experimental results reveal that a combination of the features selected by various methods is more effective than the features selected by a single selection method. The profile of the combination is, however, influenced by the characteristics of the dataset, the choice of classification algorithm, and the success measure.
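The filter-then-wrapper idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the filters, the wrapper search, or the classifiers used in the study, so everything below is an assumption made for illustration: chi-squared and document frequency stand in for the filters, greedy forward selection stands in for the wrapper, a small Bernoulli naive Bayes scored by leave-one-out accuracy stands in for the classifier and success measure, and the toy corpus is invented.

```python
from math import log

# Toy labelled corpus, invented for illustration (1 = spam, 0 = not spam).
DOCS = [
    ("cheap pills buy now", 1),
    ("buy cheap watches now", 1),
    ("cheap offer click now", 1),
    ("win money click here", 1),
    ("meeting agenda for monday", 0),
    ("project meeting notes attached", 0),
    ("lunch on monday with team", 0),
    ("notes for the project team", 0),
]

def tokens(text):
    return set(text.split())

def chi2(term, docs):
    """Filter 1 (assumed): chi-squared of term presence vs. class label."""
    a = sum(1 for t, y in docs if term in tokens(t) and y == 1)
    b = sum(1 for t, y in docs if term in tokens(t) and y == 0)
    c = sum(1 for t, y in docs if term not in tokens(t) and y == 1)
    d = sum(1 for t, y in docs if term not in tokens(t) and y == 0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom if denom else 0.0

def doc_freq(term, docs):
    """Filter 2 (assumed): number of documents containing the term."""
    return sum(term in tokens(t) for t, _ in docs)

def top_k(vocab, score, k):
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(vocab, key=lambda w: (-score(w), w))[:k]

def loo_accuracy(features, docs):
    """Leave-one-out accuracy of Bernoulli naive Bayes limited to `features`."""
    correct = 0
    for i, (text, label) in enumerate(docs):
        train = docs[:i] + docs[i + 1:]
        present = tokens(text)
        best_label, best_score = None, float("-inf")
        for y in (0, 1):
            rows = [tokens(t) for t, lab in train if lab == y]
            score = log(len(rows) / len(train))  # class prior
            for f in features:
                # Laplace-smoothed probability that the feature is present.
                p = (sum(f in r for r in rows) + 1) / (len(rows) + 2)
                score += log(p if f in present else 1 - p)
            if score > best_score:
                best_label, best_score = y, score
        correct += best_label == label
    return correct / len(docs)

def forward_select(candidates, docs):
    """Wrapper (assumed): greedily add the feature that most improves accuracy."""
    selected, best = [], loo_accuracy([], docs)
    remaining = sorted(candidates)
    while remaining:
        acc, term = max((loo_accuracy(selected + [t], docs), t) for t in remaining)
        if acc <= best:
            break  # no remaining feature improves the score
        selected.append(term)
        remaining.remove(term)
        best = acc
    return selected, best

K = 4
vocab = sorted({w for text, _ in DOCS for w in tokens(text)})
by_chi2 = top_k(vocab, lambda w: chi2(w, DOCS), K)
by_df = top_k(vocab, lambda w: doc_freq(w, DOCS), K)
candidates = sorted(set(by_chi2) | set(by_df))  # pool features from both filters
selected, accuracy = forward_select(candidates, DOCS)
print("filter candidates:", candidates)
print("wrapper selection:", selected, "LOO accuracy:", accuracy)
```

Pooling the top-ranked features of several filters before the wrapper step lets the wrapper decide which filter's features are redundant for a given classifier and success measure, which is the kind of comparison the abstract describes.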
Pages: 1296-1311
Number of pages: 16