Effective feature selection technique for text classification

被引:5
|
作者
Seetha, Hari [1 ]
Murty, M. Narasimha [2 ]
Saravanan, R. [3 ]
机构
[1] VIT Univ, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
[2] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 12, Karnataka, India
[3] VIT Univ, Sch Informat Technol & Engn, Vellore 632014, Tamil Nadu, India
关键词
text classification; SVM classifier; nearest neighbour classifier; feature selection;
D O I
10.1504/IJDMMM.2015.071451
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Text classification plays a vital role in the organisation of the unceasing growth of digital documents. High dimensionality of feature space is a major hassle in text classification. Feature selection, an effective preprocessing technique improves the computational efficiency and the accuracy of a text classifier. In the present paper, text classification is performed with Zipf's law-based feature selection and the use of linear SVM weight for feature ranking. A hybrid feature selection method combining these two feature selection techniques is proposed. Nearest neighbour and SVM classifiers are chosen as text classifiers for their good classification accuracy reported in many text classification tasks. Moreover, to investigate the effect of kernel type on the text classification both linear and non-linear kernels in SVM are examined. The performance is evaluated by determining classification accuracy using ten-fold cross-validation. Experimental results with four benchmark corpuses were encouraging and demonstrated that the classification performance using hybrid feature selection method outperformed the classification performance obtained by selecting either medium frequent features based on Zipf's law or using feature selection by linear SVM.
引用
收藏
页码:165 / 184
页数:20
相关论文
共 50 条
  • [41] A New Filter Feature Selection Method for Text Classification
    Cekik, Rasim
    IEEE Access, 2024, 12 : 139316 - 139335
  • [42] A Comparative Study on Feature Selection in Unbalance Text Classification
    Xu, Yan
    2012 INTERNATIONAL SYMPOSIUM ON INFORMATION SCIENCE AND ENGINEERING (ISISE), 2012, : 44 - 47
  • [43] Feature Selection For Text Classification Using Genetic Algorithms
    Bidi, Noria
    Elberrichi, Zakaria
    PROCEEDINGS OF 2016 8TH INTERNATIONAL CONFERENCE ON MODELLING, IDENTIFICATION & CONTROL (ICMIC 2016), 2016, : 806 - 810
  • [44] Utility-based feature selection for text classification
    Heyong Wang
    Ming Hong
    Raymond Yiu Keung Lau
    Knowledge and Information Systems, 2019, 61 : 197 - 226
  • [45] Utility-based feature selection for text classification
    Wang, Heyong
    Hong, Ming
    Lau, Raymond Yiu Keung
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 61 (01) : 197 - 226
  • [46] Feature Selection by Using Heuristic Methods for Text Classification
    Sel, Ilhami
    Yeroglu, Celalettin
    Hanbay, Davut
    2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,
  • [47] Filter feature selection methods for text classification: a review
    Ming, Hong
    Heyong, Wang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (1) : 2053 - 2091
  • [48] Statera: A Balanced Feature Selection Method for Text Classification
    Gama Bispo, Braian Varjao
    Rios, Tatiane Nogueira
    2018 7TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2018, : 260 - 265
  • [49] Feature Selection for Text Classification Using Mutual Information
    Sel, Ilhami
    Karci, Ali
    Hanbay, Davut
    2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,
  • [50] Impact of Feature Selection and Engineering in the Classification of Handwritten Text
    Kaushik, Anupama
    Gupta, Himanshu
    Latwal, Digvijay Singh
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 2598 - 2601