Effective feature selection technique for text classification

Cited by: 5
Authors
Seetha, Hari [1]
Murty, M. Narasimha [2]
Saravanan, R. [3]
Affiliations
[1] VIT Univ, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
[2] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 12, Karnataka, India
[3] VIT Univ, Sch Informat Technol & Engn, Vellore 632014, Tamil Nadu, India
Keywords
text classification; SVM classifier; nearest neighbour classifier; feature selection
DOI
10.1504/IJDMMM.2015.071451
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text classification plays a vital role in organising the ever-growing volume of digital documents. The high dimensionality of the feature space is a major obstacle in text classification. Feature selection, an effective preprocessing technique, improves both the computational efficiency and the accuracy of a text classifier. In the present paper, text classification is performed with Zipf's law-based feature selection and with linear SVM weights used for feature ranking. A hybrid feature selection method combining these two techniques is proposed. Nearest neighbour and SVM classifiers are chosen as text classifiers because of the good classification accuracy reported for them in many text classification tasks. Moreover, to investigate the effect of kernel type on text classification, both linear and non-linear SVM kernels are examined. Performance is evaluated by classification accuracy under ten-fold cross-validation. Experimental results on four benchmark corpora were encouraging: the hybrid feature selection method outperformed both the selection of medium-frequency features based on Zipf's law alone and feature selection by linear SVM weights alone.
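The hybrid idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the frequency thresholds, the order in which the two stages are applied, or the SVM solver, so everything below is an assumption made for illustration. The sketch first keeps terms of medium document frequency (a Zipf-inspired cut with hypothetical bounds `low` and `high`), then ranks the survivors by the absolute weight of a linear SVM trained with a simple Pegasos-style subgradient method, and keeps the top `k`. All function names and the toy corpus are hypothetical.

```python
from collections import Counter


def document_frequency(docs):
    """Count in how many documents each term occurs."""
    df = Counter()
    for doc in docs:
        df.update(set(doc.split()))
    return df


def medium_frequency_terms(docs, low, high):
    """Keep terms whose document frequency lies in [low, high].

    Zipf-inspired cut: very frequent and very rare terms are dropped.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    df = document_frequency(docs)
    return sorted(t for t, n in df.items() if low <= n <= high)


def vectorise(docs, vocab):
    """Binary bag-of-words vectors over the given vocabulary."""
    rows = []
    for doc in docs:
        present = set(doc.split())
        rows.append([1.0 if t in present else 0.0 for t in vocab])
    return rows


def train_linear_svm(X, y, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    Labels must be +1 or -1. No bias term is fitted (a simplification).
    """
    w = [0.0] * len(X[0])
    step = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            step += 1
            eta = 1.0 / (lam * step)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            # Shrink the weights (regularisation), then correct on margin violations.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def hybrid_select(docs, labels, low, high, k):
    """Frequency filter first, then keep the k terms with the largest |SVM weight|."""
    vocab = medium_frequency_terms(docs, low, high)
    w = train_linear_svm(vectorise(docs, vocab), labels)
    ranked = sorted(zip(vocab, w), key=lambda tw: -abs(tw[1]))
    return [t for t, _ in ranked[:k]]


# Toy two-class corpus (hypothetical): +1 = sport, -1 = finance.
docs = [
    "the match ended with one late goal",
    "the team won the match",
    "the striker scored another goal",
    "the market fell as shares dropped",
    "the bank sold shares",
    "the market rallied on bank news",
]
labels = [1, 1, 1, -1, -1, -1]

candidates = medium_frequency_terms(docs, low=2, high=3)
selected = hybrid_select(docs, labels, low=2, high=3, k=4)
print(candidates)
print(selected)
```

In this toy run the frequency filter removes "the" (present in every document) and all terms that occur in a single document, and the SVM ranking then chooses among the remaining candidates. A real experiment would use weighted term features, a tuned SVM, and ten-fold cross-validation as described in the abstract.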
Pages: 165-184
Number of pages: 20