Effective feature selection technique for text classification

Cited by: 5
Authors
Seetha, Hari [1]
Murty, M. Narasimha [2]
Saravanan, R. [3]
Affiliations
[1] VIT Univ, Sch Comp Sci & Engn, Vellore 632014, Tamil Nadu, India
[2] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 12, Karnataka, India
[3] VIT Univ, Sch Informat Technol & Engn, Vellore 632014, Tamil Nadu, India
Keywords
text classification; SVM classifier; nearest neighbour classifier; feature selection
DOI
10.1504/IJDMMM.2015.071451
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text classification plays a vital role in organising the ever-growing volume of digital documents. The high dimensionality of the feature space is a major obstacle in text classification. Feature selection, an effective preprocessing technique, improves the computational efficiency and accuracy of a text classifier. In the present paper, text classification is performed with Zipf's law-based feature selection and with linear SVM weights used for feature ranking. A hybrid feature selection method combining these two techniques is proposed. Nearest neighbour and SVM classifiers are chosen as text classifiers because of the good classification accuracy reported for them in many text classification tasks. Moreover, to investigate the effect of kernel type on text classification, both linear and non-linear SVM kernels are examined. Performance is evaluated by classification accuracy using ten-fold cross-validation. Experimental results on four benchmark corpora were encouraging: the hybrid feature selection method outperformed both the selection of medium-frequency features based on Zipf's law and feature selection by linear SVM weights alone.
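The hybrid idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the frequency cut-offs, the SVM solver, or the order in which the two steps are combined. The sketch assumes a filter-then-rank order, hypothetical cut-offs `low` and `high` for "medium-frequency" terms, binary labels in {+1, -1}, and a small full-batch subgradient trainer standing in for a real linear SVM solver. All function names are illustrative.

```python
from collections import Counter


def medium_frequency_terms(docs, low, high):
    """Zipf-style filter: keep terms whose corpus frequency is in [low, high].

    `low` and `high` are hypothetical cut-offs, not values from the paper.
    """
    counts = Counter(tok for doc in docs for tok in doc)
    return sorted(t for t, c in counts.items() if low <= c <= high)


def vectorize(docs, vocab):
    """Build term-count vectors restricted to `vocab`."""
    index = {t: i for i, t in enumerate(vocab)}
    rows = []
    for doc in docs:
        row = [0.0] * len(vocab)
        for tok in doc:
            if tok in index:
                row[index[tok]] += 1.0
        rows.append(row)
    return rows


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the L2-regularised hinge loss.

    A stand-in for a production SVM solver; labels must be +1 or -1.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge term
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def hybrid_select(docs, labels, low, high, k):
    """Filter by frequency, then keep the k terms with largest |SVM weight|."""
    vocab = medium_frequency_terms(docs, low, high)
    w, _ = train_linear_svm(vectorize(docs, vocab), labels)
    ranked = sorted(zip(vocab, w), key=lambda p: abs(p[1]), reverse=True)
    return [term for term, _ in ranked[:k]]


# Toy corpus (invented for illustration): two sports and two politics documents.
docs = [
    ["the", "match", "goal", "team", "new"],
    ["the", "goal", "score", "team"],
    ["the", "vote", "party", "law", "new"],
    ["the", "law", "vote", "senate"],
]
labels = [1, 1, -1, -1]
selected = hybrid_select(docs, labels, low=2, high=3, k=4)
print(sorted(selected))
```

In this toy corpus the frequency filter drops the very common "the" and the terms that occur only once, and the weight ranking then drops "new", which occurs equally in both classes and so receives no weight. On real data the cut-offs and `k` would need tuning, for example with the ten-fold cross-validation the abstract mentions.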
Pages: 165-184
Page count: 20