A Hybrid Feature Selection Method For Vietnamese Text Classification

被引:5
|
作者
Nguyen Tri Hai [1 ]
Tuan Dinh Le [2 ]
Nguyen Hoang Nghia [1 ]
Vu Thanh Nguyen [1 ]
机构
[1] VNU HCM, Univ Informat Technol, Ho Chi Minh City, Vietnam
[2] Long An Univ Econ & Ind, Tan An, Long An Provinc, Vietnam
关键词
vietnamese text classification; feature felection; OAO multi-class method;
D O I
10.1109/KSE.2015.25
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Text classification is a very important task due to the huge amount of electronic documents. One of the main challenges for text classification is the high dimensionality of feature spaces. There have been extensive studies on feature selections for English text classification. However, not many works have been studied on Vietnamese text classification. This paper evaluates the performances of the three widely used feature selection methods [2][6][10]: the Chi-square (CHI), the Information Gain (IG), and the Document Frequency (DF). Based on the evaluation, we propose a hybrid feature selection method, called SIGCHI, which combines the Chi-square and the Information Gain feature selection methods. Our experimental results showed that the proposed method performs significantly better than the other methods. The accuracy of SIGCHI method is up to 15.03% higher than the one of CHI method, up to 18.65% higher than the one of IG method, and up to 27.72% higher than the one of DF method, respectively.
引用
收藏
页码:91 / 96
页数:6
相关论文
共 50 条
  • [1] Hybrid feature selection for text classification
    Gunal, Serkan
    [J]. TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2012, 20 : 1296 - 1311
  • [2] A hybrid method of feature selection for Chinese text sentiment classification
    Wang, Suge
    Wei, Yingjie
    Li, Deyu
    Zhang, Wu
    Li, Wei
    [J]. FOURTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 3, PROCEEDINGS, 2007, : 435 - +
  • [3] Study on the Method of Feature Selection Based on Hybrid Model for Text Classification
    Li, Runzhi
    Zhang, Yangsen
    [J]. MATERIALS SCIENCE AND INFORMATION TECHNOLOGY, PTS 1-8, 2012, 433-440 : 2881 - 2886
  • [4] Hybrid Support Vector Machine based Feature Selection Method for Text Classification
    Sabbah, Thabit
    Ayyash, Mosab
    Ashraf, Mahmood
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2018, 15 (3A) : 599 - 609
  • [5] Efficient Method for Feature Selection in Text Classification
    Sun, Jian
    Zhang, Xiang
    Liao, Dan
    Chang, Victor
    [J]. 2017 INTERNATIONAL CONFERENCE ON ENGINEERING AND TECHNOLOGY (ICET), 2017,
  • [6] A new feature selection method for text classification
    Uchyigit, Gulden
    Clark, Keith
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2007, 21 (02) : 423 - 438
  • [7] Feature Selection Method of Text Tendency Classification
    Li, Yanling
    Dai, Guanzhong
    Li, Gang
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 34 - +
  • [8] An enhanced feature selection method for text classification
    Kang, Jinbeom
    Lee, Eunshil
    Hong, Kwanghee
    Park, Jeahyun
    Kim, Taehwan
    Park, Juyoung
    Choi, Joongmin
    Yang, Jaeyoung
    [J]. PROCEEDINGS OF THE SECOND IASTED INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE, 2006, : 36 - 41
  • [9] A hybrid feature selection method for text categorization
    Montanes, E.
    Quevedo, J. R.
    Combarro, E. F.
    Diaz, I.
    Ranilla, J.
    [J]. INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2007, 15 (02) : 133 - 151
  • [10] A parallel feature selection method study for text classification
    Li, Zhao
    Lu, Wei
    Sun, Zhanquan
    Xing, Weiwei
    [J]. NEURAL COMPUTING & APPLICATIONS, 2017, 28 : S513 - S524