A Hybrid Feature Selection Method For Vietnamese Text Classification

Cited by: 5
Authors
Nguyen Tri Hai [1 ]
Tuan Dinh Le [2 ]
Nguyen Hoang Nghia [1 ]
Vu Thanh Nguyen [1 ]
Affiliations
[1] VNU HCM, Univ Informat Technol, Ho Chi Minh City, Vietnam
[2] Long An Univ Econ & Ind, Tan An, Long An Provinc, Vietnam
Keywords
Vietnamese text classification; feature selection; OAO multi-class method;
DOI
10.1109/KSE.2015.25
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text classification is an important task because of the huge volume of electronic documents. One of its main challenges is the high dimensionality of the feature space. Feature selection for English text classification has been studied extensively, but few studies have addressed Vietnamese text classification. This paper evaluates the performance of three widely used feature selection methods [2][6][10]: Chi-square (CHI), Information Gain (IG), and Document Frequency (DF). Based on the evaluation, we propose a hybrid feature selection method, called SIGCHI, which combines the Chi-square and Information Gain methods. Our experimental results show that the proposed method performs significantly better than the other methods: the accuracy of SIGCHI is up to 15.03% higher than that of CHI, up to 18.65% higher than that of IG, and up to 27.72% higher than that of DF.
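For readers unfamiliar with the three baseline criteria named in the abstract, the sketch below scores a single term against a single class from a 2x2 document-count table. It uses the common textbook definitions of CHI, IG (binary class versus rest), and DF, which is an assumption; the record does not state the exact formulas the paper uses, nor how SIGCHI combines CHI and IG, so no combination step is shown. The function and parameter names are illustrative only.

```python
import math

# Contingency counts for one term t and one class c (illustrative naming):
#   a = documents in c that contain t
#   b = documents not in c that contain t
#   c = documents in c that do not contain t
#   d = documents not in c that do not contain t

def chi_square(a, b, c, d):
    """Chi-square statistic for term/class dependence (textbook form)."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # a degenerate table carries no evidence of dependence
    return n * (a * d - c * b) ** 2 / denom

def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)

def information_gain(a, b, c, d):
    """Reduction in class entropy from knowing whether t is present."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    prior = entropy([a + c, b + d])      # class entropy before seeing t
    with_term = entropy([a, b])          # class entropy among docs containing t
    without_term = entropy([c, d])       # class entropy among docs lacking t
    return prior - ((a + b) / n) * with_term - ((c + d) / n) * without_term

def document_frequency(a, b):
    """Number of documents that contain the term, regardless of class."""
    return a + b

if __name__ == "__main__":
    # Made-up counts for one term and one class, purely for illustration.
    counts = (30, 5, 10, 55)
    print("CHI:", chi_square(*counts))
    print("IG :", information_gain(*counts))
    print("DF :", document_frequency(counts[0], counts[1]))
```

In a typical filter pipeline, each vocabulary term is scored this way, the terms are ranked, and only the top-ranked ones are kept as features for the classifier.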
Pages: 91 - 96
Number of pages: 6
Related papers
50 in total
  • [21] A Hybrid Method for Vietnamese Text Normalization
    Nguyen Thi Thu Trang
    Dang Xuan Bach
    Nguyen Xuan Tung
    [J]. NLPIR 2019: 2019 3RD INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, 2019, : 104 - 109
  • [22] A novel filter feature selection method for text classification: Extensive Feature Selector
    Parlak, Bekir
    Uysal, Alper Kursat
    [J]. JOURNAL OF INFORMATION SCIENCE, 2023, 49 (01) : 59 - 78
  • [23] Dynamic feature selection in text classification
    Doan, Son
    Horiguchi, Susumu
    [J]. INTELLIGENT CONTROL AND AUTOMATION, 2006, 344 : 664 - 675
  • [24] Contextual feature selection for text classification
    Paradis, Francois
    Nie, Jian-Yun
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2007, 43 (02) : 344 - 352
  • [25] Feature selection for text classification: A review
    Deng, Xuelian
    Li, Yuqing
    Weng, Jian
    Zhang, Jilian
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (03) : 3797 - 3816
  • [26] Feature Selection Strategy in Text Classification
    Fung, Pui Cheong Gabriel
    Morstatter, Fred
    Liu, Huan
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PT I: 15TH PACIFIC-ASIA CONFERENCE, PAKDD 2011, 2011, 6634 : 26 - 37
  • [27] Feature selection for text classification: A review
    Xuelian Deng
    Yuqing Li
    Jian Weng
    Jilian Zhang
    [J]. Multimedia Tools and Applications, 2019, 78 : 3797 - 3816
  • [28] Feature Selection for Ordinal Text Classification
    Baccianella, Stefano
    Esuli, Andrea
    Sebastiani, Fabrizio
    [J]. NEURAL COMPUTATION, 2014, 26 (03) : 557 - 591
  • [29] Feature Selection Methods for Text Classification
    Dasgupta, Anirban
    Drineas, Petros
    Harb, Boulos
    Josifovski, Vanja
    Mahoney, Michael W.
    [J]. KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 230 - +
  • [30] A New Feature Selection Method for Text Classification Based on Independent Feature Space Search
    Liu, Yong
    Ju, Shenggen
    Wang, Junfeng
    Su, Chong
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020