Investigating Optimal Feature Selection Method to Improve the Performance of Amharic Text Document Classification

Authors
Alemu, Tamir Anteneh [1]
Tegegnie, Alemu Kumilachew [1]
Affiliations
[1] Bahir Dar Univ, Fac Comp, Bahir Dar Inst Technol BiT, Bahir Dar, Ethiopia
Keywords
Feature selection; Amharic; SVM; Classification
DOI
Not available
Chinese Library Classification
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract
Feature selection is one of the well-known solutions to the high-dimensionality problem in text categorisation. Selecting good features (terms) plays a crucial role in improving accuracy, effectiveness and computational efficiency. Owing to the nature of the language, Amharic documents suffer from a high-dimensional feature space, which degrades classifier performance and increases computational cost. This paper investigates which feature selection method is optimal for Amharic text document categorisation, comparing Term Frequency * Inverse Document Frequency (tf*idf), Information Gain (IG), Mutual Information (MI), Chi-Square (χ²), and Term Strength (TS) with Support Vector Machine (SVM) classifiers. Experiments on the collected datasets showed that the χ² and IG methods performed consistently well on Amharic text documents compared with the other methods. Using both methods, the SVM classifier showed a significant improvement in classification accuracy and computational efficiency.
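As a rough illustration of the two scoring functions the abstract singles out, the sketch below computes χ² and Information Gain for a term from a 2x2 term/class contingency table and ranks terms by score. It is not the authors' implementation: the function names, the binary (class versus rest) formulation, and the toy counts are all assumptions made for illustration, and no Amharic data or SVM training is involved.

```python
import math


def chi_square(a, b, c, d):
    """Chi-square score of a term for one class from a 2x2 table.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class never varies
    return n * (a * d - c * b) ** 2 / denom


def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)


def information_gain(a, b, c, d):
    """Reduction in class entropy from knowing whether the term occurs."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    prior = entropy([a + c, b + d])
    with_term = (a + b) / n * entropy([a, b])
    without_term = (c + d) / n * entropy([c, d])
    return prior - with_term - without_term


def rank_terms(table, score):
    """Return term names sorted from highest to lowest score."""
    return sorted(table, key=lambda term: score(*table[term]), reverse=True)


# Hypothetical counts (a, b, c, d) for three placeholder terms.
toy_counts = {
    "term_a": (10, 0, 0, 10),  # occurs only inside the class
    "term_b": (7, 3, 3, 7),    # weakly associated with the class
    "term_c": (5, 5, 5, 5),    # independent of the class
}

chi_ranking = rank_terms(toy_counts, chi_square)
ig_ranking = rank_terms(toy_counts, information_gain)
```

In a filter-style pipeline, the top-ranked terms under either score would be kept as the reduced feature set before training a classifier such as an SVM; how many terms to keep is a tuning choice that this sketch does not address.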
Pages: 103-113
Number of pages: 11