Investigating Optimal Feature Selection Method to Improve the Performance of Amharic Text Document Classification

Authors
Alemu, Tamir Anteneh [1]
Tegegnie, Alemu Kumilachew [1]
Affiliations
[1] Bahir Dar Univ, Fac Comp, Bahir Dar Inst Technol BiT, Bahir Dar, Ethiopia
Keywords
Feature selection; Amharic; SVM; Classification
DOI
Not available
Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work]
Subject classification code
1205; 120501
Abstract
Feature selection is a well-known way to reduce the high dimensionality of text categorisation problems. In text categorisation, selecting good features (terms) plays a crucial role in improving accuracy, effectiveness and computational efficiency. Owing to the nature of the language, Amharic documents suffer from a high-dimensional feature space that degrades classifier performance and increases computational cost. This paper investigates which feature selection method is optimal for Amharic text document categorisation, comparing Term Frequency * Inverse Document Frequency (tf*idf), Information Gain (IG), Mutual Information (MI), Chi-Square (χ²) and Term Strength (TS) with Support Vector Machine (SVM) classifiers. Experiments on the collected datasets showed that the χ² and IG methods performed consistently well on Amharic text documents compared with the other methods. With both methods, the SVM classifier showed a significant improvement in classification accuracy and computational efficiency.
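As a rough illustration of the χ² criterion named in the abstract, the sketch below scores terms against one class using the standard 2x2 contingency form of the chi-square statistic and keeps the top-scoring terms. It is not the authors' implementation: the function names `chi_square_scores` and `select_top_k`, the toy English tokens, and the class labels are all hypothetical, and a real Amharic pipeline would need its own tokenisation and preprocessing before this step.

```python
from collections import Counter


def chi_square_scores(docs, labels, target):
    """Score each term with the chi-square statistic against one class.

    docs   : list of token lists (assumed to be tokenised already)
    labels : list of class labels, one per document
    target : the class the terms are scored against
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    n_neg = n - n_pos

    # Document frequency of each term inside and outside the target class.
    df_pos, df_neg = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):
            if y == target:
                df_pos[term] += 1
            else:
                df_neg[term] += 1

    scores = {}
    for term in set(df_pos) | set(df_neg):
        a = df_pos[term]   # in class, contains term
        b = df_neg[term]   # not in class, contains term
        c = n_pos - a      # in class, lacks term
        d = n_neg - b      # not in class, lacks term
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        scores[term] = n * (a * d - c * b) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus, used only to exercise the functions.
docs = [
    ["goal", "match", "team"],
    ["team", "coach", "goal"],
    ["market", "bank", "price"],
    ["bank", "price", "team"],
]
labels = ["sport", "sport", "economy", "economy"]

scores = chi_square_scores(docs, labels, target="sport")
selected = select_top_k(scores, 3)
print(selected)
```

Note that the statistic rewards terms that are strongly associated with the class in either direction, so terms that are characteristic of the other class can also rank highly. The reduced vocabulary would then be used to build the feature vectors passed to a classifier such as an SVM.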
Pages: 103-113
Number of pages: 11