Investigating Optimal Feature Selection Method to Improve the Performance of Amharic Text Document Classification

Authors
Alemu, Tamir Anteneh [1]
Tegegnie, Alemu Kumilachew [1]
Affiliations
[1] Bahir Dar Univ, Fac Comp, Bahir Dar Inst Technol BiT, Bahir Dar, Ethiopia
Keywords
Feature selection; Amharic; SVM; Classification
DOI
Not available
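Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Subject classification code
1205; 120501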
Abstract
Feature selection is a well-known way to reduce the high dimensionality of text categorisation problems. In text categorisation, selecting good features (terms) plays a crucial role in improving accuracy, effectiveness and computational efficiency. Owing to the nature of the language, Amharic documents suffer from a high-dimensional feature space that degrades classifier performance and increases computational cost. This paper investigates which feature selection method is optimal for Amharic text document categorisation, comparing Term Frequency-Inverse Document Frequency (tf*idf), Information Gain (IG), Mutual Information (MI), Chi-Square (χ²), and Term Strength (TS) with Support Vector Machine (SVM) classifiers. Experiments on the collected datasets showed that the χ² and IG methods performed consistently well on Amharic text documents compared with the other methods. With these two methods, the SVM classifier showed a significant improvement in classification accuracy and computational efficiency.
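The abstract names χ² and IG as the best-performing selectors but gives no formulas. As a rough illustration only, the sketch below scores a term against one class using the standard textbook definitions of both measures on a 2x2 document-count table. It is not the authors' implementation; the function names, the toy term labels, and the counts are all hypothetical.

```python
import math

def chi_square(a, b, c, d):
    """Chi-square statistic for one term and one class from a 2x2 table.

    a: in-class documents containing the term
    b: out-of-class documents containing the term
    c: in-class documents without the term
    d: out-of-class documents without the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class never varies
    return n * (a * d - c * b) ** 2 / denom

def entropy(counts):
    """Shannon entropy (in bits) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((k / total) * math.log2(k / total) for k in counts if k > 0)

def information_gain(a, b, c, d):
    """Reduction in class entropy from knowing whether the term is present."""
    n = a + b + c + d
    h_class = entropy([a + c, b + d])
    h_given_term = 0.0
    for group in ((a, b), (c, d)):  # term present, term absent
        m = sum(group)
        if m:
            h_given_term += (m / n) * entropy(group)
    return h_class - h_given_term

def select_top_k(tables, k, score=chi_square):
    """Return the k terms with the highest score (hypothetical helper)."""
    ranked = sorted(tables, key=lambda t: score(*tables[t]), reverse=True)
    return ranked[:k]

# Hypothetical toy counts (a, b, c, d) for three placeholder terms.
toy = {
    "term_sport": (40, 5, 10, 45),
    "term_common": (25, 25, 25, 25),
    "term_politics": (5, 40, 45, 10),
}

if __name__ == "__main__":
    for term, table in toy.items():
        print(term, chi_square(*table), information_gain(*table))
    print(select_top_k(toy, 2))
```

A term spread evenly across classes (here `term_common`) scores zero under both measures, so it is dropped first. The retained terms would then form the reduced feature space passed to a classifier such as an SVM.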
Pages: 103-113
Number of pages: 11