Hybrid Feature Selection for Amharic News Document Classification

Cited by: 3
Authors
Endalie, Demeke [1]
Haile, Getamesay [1]
Affiliation
[1] Jimma Institute of Technology, Faculty of Computing and Informatics, Jimma, Ethiopia
DOI
10.1155/2021/5516262
CLC classification number
T [Industrial Technology]
Subject classification code
08
Abstract
Today, the number of Amharic digital documents is growing rapidly, which makes automatic text classification increasingly important. Proper feature selection plays a crucial role in both classification accuracy and computational time, and picking the right features is especially important when the initial feature set is large. In this paper, we present a hybrid feature selection method, called IGCHIDF, which combines the information gain (IG), chi-square (CHI), and document frequency (DF) feature selection methods. We evaluate the proposed method on two datasets: dataset 1, containing 9 news categories, and dataset 2, containing 13 news categories. Our experimental results show that the proposed method outperforms the other methods on both datasets 1 and 2. On dataset 2, the classification accuracy of IGCHIDF is up to 3.96% higher than that of IG, up to 11.16% higher than that of CHI, and 7.3% higher than that of DF.
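The abstract names the three filter scores IGCHIDF builds on but does not state how they are combined. The sketch below computes DF, IG, and the chi-square statistic per term using their common textbook definitions (document presence only; CHI taken as the maximum over classes). Purely as an illustrative assumption, `hybrid_select` then averages the min-max normalised scores and keeps the top `k` terms; this combination rule is hypothetical and is not the authors' IGCHIDF procedure. The function names and the toy corpus are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def term_stats(docs, labels):
    """Count, for every term, how many documents of each class contain it."""
    class_sizes = Counter(labels)
    df_by_class = defaultdict(Counter)  # term -> {class: document count}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):  # presence only, not raw frequency
            df_by_class[term][label] += 1
    return class_sizes, df_by_class


def entropy(counts, total):
    """Shannon entropy (bits) of a class distribution given raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def score_terms(docs, labels):
    """Return {term: (DF, IG, CHI_max)} for every term in the corpus."""
    n = len(docs)
    class_sizes, df_by_class = term_stats(docs, labels)
    h_c = entropy(class_sizes.values(), n)
    scores = {}
    for term, per_class in df_by_class.items():
        df = sum(per_class.values())
        # Information gain: H(C) - P(t)H(C|t) - P(~t)H(C|~t)
        ig = h_c - (df / n) * entropy(per_class.values(), df)
        if n - df > 0:
            absent = [class_sizes[c] - per_class[c] for c in class_sizes]
            ig -= ((n - df) / n) * entropy(absent, n - df)
        # Chi-square from the 2x2 term/class contingency table, max over classes
        chi = 0.0
        for c, size in class_sizes.items():
            a = per_class[c]    # in class c, contains term
            b = df - a          # other classes, contains term
            cc = size - a       # in class c, lacks term
            d = n - df - cc     # other classes, lacks term
            denom = (a + cc) * (b + d) * (a + b) * (cc + d)
            if denom:
                chi = max(chi, n * (a * d - cc * b) ** 2 / denom)
        scores[term] = (df, ig, chi)
    return scores


def hybrid_select(docs, labels, k):
    """Hypothetical hybrid: average min-max normalised DF, IG, CHI; keep top k."""
    scores = score_terms(docs, labels)
    terms = list(scores)
    combined = dict.fromkeys(terms, 0.0)
    for i in range(3):
        vals = [scores[t][i] for t in terms]
        lo, hi = min(vals), max(vals)
        span = (hi - lo) or 1.0  # avoid division by zero when all scores tie
        for t in terms:
            combined[t] += (scores[t][i] - lo) / span / 3
    # Highest combined score first; ties broken alphabetically
    return sorted(terms, key=lambda t: (-combined[t], t))[:k]


if __name__ == "__main__":
    docs = [["goal", "match"], ["goal", "team"], ["price", "market"], ["market", "bank"]]
    labels = ["sport", "sport", "business", "business"]
    print(hybrid_select(docs, labels, 2))  # ['goal', 'market']
```

In the toy corpus, `goal` and `market` each appear in every document of one class and in none of the other, so they receive the maximum DF, IG, and CHI scores and rank first. In practice the scores would be computed on the training split only, and the selected vocabulary then fed to a classifier.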
Pages: 8
Related papers (50 in total)
  • [1] Feature selection by integrating document frequency with genetic algorithm for Amharic news document classification
    Endalie, Demeke
    Haile, Getamesay
    Abebe, Wondmagegn Taye
    PEERJ COMPUTER SCIENCE, 2022, 8
  • [2] Designing a hybrid dimension reduction for improving the performance of Amharic news document classification
    Endalie, Demeke
    Tegegne, Tesfa
    PLOS ONE, 2021, 16 (05)
  • [3] Investigating Optimal Feature Selection Method to Improve the Performance of Amharic Text Document Classification
    Alemu, Tamir Anteneh
    Tegegnie, Alemu Kumilachew
    AFRICAN JOURNAL OF LIBRARY ARCHIVES AND INFORMATION SCIENCE, 2019, 29 (02): 103-113
  • [4] Feature selection for document type classification
    Taghva, Kazem
    Vergara, Jason
    PROCEEDINGS OF THE FIFTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, 2008: 179-182
  • [5] Feature Selection for Fake News Classification
    Sverdrup-Thygeson, Simen
    Haddow, Pauline C.
    2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021
  • [6] Feature selection for the classification of large document collections
    Brank, Janez
    Mladenic, Dunja
    Grobelnik, Marko
    Milic-Frayling, Natasa
    JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2008, 14 (10): 1562-1596
  • [7] The impact of feature selection on medical document classification
    Parlak, Bekir
    Uysal, Alper Kursat
    2016 11TH IBERIAN CONFERENCE ON INFORMATION SYSTEMS AND TECHNOLOGIES (CISTI), 2016
  • [8] Feature selection for document classification based on topology
    El Barbary, O. G.
    Salama, A. S.
    EGYPTIAN INFORMATICS JOURNAL, 2018, 19 (02): 129-132
  • [9] Discriminative Feature Analysis and Selection for Document Classification
    Chinta, Punya Murthy
    Murty, M. Narasimha
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT I, 2012, 7663: 366-374
  • [10] A Hybrid Algorithm for Feature Selection and Classification
    Sathish, B. R.
    Senthilkumar, Radha
    JOURNAL OF INTERNET TECHNOLOGY, 2023, 24 (03): 593-602