Itemsets-Based Amharic Document Categorization Using an Extended A Priori Algorithm

Cited by: 0
Authors
Hailu, Abraham [1 ]
Assabie, Yaregal [1 ]
Affiliations
[1] Univ Addis Ababa, Dept Comp Sci, Addis Ababa, Ethiopia
Keywords
Amharic language processing; Text categorization; Document classification; A priori algorithm; Itemsets;
DOI
10.1007/978-3-319-43808-5_24
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document categorization is gaining importance because the large volume of electronic information requires automatic organization and pattern identification. The morphological complexity of Amharic makes automatic categorization of Amharic documents a difficult task. This paper presents a system that categorizes Amharic documents based on the frequency of itemsets obtained after analyzing the morphology of the language. We selected seven categories into which a given document is to be classified. Categorization is achieved with an extended version of the a priori algorithm, which has traditionally been used for knowledge mining in the form of association rules. The system is tested on a corpus of Amharic news documents, and experimental results are reported.
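The abstract does not say how the a priori algorithm is extended or how the morphological analysis is performed, so the following is only a minimal sketch of the general idea: mine frequent itemsets per category with a standard level-wise a priori pass, then assign a document to the category whose itemsets it contains most often. Everything here is an assumption made for illustration: sentences are taken to be already reduced to stems, each sentence is treated as one transaction, English placeholder words stand in for Amharic stems, and the names `frequent_itemsets` and `categorize`, the support threshold, and the scoring rule are hypothetical rather than taken from the paper.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Standard level-wise a priori pass: return {frozenset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items (once per transaction).
    counts = Counter(frozenset([item]) for t in transactions for item in t)
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 2
    while current and k <= max_size:
        # Join step: unite frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a, b in combinations(list(current), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in current
                             for s in combinations(c, k - 1))}
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result


def categorize(doc_sentences, category_itemsets):
    """Assumed scoring rule: count how often each category's frequent
    itemsets occur in the document's sentences and pick the highest."""
    doc = [frozenset(s) for s in doc_sentences]
    scores = {cat: sum(1 for s in doc for i in itemsets if i <= s)
              for cat, itemsets in category_itemsets.items()}
    return max(scores, key=scores.get), scores


# Toy training data: two hypothetical categories, placeholder stems.
training = {
    "sports": [["team", "goal", "match"], ["team", "goal"], ["match", "team"]],
    "politics": [["vote", "party", "election"], ["party", "vote"],
                 ["election", "party"]],
}
category_itemsets = {cat: frequent_itemsets(sents, min_support=2)
                     for cat, sents in training.items()}

document = [["team", "goal", "win"], ["party", "match", "team"]]
label, scores = categorize(document, category_itemsets)
print(label, scores)
```

The prune step is what makes this a priori rather than brute-force counting: a candidate pair is only counted if both of its single items are already frequent. A real system along the lines the abstract describes would replace the placeholder stems with the output of an Amharic morphological analyzer and would use the paper's seven categories.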
Pages: 317-326
Number of pages: 10