A New and Efficient Stemming Technique for Arabic Text Categorization

Cited by: 0
Authors
Hadni, M. [1 ]
Lachkar, A. [2 ]
Alaoui Ouatik, S. [1 ]
Affiliations
[1] USMBA, FSDM, LIM, Fes, Morocco
[2] USMBA, ENSA, LSIS, Fes, Morocco
Keywords
Arabic Language; Stemming approaches; Text Categorization
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Text preprocessing of the Arabic language is a challenging and crucial stage in Text Categorization (TC) in particular and in Text Mining (TM) in general. Stemming algorithms can be used in Arabic text preprocessing to reduce the multiple forms of a word to a single form (its root or stem). Depending on the desired level of analysis, Arabic stemming algorithms can be classified as root-based (e.g., Khoja), stem-based (e.g., Larkey), or statistical (e.g., n-grams). Yet no complete stemmer is available for this language, and the existing stemmers do not achieve high performance. In this paper, we propose an efficient hybrid method for stemming Arabic text. Its aim is to improve the accuracy of stemming and, through it, the accuracy of our proposed TC system. We evaluated and compared the effectiveness of four methods (the three approaches above plus the proposed one) in terms of the accuracy of the Naive Bayesian classifier used in our TC system. The proposed stemming algorithm outperformed the other stemmers, and the results show that using it improves the performance of Arabic Text Categorization. The average accuracies are 74.41% for Khoja, 59.71% for light stemming, 48.17% for n-grams, and 82.33% for our stemmer.
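The abstract contrasts stem-based (light) stemming with statistical n-gram representations. The sketch below is only a hedged illustration of those two baseline ideas. It strips a small, assumed set of Arabic prefixes and suffixes in the spirit of Larkey's light stemmer, and it splits a word into overlapping character n-grams. It is not the authors' hybrid stemmer, which the abstract does not describe, and it is not Khoja's root extractor. The affix lists, the length thresholds, and the function names `light_stem` and `char_ngrams` are hypothetical.

```python
# Hypothetical affix lists, loosely modeled on Larkey-style light stemming (assumption).
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")


def light_stem(word: str) -> str:
    """Strip a leading conjunction, one article-like prefix, then suffixes."""
    # Drop the conjunction waw only if at least 3 characters remain.
    if word.startswith("و") and len(word) - 1 >= 3:
        word = word[1:]
    # Remove the first matching prefix only if at least 2 characters remain.
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 2:
            word = word[len(prefix):]
            break
    # One pass over the suffix list in order, keeping at least 2 characters.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            word = word[: -len(suffix)]
    return word


def char_ngrams(word: str, n: int = 3) -> list[str]:
    """Overlapping character n-grams, the unit used by n-gram approaches."""
    if len(word) <= n:
        return [word]
    return [word[i : i + n] for i in range(len(word) - n + 1)]


if __name__ == "__main__":
    print(light_stem("والكتاب"))   # كتاب ("and the book" -> "book")
    print(light_stem("المعلمون"))  # معلم ("the teachers" -> "teacher")
    print(char_ngrams("كتاب"))     # ['كتا', 'تاب']
```

Light stemming of this kind never restores a root. It also over-strips some words whose first letter is a root waw. Both limitations help explain why root-based, stem-based, and n-gram features lead to different classification accuracies.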
Pages: 791-796
Number of pages: 6
Related papers (50 in total)
  • [1] Stemming Impact on Arabic Text Categorization Performance: A Survey
    Al-Anzi, Fawaz S.
    AbuZeina, Dia
    2015 5th International Conference on Information & Communication Technology and Accessibility (ICTA), 2015
  • [2] Contextual Text Categorization: An Improved Stemming Algorithm to Increase the Quality of Categorization in Arabic Text
    Gadri, Said
    Moussaoui, Abdelouahab
    International Arab Journal of Information Technology, 2017, 14(6): 835-841
  • [3] Stemming versus light stemming as feature selection techniques for Arabic text categorization
    Duwairi, Rehab
    Al-Refai, Mohammad
    Khasawneh, Natheer
    2007 Innovations in Information Technologies, Vols 1 and 2, 2007: 199-203
  • [4] Impact of Stemming and Word Embedding on Deep Learning-Based Arabic Text Categorization
    Almuzaini, Huda Abdulrahman
    Azmi, Aqil M.
    IEEE Access, 2020, 8: 127913-127928
  • [5] Impact of stemming on Arabic text summarization
    Alami, Nabil
    Meknassi, Mohammed
    Ouatik, Said Alaoui
    Ennahnahi, NourEddine
    2016 4th IEEE International Colloquium on Information Science and Technology (CIST), 2016: 338-343
  • [6] Stemming Malay Text and Its Application in Automatic Text Categorization
    Yasukawa, Michiko
    Lim, Hui Tian
    Yokoo, Hidetoshi
    IEICE Transactions on Information and Systems, 2009, E92D(12): 2351-2359
  • [7] Arabic Text Stemming: Comparative Analysis
    Mamoun, Rasha
    Ahmed, Mahmoud
    2016 Conference of Basic Sciences and Engineering Studies (SCGAC), 2016: 88-93
  • [8] Arabic Text Categorization Using SVM Active Learning Technique: An Overview
    Goudjil, Mohamed
    Koudil, Mouloud
    Hammami, Nacereddine
    Bedda, Mouldi
    Alruily, Meshrif
    World Congress on Computer & Information Technology (WCCIT 2013), 2013
  • [9] Machine learning for Arabic text categorization
    Duwairi, Rehab M.
    Journal of the American Society for Information Science and Technology, 2006, 57(8): 1005-1010
  • [10] The Effect of Stemming on Arabic Text Classification: An Empirical Study
    Wahbeh, Abdullah
    Al-Kabi, Mohammed
    Al-Radaideh, Qasem
    Al-Shawakfa, Emad
    Alsmadi, Izzat
    International Journal of Information Retrieval Research, 2011, 1(3): 54-70