A novel robust Arabic light stemmer

Cited by: 30
Authors
Abainia, Kheireddine [1]
Ouamour, Siham [1]
Sayoud, Halim [1]
Affiliations
[1] USTHB Univ, FEI, Bab Ezzouar, Algeria
Keywords
Arabic stemming; light stemming; Arabic morphology; prefixes; suffixes; infixes; topic identification; information retrieval;
DOI
10.1080/0952813X.2016.1212100
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Stemming is the process of reducing a word to its root or stem; it is therefore considered a crucial pre-processing step for natural language processing and information retrieval tasks. For Arabic, however, finding an effective stemming algorithm is difficult, because Arabic morphology differs from that of many other languages. Although several algorithms in the literature address Arabic stemming, most of them are restricted to a limited number of words, confuse original letters with affixes, and usually rely on a dictionary of words or patterns. We therefore propose the design and implementation of a novel Arabic light stemmer, based on new rules for stripping prefixes, suffixes and infixes. To our knowledge, it is the first work to deal with Arabic infixes with regard to their irregular rules. The empirical evaluation was conducted on a new Arabic data set (called ARASTEM), collected from several Arabic discussion forums containing dialectal Arabic and modern pseudo-Arabic languages. We present a comparative investigation of our new stemmer and other existing stemmers using Paice's parameters, namely the Under-Stemming Index (UI), the Over-Stemming Index (OI) and the Stemming Weight (SW). Results show that the proposed Arabic light stemmer maintains consistently high performance and outperforms several existing light stemmers.
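The three Paice parameters named in the abstract can be sketched in a few lines. The block below is a minimal illustration following the standard definitions from Paice's evaluation method, not the authors' evaluation code. The function name paice_indices, the toy English word groups and the toy stemmers are all hypothetical, and the sketch assumes each word appears in exactly one concept group.

```python
from collections import Counter, defaultdict

def paice_indices(groups, stem):
    """Return (UI, OI, SW) for a stemmer over hand-built concept groups.

    groups: list of word lists; words in one list should share a stem.
    stem:   callable mapping a word to its stem (hypothetical here).
    Assumes every word belongs to exactly one group.
    """
    total_words = sum(len(g) for g in groups)
    gdmt = gdnt = gumt = gwmt = 0.0
    # For each produced stem, count how many words of each group map to it.
    stem_to_group_counts = defaultdict(Counter)

    for gid, words in enumerate(groups):
        n = len(words)
        gdmt += 0.5 * n * (n - 1)            # desired merge total
        gdnt += 0.5 * n * (total_words - n)  # desired non-merge total
        stem_counts = Counter(stem(w) for w in words)
        # Word pairs of this group that ended up with different stems.
        gumt += 0.5 * sum(u * (n - u) for u in stem_counts.values())
        for s, c in stem_counts.items():
            stem_to_group_counts[s][gid] += c

    for counts in stem_to_group_counts.values():
        ns = sum(counts.values())
        # Word pairs sharing this stem but coming from different groups.
        gwmt += 0.5 * sum(v * (ns - v) for v in counts.values())

    ui = gumt / gdmt if gdmt else 0.0  # Under-Stemming Index
    oi = gwmt / gdnt if gdnt else 0.0  # Over-Stemming Index
    sw = oi / ui if ui else None       # Stemming Weight, undefined when UI is 0
    return ui, oi, sw

# Toy data for illustration only (not taken from ARASTEM).
toy_groups = [["connect", "connected", "connection"],
              ["consider", "considered"]]
no_stemming = paice_indices(toy_groups, lambda w: w)        # merges nothing
heavy_stemming = paice_indices(toy_groups, lambda w: w[:3])  # merges everything
```

With the identity function as the stemmer no desired merge is achieved, so UI is at its maximum and OI is zero; truncating every word to three letters merges both groups, so UI drops to zero and OI reaches its maximum. A lower UI and a lower OI are both better, and SW summarises how heavy a stemmer is.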
Pages: 557 - 573
Number of pages: 17
Related papers
50 in total
  • [21] An Application of Pattern Matching Stemmer in Arabic Dialogue System
    Hijjawi, Mohammad
    Bandar, Zuhair
    Crockett, Keeley
    Mclean, David
    AGENT AND MULTI-AGENT SYSTEMS: TECHNOLOGIES AND APPLICATIONS, 2011, 6682 : 35 - 43
  • [22] An Arabic Lemma-Based Stemmer for Latent Topic Modeling
    Brahmi, Abderrezak
    Ech-Cherif, Ahmed
    Benyettou, Abdelkader
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2013, 10 (02) : 160 - 168
  • [23] Towards Improving Khoja Rule-Based Arabic Stemmer
    Al-Kabi, Mohammed N.
    2013 IEEE JORDAN CONFERENCE ON APPLIED ELECTRICAL ENGINEERING AND COMPUTING TECHNOLOGIES (AEECT), 2013,
  • [24] Improving Arabic Text Classification Using P-Stemmer
    Kanan T.
    Hawashin B.
    Alzubi S.
    Almaita E.
    Alkhatib A.
    Maria K.A.
    Elbes M.
    Recent Advances in Computer Science and Communications, 2022, 15 (03) : 404 - 411
  • [25] Building an automatic stemmer to enhance arabic information retrieval systems
    Alsamara, K
    Abuleil, S
    Abu-Salem, H
    Hammo, B
    International Conference on Computing, Communications and Control Technologies, Vol 5, Proceedings, 2004, : 270 - 274
  • [26] A Study of Graph Based Stemmer in Arabic Extrinsic Plagiarism Detection
    Boukhalfa, Imene
    Mostefai, Sihem
    Chekkai, Nacira
    PROCEEDINGS OF THE 2ND MEDITERRANEAN CONFERENCE ON PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (MEDPRAI-2018), 2018, : 27 - 32
  • [27] A Rule-Based Subject-Correlated Arabic Stemmer
    El-Defrawy, Mahmoud
    El-Sonbaty, Yasser
    Belal, Nahla A.
    ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING, 2016, 41 (08) : 2883 - 2891
  • [28] A Rule-Based Subject-Correlated Arabic Stemmer
    Mahmoud El-Defrawy
    Yasser El-Sonbaty
    Nahla A. Belal
    Arabian Journal for Science and Engineering, 2016, 41 : 2883 - 2891
  • [29] Stemmer algorithm for arabic words based on excessive letter locations
    Al-Shalabi, Riyad
    Kanaan, Ghassan
    Ghwanmeh, Sameh
    Nour, Fuad Mousa
    2007 INNOVATIONS IN INFORMATION TECHNOLOGIES, VOLS 1 AND 2, 2007, : 402 - +
  • [30] An intelligent use of stemmer and morphology analysis for Arabic information retrieval
    Alnaied, Ali
    Elbendak, Mosa
    Bulbul, Abdullah
    EGYPTIAN INFORMATICS JOURNAL, 2020, 21 (04) : 209 - 217