A novel robust Arabic light stemmer

Cited by: 30
Authors
Abainia, Kheireddine [1]
Ouamour, Siham [1]
Sayoud, Halim [1]
Affiliation
[1] USTHB Univ, FEI, Bab Ezzouar, Algeria
Keywords
Arabic stemming; light stemming; Arabic morphology; prefixes; suffixes; infixes; topic identification; information retrieval
DOI
10.1080/0952813X.2016.1212100
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Stemming is the process of reducing a word to its root or stem, and it is therefore considered a crucial pre-processing step in natural language processing and information retrieval tasks. For Arabic, however, finding an effective stemming algorithm is difficult, because Arabic morphology differs from that of many other languages. Although several algorithms in the literature address Arabic stemming, most of them are restricted to a limited number of words, confuse original letters with affixes, and usually rely on a dictionary of words or patterns. To address these limitations, we propose the design and implementation of a novel Arabic light stemmer based on new rules for stripping prefixes, suffixes and infixes. To our knowledge, it is the first work to deal with Arabic infixes with regard to their irregular rules. The empirical evaluation was conducted on a new Arabic data set (called ARASTEM), collected from several Arabic discussion forums containing dialectal Arabic and modern pseudo-Arabic languages. We present a comparative investigation between our new stemmer and other existing stemmers using Paice's parameters, namely the Under-Stemming Index (UI), the Over-Stemming Index (OI) and the Stemming Weight (SW). Results show that the proposed Arabic light stemmer maintains consistently high performance and outperforms several existing light stemmers.
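The abstract names Paice's parameters but does not give their formulas. As a rough illustration only, the sketch below computes UI, OI and SW from hand-built concept groups using the pairwise-count definitions commonly attributed to Paice (SW = OI / UI). The function name `paice_indices`, the toy English word groups and the truncating `toy_stem` are assumptions made for this example; they are not the paper's stemmer or the ARASTEM data set, and each word is assumed to appear in exactly one group.

```python
from collections import Counter, defaultdict


def paice_indices(concept_groups, stem):
    """Return (UI, OI, SW) for `stem` over hand-built concept groups.

    concept_groups: list of word lists; words in one list should share a stem.
    stem:           callable mapping a word to its stem.
    Assumes every word belongs to exactly one concept group.
    """
    total_words = sum(len(group) for group in concept_groups)

    gdmt = gumt = gdnt = 0.0                # global pair totals
    stem_to_groups = defaultdict(Counter)   # stem -> {group id: word count}
    for gid, group in enumerate(concept_groups):
        n = len(group)
        stems = Counter(stem(w) for w in group)
        gdmt += 0.5 * n * (n - 1)                                # pairs that should merge
        gumt += 0.5 * sum(u * (n - u) for u in stems.values())   # pairs left unmerged
        gdnt += 0.5 * n * (total_words - n)                      # pairs that should stay apart
        for s, count in stems.items():
            stem_to_groups[s][gid] += count

    gwmt = 0.0                              # pairs wrongly merged across groups
    for per_group in stem_to_groups.values():
        ns = sum(per_group.values())
        gwmt += 0.5 * sum(v * (ns - v) for v in per_group.values())

    ui = gumt / gdmt if gdmt else 0.0
    oi = gwmt / gdnt if gdnt else 0.0
    sw = oi / ui if ui else float("inf")    # SW is undefined when UI is 0
    return ui, oi, sw


def toy_stem(word):
    # Hypothetical stemmer for the demo: keep the first four characters.
    return word[:4]


groups = [
    ["general", "generally"],
    ["generate", "generation"],
    ["ran", "run", "running"],
]
ui, oi, sw = paice_indices(groups, toy_stem)
print(f"UI={ui:.3f} OI={oi:.3f} SW={sw:.3f}")  # UI=0.600 OI=0.250 SW=0.417
```

In this toy run the truncating stemmer merges the two "gene..." groups (over-stemming) and fails to merge any of "ran", "run" and "running" (under-stemming), so both indices are non-zero. A lower UI and a lower OI both indicate fewer errors, while SW indicates how heavy or light the stemmer is.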
Pages: 557-573
Page count: 17