Arabic light-based stemmer using new rules

Cited by: 11
Authors
Alshalabi, Hamood [1, 2]
Tiun, Sabrina [1]
Omar, Nazlia [1]
AL-Aswadi, Fatima N. [3, 4]
Alezabi, Kamal Ali [5]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, CAIT, Bangi 43600, Malaysia
[2] Sanaa Univ, Sanaa, Yemen
[3] Hodeidah Univ, Fac Comp Sci & Engn, Hodeidah, Yemen
[4] Univ Sains Malaysia, Sch Comp Sci, Gelugor 11800, Pulau Pinang, Malaysia
[5] UCSI Univ, Inst Comp Sci & Digital Innovat ICSDI, Kuala Lumpur, Malaysia
Keywords
Arabic stemmer; Arabic light stemmer; Arabic information retrieval; Suffix and prefix stripping; Arabic corpus; ROOT;
DOI
10.1016/j.jksuci.2021.08.017
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Superior stemming algorithms aid significantly in many natural language processing (NLP) applications such as information retrieval. The Arabic light-based stemmer is one of the most important stemming algorithms. However, partly because of the highly inflected and complex morphological structure of Arabic, most existing Arabic light-based stemming algorithms eliminate only a small number of suffixes, prefixes, or both while recognising infix patterns to determine roots. This elimination of suffixes and prefixes leads to many inefficient results. Hence, this study aims to develop an improved light-based Arabic stemming algorithm by proposing an appropriate list of suffixes and prefixes, supported by rules based on word length (without using a morpheme or patterns on a stem). Our improved Dlight Arabic stemmer focuses on determining and removing the infix patterns under many word-length rules, and according to a specific order of stemming stages, to extract the double, triple and quadruple roots from long and short Arabic words. To evaluate the proposed light-based Arabic stemmer, we compared it against existing Arabic stemmers, namely Light10, Condlight and ARLST. The experimental results showed that the proposed Develop Arabic Light-Based Stemmer (Dlight) obtained the best performance, with an F-measure of 68%, while the other three Arabic stemmers yielded slightly lower F-measures. Finally, establishing an appropriate list of suffixes and prefixes with word-length rules to stem Arabic words can improve the performance of a light-based Arabic stemmer. © 2021 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
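The general idea of stripping prefixes and suffixes under a word-length rule can be illustrated with a minimal Python sketch. This is not the Dlight algorithm: the abstract does not give its affix lists, length thresholds, stage order, or infix handling, so the lists `PREFIXES` and `SUFFIXES`, the threshold `MIN_STEM_LEN`, and the function `light_stem` are all hypothetical and chosen only for illustration.

```python
# Illustrative affix lists only; these are NOT the lists proposed in the paper.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

# Hypothetical word-length rule: never strip an affix if fewer than
# this many letters would remain.
MIN_STEM_LEN = 3


def light_stem(word: str, min_len: int = MIN_STEM_LEN) -> str:
    """Strip at most one prefix, then at most one suffix, longest first."""
    stem = word

    # Stage 1: prefix removal, guarded by the length rule.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if stem.startswith(prefix) and len(stem) - len(prefix) >= min_len:
            stem = stem[len(prefix):]
            break

    # Stage 2: suffix removal, guarded by the same length rule.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if stem.endswith(suffix) and len(stem) - len(suffix) >= min_len:
            stem = stem[:-len(suffix)]
            break

    return stem


if __name__ == "__main__":
    for w in ["الكتاب", "والمعلمون", "الم"]:
        print(w, "->", light_stem(w))
```

With these assumed lists, `light_stem("والمعلمون")` removes the prefix `وال` and the suffix `ون`, while the three-letter word `الم` is left untouched because removing `ال` would leave fewer than `MIN_STEM_LEN` letters. A fuller implementation along the lines of the abstract would also need separate rules per word length and a step for infix patterns, neither of which is sketched here.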
Pages: 6635-6642
Number of pages: 8
Related papers
50 in total
  • [1] Arabic light-based stemming: a comparative study among ligh10 stemmer, P-stemmer, and Conditional light stemmer
    Hussien, Sabria Mohammed
    Aburagheef, Hazim J.
    [J]. PROCEEDING OF 2021 2ND INFORMATION TECHNOLOGY TO ENHANCE E-LEARNING AND OTHER APPLICATION (IT-ELA 2021), 2021, : 131 - 135
  • [2] ARABIC LIGHT STEMMER (ARS)
    Al-Omari, Asma
    Abuata, Belal
    [J]. JOURNAL OF ENGINEERING SCIENCE AND TECHNOLOGY, 2014, 9 (06): : 702 - 716
  • [3] BPR algorithm: New broken plural rules for an Arabic stemmer
    Alshalabi, Hamood
    Tiun, Sabrina
    Omar, Nazlia
    Anaam, Elham abdulwahab
    Saif, Yazid
    [J]. EGYPTIAN INFORMATICS JOURNAL, 2022, 23 (03) : 363 - 371
  • [4] An Improved Arabic Light Stemmer
    Elrajubi, Osama Mohamed
    [J]. 2013 INTERNATIONAL CONFERENCE ON RESEARCH AND INNOVATION IN INFORMATION SYSTEMS (ICRIIS), 2013, : 33 - 38
  • [5] Conditional Arabic Light Stemmer: CondLight
    Al-Lahham, Yaser
    Matarneh, Khawlah
    Hassan, Mohammad
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2018, 15 (3A) : 559 - 564
  • [6] A New Enhanced Arabic Light Stemmer for IR in Medical Documents
    Al-Khatib, Ra'ed M.
    Zerrouki, Taha
    Abu Shquier, Mohammed M.
    Balla, Amar
    Al-Khateeb, Asef
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 68 (01): : 1255 - 1269
  • [7] A novel robust Arabic light stemmer
    Abainia, Kheireddine
    Ouamour, Siham
    Sayoud, Halim
    [J]. JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2017, 29 (03) : 557 - 573
  • [8] Tashaphyne0.4: a new arabic light stemmer based on rhyzome modeling approach
    Al-Khatib, Ra'ed M.
    Zerrouki, Taha
    Abu Shquier, Mohammed M.
    Balla, Amar
    [J]. INFORMATION RETRIEVAL JOURNAL, 2023, 26 (1-2):
  • [9] Tashaphyne0.4: a new arabic light stemmer based on rhyzome modeling approach
    Ra’ed M. Al-Khatib
    Taha Zerrouki
    Mohammed M. Abu Shquier
    Amar Balla
    [J]. Information Retrieval Journal, 2023, 26
  • [10] Arabic Light Stemming: A Comparative Study between P-Stemmer, Khoja Stemmer, and Light10 Stemmer
    Kanan, Tarek
    Sadaqa, Odai
    Almhirat, Ashraf
    Kanan, Emran
    [J]. 2019 SIXTH INTERNATIONAL CONFERENCE ON SOCIAL NETWORKS ANALYSIS, MANAGEMENT AND SECURITY (SNAMS), 2019, : 511 - 515