Arabic light-based stemmer using new rules

Cited by: 11
Authors
Alshalabi, Hamood [1, 2]
Tiun, Sabrina [1]
Omar, Nazlia [1]
AL-Aswadi, Fatima N. [3, 4]
Alezabi, Kamal Ali [5]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, CAIT, Bangi 43600, Malaysia
[2] Sanaa Univ, Sanaa, Yemen
[3] Hodeidah Univ, Fac Comp Sci & Engn, Hodeidah, Yemen
[4] Univ Sains Malaysia, Sch Comp Sci, Gelugor 11800, Pulau Pinang, Malaysia
[5] UCSI Univ, Inst Comp Sci & Digital Innovat ICSDI, Kuala Lumpur, Malaysia
Keywords
Arabic stemmer; Arabic light stemmer; Arabic information retrieval; Suffix and prefix stripping; Arabic corpus; ROOT;
DOI
10.1016/j.jksuci.2021.08.017
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Superior stemming algorithms significantly aid many natural language processing (NLP) applications, such as information retrieval. The Arabic light-based stemmer is one of the most important stemming algorithms. However, partly because of the highly inflected and complex morphological structure of the Arabic language, most existing Arabic light-based stemming algorithms eliminate only a small number of suffixes, prefixes, or both while recognising the infix patterns used to determine roots. This elimination of suffixes and prefixes leads to many inefficient results. Hence, this study aims to develop an improved light-based Arabic stemming algorithm by proposing an appropriate list of suffixes and prefixes, supported by rules based on word length (without using a morpheme or patterns on a stem). Our improved Dlight Arabic stemmer focuses on determining and removing the infix patterns under a number of word-length rules, following a specific order of stemming stages, to extract two-, three- and four-letter roots from long and short Arabic words. To evaluate the proposed light-based Arabic stemmer, we compared it against three existing Arabic stemmers: Light10, Condlight and ARLST. The experimental results showed that the proposed Develop Arabic Light-Based Stemmer (Dlight) obtained the best performance, with an F-measure of 68%, while the other three Arabic stemmers yielded slightly lower F-measures. In conclusion, establishing an appropriate list of suffixes and prefixes together with word-length rules can improve the performance of a light-based Arabic stemmer.
(c) 2021 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
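The general idea of affix stripping guarded by word-length rules can be illustrated with a minimal Python sketch. This is not the Dlight algorithm: the abstract does not give the affix lists, the length thresholds, or the stage order, so the `PREFIXES` and `SUFFIXES` lists, the `MIN_STEM_LEN` threshold, and the `light_stem` function below are all hypothetical choices made for illustration only.

```python
# Hypothetical affix lists for illustration; NOT the lists proposed in the paper.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

# Assumed length rule: never strip an affix if fewer than 3 letters would remain.
MIN_STEM_LEN = 3


def light_stem(word, min_len=MIN_STEM_LEN):
    """Strip at most one prefix, then at most one suffix, longest match first."""
    stem = word

    # Stage 1: prefix removal, guarded by the remaining-length rule.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if stem.startswith(prefix) and len(stem) - len(prefix) >= min_len:
            stem = stem[len(prefix):]
            break

    # Stage 2: suffix removal, guarded by the same rule.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if stem.endswith(suffix) and len(stem) - len(suffix) >= min_len:
            stem = stem[:-len(suffix)]
            break

    return stem


if __name__ == "__main__":
    for w in ["المعلمون", "والكتاب", "الم"]:
        print(w, "->", light_stem(w))
```

In this sketch, the length guard is what keeps a short word such as "الم" intact even though it begins with a listed prefix. A real stemmer would tune the affix lists and thresholds against an annotated corpus.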
Pages: 6635-6642
Page count: 8