A rule-based stemmer for Arabic Gulf dialect

Cited by: 21
Authors
Abuata, Belal [1]
Al-Omari, Asma [1]
Affiliations
[1] Yarmouk Univ, Fac IT & Comp Sci, Irbid 21163, Jordan
Keywords
Arabic dialect stemmer; Gulf dialect; Rule base stemming; Arabic NLP
DOI
10.1016/j.jksuci.2014.04.003
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Arabic dialects have been widely used for many years in place of Modern Standard Arabic in many fields. The presence of dialects in any language is a big challenge. Dialects add a new set of variational dimensions in fields such as natural language processing, information retrieval, and even Arabic chatting between nationals of different Arab countries. Unlike Modern Standard Arabic, spoken dialects have no standard morphology, phonology, or lexicon. Hence, the objective of this paper is to describe a procedure or algorithm by which a stem for the Arabian Gulf dialect can be defined. The algorithm is rule based. Special rules are created to remove the suffixes and prefixes of the dialect words. The algorithm also applies rules related to the word size and the relation between adjacent letters. The algorithm was tested on a number of words and gave a good correct-stem ratio. The algorithm is also compared with two Modern Standard Arabic algorithms. The results showed that Modern Standard Arabic stemmers performed poorly on the Arabic Gulf dialect and that our algorithm performed poorly when applied to Modern Standard Arabic words. Crown Copyright (C) 2015 Production and hosting by Elsevier B.V.
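The abstract describes the stemmer only in outline: rules strip prefixes and suffixes, subject to constraints on word size. As a rough illustration of that general idea, here is a minimal Python sketch. The affix lists, the `MIN_STEM_LEN` threshold, and the function name `strip_affixes` are all assumptions chosen for the example; they are not the rule set from the paper, and the adjacent-letter rules mentioned in the abstract are not reproduced because the record does not state them.

```python
# Hypothetical affix lists for illustration only; NOT the paper's Gulf dialect rules.
PREFIXES = ("ال", "و")
SUFFIXES = ("ات", "ين", "ها")

# Assumed word-size rule: never strip an affix if fewer than 3 letters would remain.
MIN_STEM_LEN = 3


def strip_affixes(word: str) -> str:
    """Remove at most one listed prefix and one listed suffix from `word`."""
    stem = word

    # Try each prefix in order; strip the first one that matches and
    # still leaves a stem of at least MIN_STEM_LEN letters.
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM_LEN:
            stem = stem[len(prefix):]
            break

    # Same procedure for suffixes, applied to the result of prefix stripping.
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM_LEN:
            stem = stem[:-len(suffix)]
            break

    return stem


if __name__ == "__main__":
    for w in ("الكتاب", "معلمات", "وين"):
        print(w, "->", strip_affixes(w))
```

In this sketch the length check is what keeps a short word such as "وين" intact even though it begins with a listed prefix and ends with a listed suffix. A real dialect stemmer would need affix lists and thresholds derived from the dialect itself.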
Pages: 104-112
Page count: 9