Arabic Light Stemming: A Comparative Study between P-Stemmer, Khoja Stemmer, and Light10 Stemmer

Cited by: 0
Authors
Kanan, Tarek [1]
Sadaqa, Odai [1]
Almhirat, Ashraf [1]
Kanan, Emran [2]
Affiliations
[1] Al Zaytoonah Univ Jordan, Comp Sci Dept, Amman, Jordan
[2] Amman Arab Univ, Dept Comp Sci, Amman, Jordan
Keywords
Arabic Language; Natural Language Processing; Text Classification; Arabic Stemming; TEXT; ROBUST;
DOI
10.1109/snams.2019.8931842
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Arabic is a derivational language with a deep structure and rich word meanings; one of its main challenges is its dependence on morphology. Arabic Natural Language Processing (ANLP) tools are required for many tasks, such as machine learning. For text classification, ANLP serves as a set of preprocessing steps that include, but are not limited to, stemming, normalization, and stopword removal. In this work, we collected 2,000 news articles from Arabic online newspapers and classified them using Support Vector Machine (SVM) and Naive Bayes (NB) classifiers. The classification task was conducted to compare three Arabic light stemmers: P-Stemmer, Khoja Stemmer, and Light10 Stemmer. P-Stemmer outperformed the other two stemmers with both classifiers, reaching an F1-measure of 0.92 with the SVM classifier and 0.90 with the NB classifier.
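The preprocessing steps named in the abstract (normalization, stopword removal, and light stemming) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the method of the paper: the affix lists, the tiny stopword set, and the function names `normalize`, `light_stem`, and `preprocess` are all hypothetical, and they do not reproduce the rules of P-Stemmer, Khoja Stemmer, or Light10.

```python
import re

# Illustrative affix lists (assumption): NOT the exact rules of any stemmer
# compared in the paper. Longer prefixes come first so they match first.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")

# Short vowel marks (U+064B to U+0652) and tatweel (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")


def normalize(word):
    """Strip diacritics and tatweel, and unify alef variants to bare alef."""
    word = DIACRITICS.sub("", word)
    return re.sub("[أإآ]", "ا", word)


# Tiny hypothetical stopword list, normalized the same way as the tokens.
STOPWORDS = {normalize(w) for w in ("في", "من", "على", "إلى")}


def light_stem(word, min_len=3):
    """Remove at most one prefix and one suffix, keeping >= min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(text):
    """Whitespace-tokenize, normalize, drop stopwords, then light-stem."""
    tokens = [normalize(token) for token in text.split()]
    return [light_stem(token) for token in tokens if token not in STOPWORDS]
```

The resulting token lists could then be turned into features for an SVM or Naive Bayes classifier. A stemmer that strips only prefixes could be approximated in this sketch by passing over the suffix loop, though that would again be an assumption rather than a description of any stemmer named above.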
Pages: 511-515
Page count: 5
Related papers
13 in total
  • [1] Arabic light-based stemming: a comparative study among ligh10 stemmer, P-stemmer, and Conditional light stemmer
    Hussien, Sabria Mohammed
    Aburagheef, Hazim J.
    [J]. PROCEEDING OF 2021 2ND INFORMATION TECHNOLOGY TO ENHANCE E-LEARNING AND OTHER APPLICATION (IT-ELA 2021), 2021, : 131 - 135
  • [2] P-Stemmer or NLTK Stemmer for Arabic Text Classification?
    Elbes, Mohammed
    Aldajah, Amal
    Sadaqa, Odai
    [J]. 2019 SIXTH INTERNATIONAL CONFERENCE ON SOCIAL NETWORKS ANALYSIS, MANAGEMENT AND SECURITY (SNAMS), 2019, : 516 - 520
  • [3] ARABIC LIGHT STEMMER (ARS)
    Al-Omari, Asma
    Abuata, Belal
    [J]. JOURNAL OF ENGINEERING SCIENCE AND TECHNOLOGY, 2014, 9 (06) : 702 - 716
  • [4] An Improved Arabic Light Stemmer
    Elrajubi, Osama Mohamed
    [J]. 2013 INTERNATIONAL CONFERENCE ON RESEARCH AND INNOVATION IN INFORMATION SYSTEMS (ICRIIS), 2013, : 33 - 38
  • [5] Conditional Arabic Light Stemmer: CondLight
    Al-Lahham, Yaser
    Matarneh, Khawlah
    Hassan, Mohammad
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2018, 15 (3A) : 559 - 564
  • [6] A novel robust Arabic light stemmer
    Abainia, Kheireddine
    Ouamour, Siham
    Sayoud, Halim
    [J]. JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2017, 29 (03) : 557 - 573
  • [7] A New Enhanced Arabic Light Stemmer for IR in Medical Documents
    Al-Khatib, Ra'ed M.
    Zerrouki, Taha
    Abu Shquier, Mohammed M.
    Balla, Amar
    Al-Khateeb, Asef
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 68 (01) : 1255 - 1269
  • [8] Arabic light-based stemmer using new rules
    Alshalabi, Hamood
    Tiun, Sabrina
    Omar, Nazlia
    AL-Aswadi, Fatima N.
    Alezabi, Kamal Ali
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (09) : 6635 - 6642
  • [9] Automated Arabic Text Classification With P-Stemmer, Machine Learning, and a Tailored News Article Taxonomy
    Kanan, Tarek
    Fox, Edward A.
    [J]. JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2016, 67 (11) : 2667 - 2683
  • [10] Tashaphyne0.4: a new arabic light stemmer based on rhyzome modeling approach
    Al-Khatib, Ra'ed M.
    Zerrouki, Taha
    Abu Shquier, Mohammed M.
    Balla, Amar
    [J]. INFORMATION RETRIEVAL JOURNAL, 2023, 26 (1-2):