P-Stemmer or NLTK Stemmer for Arabic Text Classification?

Cited by: 0
Authors
Elbes, Mohammed [1]
Aldajah, Amal [1]
Sadaqa, Odai [1]
Affiliation
[1] Al-Zaytoonah University of Jordan, Computer Science Department, Amman, Jordan
Keywords
Arabic Natural Language Processing (ANLP); P-Stemmer; Natural Language Toolkit (NLTK); Support Vector Machine (SVM); Naive Bayes (NB); ROBUST
DOI
10.1109/snams.2019.8931818
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Natural Language Processing (NLP) is a branch of computer science that focuses on developing systems that allow computers to communicate with people using everyday language. NLP tools are devoted to making computers understand statements written in human language. Indexing, text retrieval, and word processing are considered challenges in the classification process. Hence, Arabic Natural Language Processing (ANLP) tools are needed to accomplish these tasks. ANLP includes preprocessing steps such as stemming, normalization, stopword removal, part-of-speech (POS) tagging, and other processes. In this work, we collected 1,000 news articles from the Alghad.com newspaper and classified the dataset with the SVM and NB algorithms using the NLTK toolkit. We compared two stemmers, the P-Stemmer and the NLTK stemmer, within this classification process. For both classifiers, classification with the P-Stemmer outperformed classification with the NLTK stemmer.
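The abstract outlines a pipeline of normalization, stopword removal, stemming, and classification, then compares stemmers by the accuracy they produce. The following is a minimal sketch of that kind of comparison under loudly stated assumptions. The normalization rules, the stopword list, the prefix-only `prefix_stem` function, and the toy sentences are all hypothetical and are not the authors' P-Stemmer, NLTK configuration, or Alghad.com data. The sketch also swaps in a from-scratch multinomial Naive Bayes classifier (standard library only) in place of the NLTK classifiers, and it omits SVM.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical normalization rules (a common ANLP convention, not the paper's):
# strip diacritics, unify alef variants, map taa marbuta to haa and alef maqsura to yaa.
DIACRITICS = re.compile(r"[\u064B-\u0652]")


def normalize(text):
    text = DIACRITICS.sub("", text)
    text = re.sub("[إأآ]", "ا", text)
    return text.replace("ة", "ه").replace("ى", "ي")


# Tiny, hypothetical stopword list, stored in normalized form so lookups match.
STOPWORDS = {normalize(w) for w in ("في", "من", "على", "إلى", "عن", "أن")}

# Hypothetical prefix-only light stemmer (NOT the P-Stemmer); longest prefixes first.
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال", "و")


def prefix_stem(token, min_len=3):
    for p in PREFIXES:
        # Only strip a prefix when at least min_len letters remain.
        if token.startswith(p) and len(token) - len(p) >= min_len:
            return token[len(p):]
    return token


def preprocess(text, stemmer):
    """Normalize, tokenize on Arabic letters, drop stopwords, then stem."""
    tokens = re.findall(r"[\u0621-\u064A]+", normalize(text))
    return [stemmer(t) for t in tokens if t not in STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for t in tokens:
                if t in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][t] + 1) / (self.totals[c] + v))
            if score > best_score:  # ties go to the class seen first in training
                best, best_score = c, score
        return best


def accuracy(train, test, stemmer):
    """Train on `train` and report the fraction of `test` labeled correctly."""
    model = NaiveBayes().fit(
        [preprocess(text, stemmer) for text, _ in train], [label for _, label in train]
    )
    hits = sum(model.predict(preprocess(text, stemmer)) == label for text, label in test)
    return hits / len(test)


# Toy sentences invented for illustration; NOT the Alghad.com dataset.
TRAIN = [
    ("ارتفع الاقتصاد في السوق", "economy"),
    ("البنك يدعم الاقتصاد", "economy"),
    ("فاز الفريق في المباراة", "sports"),
    ("اللاعب سجل هدفا للفريق", "sports"),
]
TEST = [
    ("سجل اقتصاد البلد نموا", "economy"),
    ("وفاز الفريق بالمباراة", "sports"),
]

if __name__ == "__main__":
    print("no stemming:", accuracy(TRAIN, TEST, lambda t: t))
    print("prefix stemmer:", accuracy(TRAIN, TEST, prefix_stem))
```

Because `accuracy` accepts any callable that maps a token to a stem, an NLTK Arabic stemmer such as `nltk.stem.isri.ISRIStemmer().stem` could be passed in its place. The abstract does not say which NLTK stemmer the authors used. On this toy data the run prints 0.5 without stemming and 1.0 with the prefix stemmer: unstemmed, `اقتصاد` never matches the training form `الاقتصاد`, so the first test sentence is decided by the sports word `سجل`. Four sentences are far too few to say anything about the paper's reported results.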
Pages: 516-520
Page count: 5