Masdar: A Novel Sequence-to-Sequence Deep Learning Model for Arabic Stemming

Cited by: 1
Authors
Fouad, Mohammed M. [1]
Mahany, Ahmed [2]
Katib, Iyad [3]
Affiliations
[1] Fujitsu Technology Solutions, Jeddah, Saudi Arabia
[2] Ain Shams University, Cairo, Egypt
[3] King Abdulaziz University, Jeddah, Saudi Arabia
Keywords
Masdar; Natural Language Processing; Deep learning; Recurrent Neural Network; Sequence-to-Sequence; Arabic stemmer
DOI
10.1007/978-3-030-29513-4_26
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Preprocessing the input text is the first step in any Natural Language Processing (NLP) application. Word stemming, i.e., extracting the stem or root of an input word, is a vital part of this step. For example, words such as "player", "playing", and "played" are mapped to their stem "play". For English, several algorithms and approaches can be applied directly to this task. For Arabic, similar algorithms have been attempted, but all perform poorly because of the complexity of the language and the approaches used to build them. In this paper, we present Masdar, a novel deep learning-based model for Arabic stemming. The proposed model leverages deep learning, specifically recurrent neural networks, to build an efficient Arabic stemmer capable of producing highly accurate stems for most input words. Experiments compare the proposed model with the latest cited Arabic stemmers on a dataset of about 6000 Arabic word/stem pairs. The results show that Masdar outperforms the other stemmers: it produces the correct stem with about 95% accuracy on the whole dataset and about 82% accuracy on unseen test words.
Pages: 363-373
Number of pages: 11