Combining Word and Character Embeddings for Arabic Chatbots

Cited by: 1
Authors
Bensalah, Nouhaila [1]
Ayad, Habib [1]
Adib, Abdellah [1]
Ibn el Farouk, Abdelhamid [2]
Affiliations
[1] Univ Hassan II Casablanca, Team Networks Telecoms & Multimedia, Casablanca 20000, Morocco
[2] Teaching Languages & Cultures Lab Mohammedia, Mohammadia, Morocco
Keywords
Word-character embeddings; Word2Vec; FastText; Arabic chatbot; CNN; GRU; LSTM
DOI
10.1007/978-3-030-90633-7_48
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The Arabic language has a rich morphology and structure, with a diverse vocabulary and many rarely used words. Consequently, most Arabic Natural Language Processing (NLP) tasks could benefit from embedding models that, instead of assigning a distinct vector to each unique word in the vocabulary, focus on the internal structure of words. The semantic meaning of a word is related to the meanings of its constituent characters, which carry rich internal information. In this paper, we propose a new embedding model that uses two levels of granularity: words and characters. Moreover, we describe in detail how Arabic word embeddings are generated with the Word2Vec and FastText models. Furthermore, a Deep Learning (DL) architecture is applied on top of the word-character embeddings. Experimental results show that the proposed scheme outperforms state-of-the-art methods proposed for Arabic chatbots.
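The abstract combines word-level vectors (Word2Vec/FastText) with character-level information. As a rough illustration only, the Python sketch below builds a combined representation by concatenating a word vector with the average of FastText-style character n-gram vectors (boundary markers `<` and `>`, n from 3 to 6, matching fastText's default `minn`/`maxn`). Concatenation as the combination step, the embedding size `DIM`, and the helpers `toy_vector`, `char_level_vector` and `word_char_embedding` are hypothetical assumptions, not the authors' method; `toy_vector` stands in for lookups in trained models by returning deterministic pseudo-random vectors.

```python
import hashlib
import random

DIM = 8  # hypothetical embedding size, chosen only for illustration


def char_ngrams(word, n_min=3, n_max=6):
    """Return FastText-style character n-grams of `word`.

    The word is wrapped in '<' and '>' boundary markers and every
    substring of length n_min..n_max is collected.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def toy_vector(key, dim=DIM):
    """Hypothetical stand-in for a trained embedding lookup.

    Produces a deterministic pseudo-random vector seeded from an MD5
    hash of `key`, so the same key always maps to the same vector.
    """
    seed = int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def char_level_vector(word, dim=DIM):
    """Average the vectors of the word's character n-grams."""
    grams = char_ngrams(word)
    if not grams:  # e.g. the empty string yields no 3-grams
        return [0.0] * dim
    vecs = [toy_vector("ngram:" + g, dim) for g in grams]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def word_char_embedding(word, vocab, dim=DIM):
    """Concatenate a word-level vector with a character-level vector.

    Assumption: concatenation is used here only for illustration.
    Out-of-vocabulary words get a zero word-level part but still
    receive a character-level part built from their n-grams.
    """
    word_vec = toy_vector("word:" + word, dim) if word in vocab else [0.0] * dim
    return word_vec + char_level_vector(word, dim)


if __name__ == "__main__":
    vocab = {"كتاب", "مرحبا"}  # toy vocabulary: "book", "hello"
    emb = word_char_embedding("كتاب", vocab)
    print(len(emb))  # 16: 8 word-level + 8 character-level dimensions
    oov = word_char_embedding("كتابي", vocab)  # "my book", not in vocab
    print(oov[:DIM] == [0.0] * DIM)  # True: no word-level vector for OOV
```

Because the character-level part is built from subword units, an out-of-vocabulary form such as كتابي ("my book") still shares n-grams like `كتا` with كتاب ("book"), which is the motivation the abstract gives for modeling the internal structure of words.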
Pages: 571-578
Page count: 8
Related Papers
50 in total
  • [1] Combining Character and Word Embeddings for Affect in Arabic Informal Social Media Microblogs
    Alharbi, Abdullah I.
    Lee, Mark
    [J]. NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2020), 2020, 12089 : 213 - 224
  • [2] Word Embeddings for Arabic Sentiment Analysis
    Altowayan, A. Aziz
    Tao, Lixin
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2016, : 3820 - 3825
  • [3] The Impact of Arabic Diacritization on Word Embeddings
    Abbache, Mohamed
    Abbache, Ahmed
    Xu, Jingwen
    Meziane, Farid
    Wen, Xianbin
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [4] Methodical Evaluation of Arabic Word Embeddings
    Elrazzaz, Mohammed
    Elbassuoni, Shady
    Shaban, Khaled
    Helwe, Chadi
    [J]. PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 454 - 458
  • [5] Joint Learning of Character and Word Embeddings
    Chen, Xinxiong
    Xu, Lei
    Liu, Zhiyuan
    Sun, Maosong
    Luan, Huanbo
    [J]. PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015, : 1236 - 1242
  • [6] CHARACTER AND WORD EMBEDDINGS FOR PHISHING EMAIL DETECTION
    Stevanovic, Nikola
    [J]. COMPUTING AND INFORMATICS, 2022, 41 (05) : 1337 - 1357
  • [7] Multiple Character Embeddings for Chinese Word Segmentation
    Wang, Jingkang
    Zhou, Jianing
    Zhou, Jie
    Liu, Gongshen
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019): STUDENT RESEARCH WORKSHOP, 2019, : 210 - 216
  • [8] Combining a Novel Scoring Approach with Arabic Stemming Techniques for Arabic Chatbots Conversation Engine
    Alshammari, Nasser O.
    Alharbi, Fawaz D.
    [J]. ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2022, 21 (04)
  • [9] The impact of using pre-trained word embeddings in Sinhala chatbots
    Gamage, Bimsara
    Pushpananda, Randil
    Weerasinghe, Ruvan
    [J]. 2020 20TH INTERNATIONAL CONFERENCE ON ADVANCES IN ICT FOR EMERGING REGIONS (ICTER-2020), 2020, : 161 - 165
  • [10] Evaluation of Stacked Embeddings for Arabic Word Sense Disambiguation
    Laatar, Rim
    Aloulou, Chafik
    Belguith, Lamia Hadrich
    [J]. COMPUTACION Y SISTEMAS, 2023, 27 (02) : 379 - 388