A BERT-GRU Model for Measuring the Similarity of Arabic Text

Authors
Saidi, Rakia [1]
Jarray, Fethi [2]
Schwab, Didier [3]
Affiliations
[1] UTM Univ, LIMTIC Lab, Tunis, Tunisia
[2] Gabes Univ, ISI Medenine, Medenine, Tunisia
[3] Univ Grenoble Alpes, LIG Lab, Grenoble, France
Keywords
Semantic similarity; Cross-encoder; Data augmentation; Arabic text; GRU; BERT; Backtranslation
DOI
10.3897/jucs.111217
CLC number (Chinese Library Classification)
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Semantic Textual Similarity (STS) aims to assess the semantic similarity between two pieces of text. STS is a challenging natural language processing task, and various approaches have been proposed for high-resource languages such as English. In this paper, we address STS in low-resource languages such as Arabic. A baseline approach for STS embeds each input text as a vector and applies a similarity metric in the embedding space. In this contribution, we propose a cross-encoder neural network (Cross-BERT-GRU) for the semantic similarity of Arabic sentences that benefits from both the strong contextual understanding of BERT and the sequential modeling capabilities of GRU. The architecture first feeds the BERT embedding of each word into a GRU cell to model long-term dependencies. Max pooling and average pooling are then applied to the hidden outputs of the GRU cell, serving as the sentence-pair encoder. Finally, a softmax layer predicts the degree of similarity. The experimental results show a Spearman correlation coefficient of around 0.9, and Cross-BERT-GRU outperforms the other BERT models evaluated in predicting the semantic textual similarity of Arabic sentences. The results also indicate that performance improves when data augmentation techniques are integrated.
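The pipeline described in the abstract (BERT token embeddings, a GRU, concatenated max and average pooling, then a softmax) can be sketched as below. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: random vectors stand in for BERT outputs of the concatenated sentence pair, the GRU is a single unidirectional layer with untrained random weights, and reading the softmax output as six similarity classes (scores 0 to 5, a common STS scale) is our assumption. The names `GRU`, `max_avg_pool`, and `similarity_distribution` are hypothetical.

```python
import math
import random

# Illustrative sketch only. Assumption: BERT has already encoded the
# concatenated pair "[CLS] A [SEP] B [SEP]" (cross-encoder input); small
# random vectors stand in for its token embeddings here.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def vadd(*vs):
    """Element-wise sum of equal-length vectors."""
    return [sum(vals) for vals in zip(*vs)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class GRU:
    """Single-layer unidirectional GRU (reset gate applied to h before U)."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # z: update gate, r: reset gate, n: candidate state
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        self.b = {g: [0.0] * hidden_dim for g in "zrn"}

    def run(self, xs):
        """Return the hidden state after each input vector in xs."""
        h = [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            z = sigmoid(vadd(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"]))
            r = sigmoid(vadd(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"]))
            rh = [ri * hi for ri, hi in zip(r, h)]
            pre_n = vadd(matvec(self.W["n"], x), matvec(self.U["n"], rh), self.b["n"])
            n = [math.tanh(a) for a in pre_n]
            # Interpolate between the previous state and the candidate state.
            h = [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
            outputs.append(h)
        return outputs

def max_avg_pool(hs):
    """Concatenate the element-wise max and mean over all time steps."""
    cols = list(zip(*hs))
    return [max(c) for c in cols] + [sum(c) / len(c) for c in cols]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(a - m) for a in v]
    s = sum(e)
    return [a / s for a in e]

def similarity_distribution(token_embeddings, gru, W_out, b_out):
    """Token embeddings -> GRU -> max+avg pooling -> softmax over classes."""
    pooled = max_avg_pool(gru.run(token_embeddings))
    return softmax(vadd(matvec(W_out, pooled), b_out))

if __name__ == "__main__":
    rng = random.Random(42)
    emb_dim, hidden, num_classes = 8, 4, 6  # hypothetical sizes
    tokens = [[rng.gauss(0.0, 1.0) for _ in range(emb_dim)] for _ in range(10)]
    gru = GRU(emb_dim, hidden, seed=1)
    W_out = rand_matrix(num_classes, 2 * hidden, rng)
    b_out = [0.0] * num_classes
    probs = similarity_distribution(tokens, gru, W_out, b_out)
    # Hypothetical readout: expected score over classes 0..5.
    expected = sum(k * p for k, p in enumerate(probs))
    print([round(p, 3) for p in probs], round(expected, 3))
```

In a real system the token embeddings would come from a pretrained Arabic BERT model and the weights would typically be trained on labeled sentence pairs. Concatenating max and mean pooling gives the classifier both the strongest activation and the overall average of each GRU hidden dimension across the pair.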
Pages: 779-790
Number of pages: 12