A BERT-GRU Model for Measuring the Similarity of Arabic Text

Cited by: 0

Authors:

Saidi, Rakia [1]

Jarray, Fethi [2]

Schwab, Didier [3]

Affiliations:

[1] UTM Univ, LIMTIC Lab, Tunis, Tunisia

[2] Gabes Univ, ISI Medenine, Medenine, Tunisia

[3] Univ Grenoble Alpes, LIG Lab, Grenoble, France

Source:

JOURNAL OF UNIVERSAL COMPUTER SCIENCE | 2024 / Vol. 30 / Issue 06

Keywords:

Semantic Similarity; Cross-Encoder; Data augmentation; Arabic text; GRU; BERT; Backtranslation;

DOI:

10.3897/jucs.111217

Chinese Library Classification:

TP31 [Computer Software];

Subject classification code:

081202 ; 0835 ;

Abstract:

Semantic Textual Similarity (STS) aims to assess the semantic similarity between two pieces of text. STS is a challenging task in natural language processing, and various approaches have been proposed for high-resource languages such as English. In this paper, we are concerned with STS in low-resource languages such as Arabic. A baseline approach for STS embeds the input texts as vectors and applies a similarity metric in the embedding space. In this contribution, we propose a cross-encoder neural network (Cross-BERT-GRU) for the semantic similarity of Arabic sentences that benefits from both the strong contextual understanding of BERT and the sequential modeling capabilities of GRU. The architecture first feeds the BERT embedding of each word into a GRU cell to model long-term dependencies. Then, max pooling and average pooling are applied to the hidden outputs of the GRU cell, serving as the sentence-pair encoder. Finally, a softmax layer predicts the degree of similarity. The experimental results show a Spearman correlation coefficient of around 0.9 and that Cross-BERT-GRU outperforms the other BERT models in predicting the semantic textual similarity of Arabic sentences. The results also indicate that performance improves when data augmentation techniques are integrated.
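
The pooling and softmax stages described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the BERT and GRU stages are omitted, `hidden_states` stands in for GRU outputs with hand-written toy values, and the function names, the weight matrix, the bias, and the choice of three output classes are all hypothetical.

```python
import math


def max_pool(hidden_states):
    """Element-wise maximum over time steps (one value per hidden unit)."""
    return [max(column) for column in zip(*hidden_states)]


def mean_pool(hidden_states):
    """Element-wise average over time steps (one value per hidden unit)."""
    return [sum(column) / len(column) for column in zip(*hidden_states)]


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_head(hidden_states, weights, bias):
    """Concatenate max- and mean-pooled states, apply a linear layer, then softmax."""
    pooled = max_pool(hidden_states) + mean_pool(hidden_states)
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy stand-in for GRU outputs: 3 time steps, 2 hidden units (assumed values).
hidden_states = [[0.1, -0.2], [0.4, 0.0], [-0.3, 0.5]]

# Hypothetical linear layer: 3 similarity classes x 4 pooled features.
weights = [
    [0.5, -0.1, 0.2, 0.3],
    [-0.4, 0.6, 0.1, -0.2],
    [0.0, 0.2, -0.3, 0.4],
]
bias = [0.0, 0.1, -0.1]

probs = similarity_head(hidden_states, weights, bias)
print(probs)  # one probability per assumed similarity class
```

Concatenating the two pooled vectors keeps both the strongest activation and the average activation of each hidden unit, which is one common way to summarize a variable-length sequence as a fixed-size vector.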

Pages: 779-790

Page count: 12
