Arabic abstractive text summarization using RNN-based and transformer-based architectures

Cited by: 16
|
Authors
Bani-Almarjeh, Mohammad [1 ]
Kurdy, Mohamad-Bassam [1 ,2 ,3 ]
Institutions
[1] Syrian Virtual Univ, Damascus, Syria
[2] ESC Rennes Sch Business, Rennes, France
[3] Burgundy Sch Business Dijon, Dijon, France
Keywords
Natural language processing; Deep learning; Transfer learning; Text summarization;
DOI
10.1016/j.ipm.2022.103227
Chinese Library Classification
TP [Automation Technology; Computer Technology];
Discipline Code
0812 ;
Abstract
Recently, the Transformer model architecture and pre-trained Transformer-based language models have shown impressive performance when used to solve both natural language understanding and text generation tasks. Nevertheless, there is little research on using these models for text generation in Arabic. This research aims at leveraging and comparing the performance of different model architectures, including RNN-based and Transformer-based ones, and different pre-trained language models, including mBERT, AraBERT, AraGPT2, and AraT5, for Arabic abstractive summarization. We first built an Arabic summarization dataset of 84,764 high-quality text-summary pairs. To use mBERT and AraBERT in the context of text summarization, we employed a BERT2BERT-based encoder-decoder model in which we initialized both the encoder and the decoder with the respective model weights. The proposed models were tested using ROUGE metrics and manual human evaluation. We also compared their performance on out-of-domain data. Our pre-trained Transformer-based models give a large improvement in performance with 79% less data. We found that AraT5 scores about 3 ROUGE points higher than a BERT2BERT-based model initialized with AraBERT, indicating that an encoder-decoder pre-trained Transformer is more suitable for summarizing Arabic text. Both of these models also outperform AraGPT2 by a clear margin, which we found to produce summaries with high readability but relatively lower quality. On the other hand, we found that both AraT5 and AraGPT2 are better at summarizing out-of-domain text. We released our models and dataset publicly.
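The abstract reports ROUGE scores for model comparison. As a reminder of what that metric measures, here is a minimal, recall-oriented ROUGE-N sketch in plain Python; it is an illustration of the metric's definition, not the authors' evaluation code, and the example sentences are made up for readability.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a Counter (multiset) of n-grams for a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n=1):
    """Recall-oriented ROUGE-N: overlapping n-grams / total reference n-grams."""
    cand, ref = ngrams(candidate.split(), n), ngrams(reference.split(), n)
    overlap = sum((cand & ref).values())  # multiset intersection = clipped counts
    total = sum(ref.values())
    return overlap / total if total else 0.0

# Toy example (English tokens used only for readability)
ref = "the model produces a fluent summary"
cand = "the model writes a fluent summary"
print(round(rouge_n(cand, ref, 1), 2))  # → 0.83 (5 of 6 reference unigrams matched)
```

ROUGE-2 and higher orders work the same way over longer n-grams, which is why they reward preserved phrasing rather than isolated word overlap.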
Pages: 18
Related Papers
50 records in total
  • [1] Abstractive Arabic Text Summarization Based on Deep Learning
    Wazery, Y. M.
    Saleh, Marwa E.
    Alharbi, Abdullah
    Ali, Abdelmgeid A.
    [J]. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE, 2022, 2022
  • [2] Automatic text summarization using transformer-based language models
    Rao, Ritika
    Sharma, Sourabh
    Malik, Nitin
    [J]. INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024, 15 (06) : 2599 - 2605
  • [3] Applying Transformer-Based Text Summarization for Keyphrase Generation
    Glazkova A.V.
    Morozov D.A.
    [J]. Lobachevskii Journal of Mathematics, 2023, 44 (1) : 123 - 136
  • [4] An Improved Template Representation-based Transformer for Abstractive Text Summarization
    Sun, Jiaming
    Wang, Yunli
    Li, Zhoujun
[J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [5] RNN-based Text Summarization for Communication Cost Reduction: Toward a Semantic Communication
    Dam, Sumit Kumar
    Munir, Md. Shirajum
    Raha, Avi Deb
    Adhikary, Apurba
    Park, Seong-Bae
    Hong, Choong Seon
    [J]. 2023 INTERNATIONAL CONFERENCE ON INFORMATION NETWORKING, ICOIN, 2023, : 423 - 426
  • [6] Turkish abstractive text document summarization using text to text transfer transformer
    Ay, Betul
    Ertam, Fatih
    Fidan, Guven
    Aydin, Galip
    [J]. ALEXANDRIA ENGINEERING JOURNAL, 2023, 68 : 1 - 13
  • [7] A Two-Stage Transformer-Based Approach for Variable-Length Abstractive Summarization
    Su, Ming-Hsiang
    Wu, Chung-Hsien
    Cheng, Hao-Tse
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 2061 - 2072
  • [8] A transformer-based approach for Arabic offline handwritten text recognition
    Momeni, Saleh
    Babaali, Bagher
    [J]. SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (04) : 3053 - 3062
  • [9] English-Arabic Text Translation and Abstractive Summarization Using Transformers
    Holiel, Heidi Ahmed
    Mohamed, Nancy
    Ahmed, Arwa
    Medhat, Walaa
    [J]. 2023 20TH ACS/IEEE INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS, AICCSA, 2023