Abstractive Text Summarization for the Urdu Language: Data and Methods

Authors
Awais, Muhammad [1]
Muhammad Adeel Nawab, Rao [1]
Affiliation
[1] COMSATS University Islamabad, Department of Computer Science, Lahore Campus, Lahore 54000, Pakistan
Keywords
Task analysis; Long short-term memory; Deep learning; Benchmark testing; Social networking (online); Convolutional neural networks; Natural language processing; Abstracts; Text detection; Artificial intelligence; Publishing; Unsupervised learning; Machine learning; Text analysis; Text summarization; Abstractive text summarization; BART; corpus; deep learning models; GPT-3.5; large language models; Urdu
DOI
10.1109/ACCESS.2024.3378300
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The task of abstractive text summarization is to automatically generate a short, concise summary of a given source article. In recent years, automatic abstractive text summarization has attracted researchers' attention because large volumes of digital text are readily available in multiple languages on a wide range of topics. Automatically generating precise summaries from large texts has potential applications in generating news headlines, summaries of research articles, and morals of stories, as well as in media marketing, search engine optimization, financial research, social media marketing, question-answering systems, and chatbots. In the literature, abstractive text summarization has mainly been investigated for English and a few other languages. However, it has not been thoroughly explored for Urdu, despite the large amount of Urdu data available in digital format. To fill this gap, this paper presents a large benchmark corpus of 2,067,784 Urdu news articles for the Urdu abstractive text summarization task. As a secondary contribution, we applied a range of deep learning models (LSTM, Bi-LSTM, LSTM with attention, GRU, Bi-GRU, and GRU with attention) and large language models (BART and GPT-3.5) to our proposed corpus. Our extensive evaluation on 20,000 test instances showed that the GRU with attention model outperforms the other models, with ROUGE-1 = 46.7, ROUGE-2 = 24.1, and ROUGE-L = 48.7. To foster research in Urdu, our proposed corpus is publicly and freely available for research purposes under a Creative Commons license.
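The abstract reports its results as ROUGE-1, ROUGE-2, and ROUGE-L scores. As a rough illustration of what these metrics measure, the sketch below computes F1 variants of ROUGE-N (clipped n-gram overlap between a reference and a candidate summary) and ROUGE-L (based on their longest common subsequence) for a single pair of texts. It is a minimal sketch under stated assumptions, not the authors' evaluation script: it assumes plain whitespace tokenization and a sentence-level LCS, and it reports F1 on a 0 to 1 scale (the abstract's figures appear to be scaled by 100). The paper's exact tokenizer, ROUGE variant, and averaging scheme are not stated here, and the function names (`rouge_n`, `rouge_l`, `lcs_length`) are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, ref_total, cand_total):
    """Combine an overlap count with reference/candidate totals into an F1 score."""
    if overlap == 0 or ref_total == 0 or cand_total == 0:
        return 0.0
    precision = overlap / cand_total
    recall = overlap / ref_total
    return 2 * precision * recall / (precision + recall)


def rouge_n(reference, candidate, n):
    """ROUGE-N F1: n-gram overlap, clipped by the reference counts via Counter '&'."""
    ref = ngrams(reference.split(), n)   # assumption: whitespace tokenization
    cand = ngrams(candidate.split(), n)
    overlap = sum((ref & cand).values())
    return f1(overlap, sum(ref.values()), sum(cand.values()))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L F1 based on the LCS of the whole reference and candidate."""
    ref, cand = reference.split(), candidate.split()
    return f1(lcs_length(ref, cand), len(ref), len(cand))


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(round(rouge_n(reference, candidate, 1), 3))  # 0.833
    print(round(rouge_n(reference, candidate, 2), 3))  # 0.6
    print(round(rouge_l(reference, candidate), 3))     # 0.833
```

Whitespace splitting is used here only because it applies unchanged to Urdu script; a real evaluation would more likely rely on an established ROUGE package and average the per-instance scores over the whole test set.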
Pages: 61198-61210
Number of pages: 13