Abstractive Text Summarization for the Urdu Language: Data and Methods

Cited by: 0
Authors
Awais, Muhammad [1 ]
Muhammad Adeel Nawab, Rao [1 ]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Lahore Campus, Lahore 54000, Pakistan
Keywords
Task analysis; Long short term memory; Deep learning; Benchmark testing; Social networking (online); Convolutional neural networks; Natural language processing; Abstracts; Text detection; Artificial intelligence; Publishing; Unsupervised learning; Machine learning; Text analysis; Text summarization; Abstractive text summarization; BART; corpus; deep learning models; GPT-3.5; large language models; Urdu
DOI
10.1109/ACCESS.2024.3378300
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
The task of abstractive text summarization aims to automatically generate a concise summary of a given source article. In recent years, automatic abstractive text summarization has attracted the attention of researchers because large volumes of digital text are readily available in multiple languages on a wide range of topics. Automatically generating precise summaries from large texts has potential applications in news headline generation, research article summarization, extracting the moral of a story, media marketing, search engine optimization, financial research, social media marketing, question-answering systems, and chatbots. In the literature, the problem of abstractive text summarization has mainly been investigated for English and a few other languages. However, it has not been thoroughly explored for the Urdu language, despite the huge amount of Urdu data available in digital format. To fill this gap, this paper presents a large benchmark corpus of 2,067,784 Urdu news articles for the Urdu abstractive text summarization task. As a secondary contribution, we applied a range of deep learning models (LSTM, Bi-LSTM, LSTM with attention, GRU, Bi-GRU, and GRU with attention) and large language models (BART and GPT-3.5) to our proposed corpus. Our extensive evaluation on 20,000 test instances showed that the GRU with attention model outperforms the other models, with ROUGE-1 = 46.7, ROUGE-2 = 24.1, and ROUGE-L = 48.7. To foster research in Urdu, our proposed corpus is publicly and freely available for research purposes under a Creative Commons license.
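The scores above are ROUGE metrics, which measure n-gram overlap between a generated summary and a reference summary. As a minimal illustration (not the paper's evaluation code, which presumably uses a standard ROUGE package with proper Urdu tokenization), ROUGE-1 F1 over whitespace-tokenized unigrams can be sketched as:

```python
from collections import Counter

def rouge_1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap between candidate and reference."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # each unigram counted at most min(cand, ref) times
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy example: 3 of 3 candidate unigrams match, 3 of 4 reference unigrams are covered.
print(round(rouge_1_f("the cat sat", "the cat sat down"), 3))  # → 0.857
```

ROUGE-2 applies the same precision/recall computation to bigrams, and ROUGE-L replaces the overlap count with the length of the longest common subsequence.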
Pages: 61198-61210
Page count: 13