End to End Urdu Abstractive Text Summarization With Dataset and Improvement in Evaluation Metric

Cited by: 0
Authors
Raza, Hassan [1]
Shahzad, Waseem [1]
Affiliations
[1] Natl Univ Comp & Emerging Sci, FAST Sch Comp, Islamabad 44000, Pakistan
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Semantics; Transformers; Supervised learning; Databases; Artificial neural networks; Standards; Data mining; Data models; Neural networks; Text categorization; Natural language processing; Performance evaluation; Datasets; CA-RoBERTa score; text summarization
DOI
10.1109/ACCESS.2024.3377463
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Urdu, although a common language in South Asia, has received little attention in language processing compared with better-studied languages. In Natural Language Processing (NLP), text summarization is an important task because it requires comprehending textual content and generating a concise summary. Summarization can be either extractive or abstractive. Considerable effort has gone into extractive summarization techniques, and the paper explores and explains their limitations. Abstractive summarization for Urdu, however, remains largely unexplored, and the paper also addresses the challenges and underlying factors that have impeded progress in this domain. This paper focuses specifically on abstractive summarization of Urdu using supervised learning, which requires a labeled dataset of Urdu text paired with abstractive summaries; such a dataset has been prepared for this purpose. The paper also presents the results of summary generation, measured in terms of the ROUGE score. A Transformer encoder-decoder network was employed to generate abstractive summaries in Urdu, yielding a ROUGE-1 score of 25.18. Moreover, a novel evaluation metric called the "disconnection rate" has been introduced as a context-aware evaluation metric to enhance the assessment of a summary, known as the Context Aware RoBERTa Score.
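For readers unfamiliar with the ROUGE-1 figure quoted in the abstract, the following is a minimal sketch of unigram-overlap scoring. It is not the authors' evaluation code: the function name `rouge1`, the whitespace tokenization, and the English example sentences are all assumptions made for illustration, and the abstract does not state which variant (precision, recall, or F1) the reported 25.18 corresponds to.

```python
from collections import Counter


def rouge1(reference: str, candidate: str) -> dict:
    """Illustrative ROUGE-1: clipped unigram overlap between two texts.

    Assumption: tokens are whitespace-separated. Real evaluations usually
    apply language-specific tokenization and normalization first.
    """
    ref_counts = Counter(reference.split())
    cand_counts = Counter(candidate.split())

    # Counter intersection keeps, per token, the minimum of the two counts,
    # so a repeated word is credited at most as often as it occurs in both.
    overlap = sum((ref_counts & cand_counts).values())

    ref_total = sum(ref_counts.values())
    cand_total = sum(cand_counts.values())

    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example: five of six unigrams are shared.
scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
```

Scores produced this way lie between 0 and 1; reported figures such as 25.18 are conventionally the same quantity multiplied by 100. Because the split is on whitespace only, the same function runs unchanged on Urdu text, although without normalization the counts may differ from those of a dedicated toolkit.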
Pages: 40311-40324
Page count: 14