Extractive Text Summarization Models for Urdu Language

Cited: 17
Authors
Nawaz, Ali [1 ]
Bakhtyar, Maheen [1 ]
Baber, Junaid [1 ]
Ullah, Ihsan [1 ]
Noor, Waheed [1 ]
Basit, Abdul [1 ]
Affiliations
[1] University of Balochistan, Quetta, Pakistan
Keywords
Natural Language Processing; Sentence Weight Algorithm; Text Summarization; Urdu Language; Weighted Term Frequency;
DOI
10.1016/j.ipm.2020.102383
CLC Number
TP [automation technology; computer technology]
Discipline Code
0812
Abstract
In recent years, considerable progress has been made in Urdu linguistics, and many portals and news websites generate large amounts of Urdu text every day. However, no publicly available dataset or framework exists for automatic extractive summary generation in Urdu. In automatic extractive summarization, the sentences with the highest weights are selected for inclusion in the summary, where a sentence's weight is the sum of the weights of its words. Two well-known approaches compute word weights for English: the local weights (LW) approach and the global weights (GW) approach. LW are sensitive to the contents of the text, so the same word may receive different weights in different articles. With GW, word weights are computed from an independent dataset, so the weight of each word remains the same across articles. In the proposed framework, both LW- and GW-based approaches are modeled for the Urdu language. The sentence weight method and the weighted term-frequency method are LW-based approaches that compute sentence weights as the sum of important words and the sum of the frequencies of important words, respectively. The vector space model (VSM) is a GW-based approach that computes word weights from an independent dataset, so the weights remain fixed across all texts; GW are widely used for English in applications such as information retrieval and text classification. Extractive summaries generated by the LW- and GW-based approaches are evaluated against ground-truth summaries produced by experts, with the VSM serving as the baseline framework for sentence weighting. Experiments show that the LW-based approaches are better for extractive summary generation.
The F-scores of the sentence weight method and the weighted term-frequency method are 80% and 76%, respectively, while the VSM achieves only 62% on the same dataset. Both the datasets with ground-truth summaries and the code are made publicly available to researchers.
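The contrast between local and global weighting described in the abstract can be sketched in a few lines. The paper does not specify its exact tokenization, stopword handling, or weighting formulas, so the functions below are a minimal illustrative sketch of the general idea (sentence-weight and weighted term-frequency scoring as LW, a TF-IDF-style score as the GW/VSM baseline), not the authors' implementation.

```python
from collections import Counter
import math

def sentence_weight_scores(sentences, important):
    """LW sketch of the sentence-weight idea: score each sentence
    by how many of its tokens appear in an 'important words' set."""
    return [sum(1 for w in s.split() if w in important) for s in sentences]

def weighted_tf_scores(sentences, stopwords):
    """LW sketch of weighted term-frequency: score each sentence by
    the summed document-level frequencies of its non-stopword terms."""
    tokens = [[w for w in s.split() if w not in stopwords] for s in sentences]
    tf = Counter(w for sent in tokens for w in sent)  # frequencies within this article
    return [sum(tf[w] for w in sent) for sent in tokens]

def tfidf_scores(sentences, corpus, stopwords):
    """GW (VSM-style) sketch: IDF comes from an independent corpus,
    so a word's global weight is the same in every article."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc.split()))  # document frequency per term
    def idf(w):
        return math.log((1 + n_docs) / (1 + df[w])) + 1  # smoothed IDF
    scores = []
    for s in sentences:
        terms = [w for w in s.split() if w not in stopwords]
        tf = Counter(terms)
        scores.append(sum(tf[w] * idf(w) for w in set(terms)))
    return scores

def top_k_summary(sentences, scores, k=2):
    """Extractive summary: keep the k highest-scoring sentences,
    emitted in their original document order."""
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(ranked)]
```

Whichever scorer is used, the summary is the same top-k selection step; only the source of the word weights (this article vs. an independent dataset) changes.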
Pages: 14