Extractive Text Summarization Models for Urdu Language

Cited by: 17
Authors
Nawaz, Ali [1]
Bakhtyar, Maheen [1]
Baber, Junaid [1]
Ullah, Ihsan [1]
Noor, Waheed [1]
Basit, Abdul [1]
Affiliation
[1] Univ Balochistan Quetta, Quetta, Pakistan
Keywords
Natural Language Processing; Sentence Weight Algorithm; Text Summarization; Urdu Language; Weighted Term Frequency;
DOI
10.1016/j.ipm.2020.102383
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In recent years, Urdu linguistics has advanced considerably, and many portals and news websites now generate a huge amount of Urdu data every day. However, there is still no publicly available dataset or framework for automatic Urdu extractive summary generation. In automatic extractive summarization, the sentences with the highest weights are selected for inclusion in the summary. A sentence's weight is computed as the sum of the weights of its words. There are two well-known approaches to computing word weights for the English language: the local weights (LW) approach and the global weights (GW) approach. In the LW approach, the weights depend on the content of the text, so the same word may have different weights in different articles. In the GW approach, by contrast, the word weights are computed from an independent dataset, so the weight of every word remains the same across articles. In the proposed framework, LW and GW based approaches are modeled for the Urdu language. The sentence weight method and the weighted term-frequency method are LW based approaches; they compute the weight of a sentence as the sum of its important words and the sum of the frequencies of its important words, respectively. The vector space model (VSM) is a GW based approach: it computes word weights from an independent dataset, and these weights then remain the same for all types of text. GW is widely used for English in applications such as information retrieval and text classification. Extractive summaries are generated by the LW and GW based approaches and evaluated against ground-truth summaries prepared by experts. The VSM is used as the baseline framework for sentence weighting. Experiments show that the LW based approaches are better for extractive summary generation. The F-scores of the sentence weight method and the weighted term-frequency method are 80% and 76%, respectively, whereas the VSM achieved only 62% accuracy on the same dataset. Both the datasets with ground truth and the code are made publicly available for researchers.
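The three sentence-weighting ideas described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' released code. It assumes that "important words" are simply tokens outside a stop-word list, that tokens are separated by whitespace, and that the GW baseline uses inverse document frequency computed from an independent reference collection; the paper's actual VSM weighting may differ. All function names are hypothetical, and short English sentences stand in for Urdu text.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization; real Urdu text needs a proper tokenizer.
    return text.split()


def sentence_weight(sentence, stop_words):
    # LW, sentence weight method: number of important (non-stop) words.
    return sum(1 for w in tokenize(sentence) if w not in stop_words)


def weighted_tf(sentence, doc_freq, stop_words):
    # LW, weighted term-frequency method: sum of in-document frequencies
    # of the important words in the sentence.
    return sum(doc_freq[w] for w in tokenize(sentence) if w not in stop_words)


def idf_weights(reference_docs, stop_words):
    # GW: word weights from an independent collection (IDF is an assumption here).
    n = len(reference_docs)
    df = Counter()
    for doc in reference_docs:
        df.update({w for w in tokenize(doc) if w not in stop_words})
    return {w: math.log(n / c) for w, c in df.items()}


def global_weight(sentence, idf, stop_words):
    # GW sentence score: sum of fixed word weights; unseen words contribute 0.
    return sum(idf.get(w, 0.0) for w in tokenize(sentence) if w not in stop_words)


def summarize(sentences, score, k):
    # Keep the k highest-scoring sentences, returned in document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


# Toy data (hypothetical); English stands in for Urdu.
stop_words = {"the", "on", "it"}
sentences = ["the cat sat on the mat", "the cat saw the cat", "it rained"]

# Local frequencies come from the article being summarized.
doc_freq = Counter(w for s in sentences for w in tokenize(s) if w not in stop_words)

# Global weights come from a separate reference collection.
reference_docs = ["the cat sat", "the dog ran", "the cat ran"]
idf = idf_weights(reference_docs, stop_words)

lw_summary = summarize(sentences, lambda s: weighted_tf(s, doc_freq, stop_words), k=1)
gw_summary = summarize(sentences, lambda s: global_weight(s, idf, stop_words), k=1)
```

In this toy input the first two sentences tie under the sentence weight method (three important words each), while the weighted term-frequency method prefers the second because "cat" occurs three times in the article. The global score ignores the article's own frequencies and depends only on the reference collection.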
Pages: 14