Extractive Text Summarization Models for Urdu Language

Cited by: 17
Authors:
Nawaz, Ali [1 ]
Bakhtyar, Maheen [1 ]
Baber, Junaid [1 ]
Ullah, Ihsan [1 ]
Noor, Waheed [1 ]
Basit, Abdul [1 ]
Institutions:
[1] University of Balochistan, Quetta, Pakistan
Keywords:
Natural Language Processing; Sentence Weight Algorithm; Text Summarization; Urdu Language; Weighted Term Frequency;
DOI:
10.1016/j.ipm.2020.102383
Chinese Library Classification: TP [Automation and Computer Technology]
Discipline classification code: 0812
Abstract:
In recent years, considerable progress has been made in Urdu language processing, and many portals and news websites generate large amounts of Urdu text every day. However, there is still no publicly available dataset or framework for automatic Urdu extractive summary generation. In extractive summarization, the sentences with the highest weights are selected for inclusion in the summary, where a sentence's weight is the sum of the weights of its words. Two well-known approaches compute word weights for English: the local weights (LW) approach and the global weights (GW) approach. LW-based weights depend on the content of the text, so the same word may receive different weights in different articles. With GW, word weights are computed from an independent dataset, so the weight of every word remains the same across articles. In the proposed framework, both LW- and GW-based approaches are modeled for the Urdu language. The sentence weight method and the weighted term-frequency method are LW-based approaches that weight a sentence by the sum of its important words and by the sum of the frequencies of those words, respectively. The vector space model (VSM) is a GW-based approach that computes word weights from an independent dataset, after which they remain fixed for all texts; GW is widely used for English in applications such as information retrieval and text classification. Extractive summaries generated by the LW- and GW-based approaches are evaluated against ground-truth summaries produced by experts, with the VSM serving as the baseline for sentence weighting. Experiments show that the LW-based approaches are better suited to extractive summary generation.
The F-scores of the sentence weight method and the weighted term-frequency method are 80% and 76%, respectively, whereas the VSM achieves only 62% on the same dataset. Both the dataset with ground-truth summaries and the code are made publicly available to researchers.
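The local-weight idea described in the abstract can be sketched briefly: word weights are term frequencies computed from the document itself, and each sentence is scored by the sum of its words' weights. This is a minimal illustration, not the authors' released code; the function name, tokenization by whitespace, and the toy English sentences are all assumptions for demonstration.

```python
from collections import Counter

def summarize_local_tf(sentences, top_k=2):
    """Sketch of a local-weight (LW) extractive summarizer: each word's
    weight is its frequency within this document, and a sentence's weight
    is the sum of its word weights (hypothetical helper, not the paper's code)."""
    # Local weights: term frequencies computed from the document itself
    tf = Counter(word for s in sentences for word in s.split())
    # Score each sentence by the sum of its words' frequencies
    scored = [(sum(tf[w] for w in s.split()), i, s)
              for i, s in enumerate(sentences)]
    # Keep the top-k highest-scoring sentences, restoring original order
    top = sorted(sorted(scored, reverse=True)[:top_k], key=lambda t: t[1])
    return [s for _, _, s in top]

doc = ["the cat sat on the mat",
       "dogs bark",
       "the cat chased the mouse"]
print(summarize_local_tf(doc, top_k=2))
# → ['the cat sat on the mat', 'the cat chased the mouse']
```

A GW variant would differ only in where `tf` comes from: the weights would be precomputed once from an independent corpus (as in TF-IDF/VSM) rather than from the document being summarized.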
Pages: 14
Related Papers (50 in total)
  • [1] Abstractive Text Summarization for the Urdu Language: Data and Methods
    Awais, Muhammad
    Nawab, Rao Muhammad Adeel
    IEEE ACCESS, 2024, 12 : 61198 - 61210
  • [2] A comparative review of extractive text summarization in Indonesian language
    Widodo, W.
    Nugraheni, M.
    Sari, I. P.
    5TH ANNUAL APPLIED SCIENCE AND ENGINEERING CONFERENCE (AASEC 2020), 2021, 1098
  • [3] Integrating Extractive and Abstractive Models for Long Text Summarization
    Wang, Shuai
    Zhao, Xiang
    Li, Bo
    Ge, Bin
    Tang, Daquan
    2017 IEEE 6TH INTERNATIONAL CONGRESS ON BIG DATA (BIGDATA CONGRESS 2017), 2017, : 305 - 312
  • [4] A Statistical Language Modeling Framework for Extractive Summarization of Text Documents
    Gupta, P.
    Nigam, S.
    Singh, R.
    SN Computer Science, 4 (6)
  • [5] Grouping sentences as better language unit for extractive text summarization
    Cao, Mengyun
    Zhuge, Hai
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 109 : 331 - 359
  • [6] Long-Span Language Models for Query-Focused Unsupervised Extractive Text Summarization
    Singh, Mittul
    Mishra, Arunav
    Oualil, Youssef
    Berberich, Klaus
    Klakow, Dietrich
    ADVANCES IN INFORMATION RETRIEVAL (ECIR 2018), 2018, 10772 : 657 - 664
  • [7] EUTS: Extractive Urdu Text Summarizer
    Jazeb, Noman
    Sikandar, A.
    Muhammad, Aslam
    Martinez-Enriquez, A. M.
    2018 17TH MEXICAN INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (MICAI 2018), 2018, : 39 - 44
  • [8] Fairness of Extractive Text Summarization
    Shandilya, Anurag
    Ghosh, Kripabandhu
    Ghosh, Saptarshi
    COMPANION PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2018 (WWW 2018), 2018, : 97 - 98
  • [9] Deep Extractive Text Summarization
    Bhargava, Rupal
    Sharma, Yashvardhan
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND DATA SCIENCE, 2020, 167 : 138 - 146
  • [10] A Review of the Extractive Text Summarization
    Mendoza Becerra, Martha Eliana
    Leon Guzman, Elizabeth
    UIS INGENIERIAS, 2013, 12 (01): : 7 - 27