The impact of term-weighting schemes and similarity measures on extractive multi-document text summarization

Cited by: 10
Authors
Sanchez-Gomez, Jesus M. [1 ]
Vega-Rodriguez, Miguel A. [1 ]
Perez, Carlos J. [2 ]
Affiliations
[1] Univ Extremadura, Dept Comp & Commun Technol, Campus Univ S-N, Caceres 10003, Spain
[2] Univ Extremadura, Dept Math, Campus Univ S-N, Caceres 10003, Spain
Keywords
Multi-document summarization; Extractive summary; Term-weighting; Similarity
DOI
10.1016/j.eswa.2020.114510
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic text summarization is currently a topic of great interest in many knowledge fields. Extractive multi-document text summarization methods aim to reduce the textual information in a document collection by covering the main content and reducing redundant information. The scientific literature describes different term-weighting schemes and similarity measures, both of which are necessary for implementing an automatic summarization system. However, to the best of the authors' knowledge, no studies have analyzed the performance of these schemes and measures. In this paper, all possible combinations of the most common term-weighting schemes and similarity measures used in the extractive multi-document text summarization field have been implemented, compared, and analyzed. Experiments have been performed with Document Understanding Conferences (DUC) datasets, and the model performance has been assessed with eight Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics and the execution time. Results show that the best term-weighting scheme is the term-frequency inverse-sentence-frequency scheme, and the best similarity measure is the cosine similarity. Moreover, the combination of the two has obtained the best average results in 87.5% of the ROUGE scores compared to the other combinations.
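
To make the best-performing combination named in the abstract concrete, the following is a minimal Python sketch of term-frequency inverse-sentence-frequency (TF-ISF) weighting paired with cosine similarity. The abstract does not give the paper's exact formulas, so this assumes a common textbook formulation, weight = tf * log(n / sf), where n is the number of sentences and sf is the number of sentences containing the term. The function names, the whitespace tokenization, and the toy sentences are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def tf_isf_vectors(sentences):
    """Return one sparse {term: weight} vector per tokenized sentence.

    Assumed formulation (not confirmed by the abstract):
    weight = tf * log(n / sf).
    """
    n = len(sentences)
    # Sentence frequency: number of sentences that contain each term.
    sf = Counter(term for sent in sentences for term in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        vectors.append({t: tf[t] * math.log(n / sf[t]) for t in tf})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy, hypothetical input: three whitespace-tokenized sentences.
sentences = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "stock markets fell sharply".split(),
]
vectors = tf_isf_vectors(sentences)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # sentences sharing terms
sim_02 = cosine_similarity(vectors[0], vectors[2])  # sentences with no shared terms
```

In an extractive summarizer, pairwise similarities like these are typically used both to score how well a candidate sentence covers the collection and to penalize redundancy between selected sentences; how the paper combines them is not stated in the abstract.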
Pages: 10