The impact of term-weighting schemes and similarity measures on extractive multi-document text summarization

Cited by: 10
Authors
Sanchez-Gomez, Jesus M. [1]
Vega-Rodriguez, Miguel A. [1]
Perez, Carlos J. [2]
Affiliations
[1] Univ Extremadura, Dept Comp & Commun Technol, Campus Univ S-N, Caceres 10003, Spain
[2] Univ Extremadura, Dept Math, Campus Univ S-N, Caceres 10003, Spain
Keywords
Multi-document summarization; Extractive summary; Term-weighting; Similarity
DOI
10.1016/j.eswa.2020.114510
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatic text summarization is currently a topic of great interest in many knowledge fields. Extractive multi-document text summarization methods aim to reduce the textual information from a document collection by covering the main content and reducing the redundant information. In the scientific literature, there are different approaches related to term-weighting schemes and similarity measures, which are necessary for implementing an automatic summary system. However, to the best of the authors' knowledge, there are no studies that analyze the performance of the different schemes and measures. In this paper, all possible combinations of the most common term-weighting schemes and similarity measures used in the extractive multi-document text summarization field have been implemented, compared, and analyzed. Experiments have been performed with Document Understanding Conferences (DUC) datasets, and the model performance has been assessed with eight Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metrics and the execution time. Results show that the best term-weighting scheme is the term-frequency inverse-sentence-frequency scheme, and the best similarity measure is the cosine similarity. Moreover, the combination of the two has obtained the best average results in 87.5% of ROUGE scores compared to the other combinations.
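
As an illustration of the best-performing combination named in the abstract, the following is a minimal sketch of term-frequency inverse-sentence-frequency (TF-ISF) weighting paired with cosine similarity. The abstract does not give the formulas, so the weighting used here, tf * log(n / sf), is a commonly used textbook form and an assumption rather than the authors' exact definition. The function names, the whitespace tokenization, and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def tf_isf_vectors(sentences):
    """Return one sparse {term: weight} vector per tokenized sentence.

    Assumed weighting (not taken from the paper): tf * log(n / sf), where
    n is the number of sentences and sf is the number of sentences that
    contain the term.
    """
    n = len(sentences)
    # Sentence frequency: in how many sentences each term appears.
    sf = Counter(term for sent in sentences for term in set(sent))
    vectors = []
    for sent in sentences:
        tf = Counter(sent)
        vectors.append({t: tf[t] * math.log(n / sf[t]) for t in tf})
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


# Hypothetical toy input: three sentences, tokenized on whitespace.
sentences = [
    "the cat sat on the mat".split(),
    "the cat ate the fish".split(),
    "stocks fell sharply today".split(),
]
vectors = tf_isf_vectors(sentences)
sim_01 = cosine_similarity(vectors[0], vectors[1])  # sentences share terms
sim_02 = cosine_similarity(vectors[0], vectors[2])  # no terms in common
```

In an extractive summarizer, pairwise similarities of this kind are typically used to favor sentences close to the collection's main content and to penalize redundant ones; how the paper combines them is not stated in the abstract.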
Pages: 10
Related papers
50 in total
  • [1] Extractive Multi-document Text Summarization Leveraging Hybrid Semantic Similarity Measures
    Bandaru, Rajesh
    Radhika, Dr. Y.
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (09) : 844 - 852
  • [2] Survey on Extractive Text Summarization Methods with Multi-Document Datasets
    Varalakshmi, P. N. K.
    Kallimani, Jagadish S.
    [J]. 2018 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2018, : 2113 - 2119
  • [3] Multi-document extractive text summarization based on firefly algorithm
    Tomer, Minakshi
    Kumar, Manoj
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (08) : 6057 - 6065
  • [4] Multi-document extractive text summarization: A comparative assessment on features
    Mutlu, Begum
    Sezer, Ebru A.
    Akcayol, M. Ali
    [J]. KNOWLEDGE-BASED SYSTEMS, 2019, 183
  • [5] Unsupervised extractive multi-document text summarization using a Genetic Algorithm
    Neri-Mendoza, Veronica
    Ledeneva, Yulia
    Garcia-Hernandez, Rene Arnulfo
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (02) : 2397 - 2408
  • [6] Extractive multi-document text summarization based on graph independent sets
    Uckan, Taner
    Karci, Ali
    [J]. EGYPTIAN INFORMATICS JOURNAL, 2020, 21 (03) : 145 - 157
  • [7] Multi-Document Extractive Text Summarization via Deep Learning Approach
    Rezaei, Afsaneh
    Dami, Sina
    Daneshjoo, Parisa
    [J]. 2019 IEEE 5TH CONFERENCE ON KNOWLEDGE BASED ENGINEERING AND INNOVATION (KBEI 2019), 2019, : 680 - 685
  • [8] Experimental analysis of multiple criteria for extractive multi-document text summarization
    Sanchez-Gomez, Jesus M.
    Vega-Rodriguez, Miguel A.
    Perez, Carlos J.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2020, 140
  • [9] Improving the Similarity Measure of Determinantal Point Processes for Extractive Multi-Document Summarization
    Cho, Sangwoo
    Lebanoff, Logan
    Foroosh, Hassan
    Liu, Fei
    [J]. 57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 1027 - 1038
  • [10] Extractive Multi-Document Text Summarization by Using Binary Particle Swarm Optimization
    Potnurwar, Archana
    Pimpalshende, Anjusha
    Aote, Shailendra S.
    Bongirwar, Vrusbali
    [J]. BIOSCIENCE BIOTECHNOLOGY RESEARCH COMMUNICATIONS, 2020, 13 (14) : 32 - 34