An Entailment-based Scoring Method for Content Selection in Document Summarization

Cited by: 2
Authors
Dang Hoang Long [1]
Minh-Tien Nguyen [2]
Ngo Xuan Bach [1]
Le-Minh Nguyen [3]
Tu Minh Phuong [1]
Affiliations
[1] Posts & Telecommun Inst Technol, Hanoi, Vietnam
[2] Hung Yen Univ Technol & Educ, Hung Yen, Vietnam
[3] Japan Adv Inst Sci & Technol, 1-8 Asahidai, Nomi, Ishikawa, Japan
Keywords
Web Document Summarization; Entailment; Sentence Scoring; Integer Linear Programming (ILP)
DOI
10.1145/3287921.3287976
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
This paper introduces a scoring method to improve the quality of content selection in an extractive summarization system. Unlike previous models, which mainly use local information within sentences such as sentence position or sentence length, our method judges the importance of a sentence based on both its own information and its relations to other sentences. To model the relations between sentences, we use textual entailment, a relationship indicating that the meaning of one sentence can be inferred from another. Unlike previous work on textual entailment for summarization, we go a step further by looking at the aligned words in an entailment sentence pair. Assuming that important words in a salient sentence can be aligned with several words in other sentences, we exploit word alignment scores to compute the entailment score of a sentence. To take advantage of both local and neighbor information when estimating sentence salience, we combine entailment scores with sentence position scores. We validate the proposed scoring method with greedy and integer linear programming (ILP) approaches for extracting summaries. Experiments on three datasets (including DUC 2001 and 2002) in two different domains show that our model obtains ROUGE scores competitive with those of state-of-the-art methods for single-document summarization.
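The scoring idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's formulation: the pairwise alignment matrix `align` is taken as given, the entailment score is a plain average over other sentences, the position score decays linearly, and the mixing weight `alpha` is hypothetical. Only the greedy variant is shown; the ILP variant is not.

```python
from typing import List


def position_score(index: int, n: int) -> float:
    """Assumed position score: earlier sentences score higher (linear decay)."""
    return 1.0 - index / n


def entailment_score(i: int, align: List[List[float]]) -> float:
    """Assumed entailment score: mean alignment score of sentence i with all others."""
    others = [align[i][j] for j in range(len(align)) if j != i]
    return sum(others) / len(others) if others else 0.0


def combined_scores(align: List[List[float]], alpha: float = 0.5) -> List[float]:
    """Mix entailment and position scores; alpha is a hypothetical weight."""
    n = len(align)
    return [
        alpha * entailment_score(i, align) + (1.0 - alpha) * position_score(i, n)
        for i in range(n)
    ]


def greedy_select(sentences: List[str], scores: List[float], max_words: int) -> List[str]:
    """Pick sentences by descending score while the word budget allows,
    then return them in original document order."""
    order = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        length = len(sentences[i].split())
        if used + length <= max_words:
            chosen.append(i)
            used += length
    return [sentences[i] for i in sorted(chosen)]


# Toy usage with made-up alignment scores between three sentences.
sentences = ["a b c", "d e", "f g h i"]
align = [
    [0.0, 0.8, 0.6],
    [0.8, 0.0, 0.2],
    [0.6, 0.2, 0.0],
]
summary = greedy_select(sentences, combined_scores(align), max_words=5)
```

In this sketch a sentence that aligns strongly with many other sentences and appears early receives the highest combined score, which mirrors the intuition stated in the abstract; the actual alignment and combination formulas would need to be taken from the paper itself.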
Pages: 122-129
Number of pages: 8
Related papers
50 in total
  • [21] Extractive multi-document summarization based on textual entailment and sentence compression via knapsack problem
    Naserasadi, Ali
    Khosravi, Hamid
    Sadeghi, Faramarz
    NATURAL LANGUAGE ENGINEERING, 2019, 25 (01) : 121 - 146
  • [22] Exploring content selection strategies for Multilingual Multi-Document Summarization based on the Universal Network Language (UNL)
    Chaud, Matheus Rigobelo
    Di Felippo, Ariani
    REVISTA DE ESTUDOS DA LINGUAGEM, 2018, 26 (01) : 45 - 71
  • [23] Incorporating Textual Entailment Recognition in Single- and Multi-Document Summarization Systems
    Lloret, Elena
    Ferrandez, Oscar
    Munoz, Rafael
    Palomar, Manuel
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2008, (41): 183 - 190
  • [24] A Scoring Model Assisted by Frequency for Multi-Document Summarization
    Yu, Yue
    Wu, Mutong
    Su, Weifeng
    Cheung, Yiu-ming
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2021, PT V, 2021, 12895 : 309 - 320
  • [25] A progressive sentence selection strategy for document summarization
    Ouyang, You
    Li, Wenjie
    Zhang, Renxian
    Li, Sujian
    Lu, Qin
    INFORMATION PROCESSING & MANAGEMENT, 2013, 49 (01) : 213 - 221
  • [26] Optimizing Sentence Modeling and Selection for Document Summarization
    Yin, Wenpeng
    Pei, Yulong
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015, : 1383 - 1389
  • [27] The Pyramid Method: Incorporating human content selection variation in summarization evaluation
    Nenkova, Ani
    Passonneau, Rebecca
    Mckeown, Kathleen
    ACM Transactions on Speech and Language Processing, 2007, 4 (02):
  • [28] Generalised Zero-shot Learning for Entailment-based Text Classification with External Knowledge
    Wang, Yuqi
    Wang, Wei
    Chen, Qi
    Huang, Kaizhu
    Anh Nguyen
    De, Suparna
    2022 IEEE INTERNATIONAL CONFERENCE ON SMART COMPUTING (SMARTCOMP 2022), 2022, : 19 - 25
  • [29] Cross-document Structure Theory (CST) Content Selection Strategies for Multidocument Automatic Summarization
    Jorge, Maria Lucia del Rosario Castro
    Salgueiro Pardo, Thiago Alexandre
    LINGUAMATICA, 2010, 2 (01): 95 - 109
  • [30] Building a Textual Entailment Suite for the Evaluation of Automatic Content Scoring Technologies
    Sukkarieh, Jana Z.
    Bolge, Eleanor
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 3149 - 3156