CoMSum and SIBERT: A Dataset and Neural Model for Query-Based Multi-document Summarization

Cited by: 6
Authors
Kulkarni, Sayali [1]
Chammas, Sheide [1]
Zhu, Wan [1]
Sha, Fei [1]
Ie, Eugene [1]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
Keywords
Extractive summarization; Abstractive summarization; Neural models; Transformers; Summarization dataset
DOI
10.1007/978-3-030-86331-9_6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Document summarization compresses one or more source documents into succinct, information-preserving text. One variant is query-based multi-document summarization (qMDS), which targets summaries at specific informational needs, contextualized to the query. However, progress on this task is hindered by the limited availability of large-scale datasets. In this work, we make two contributions. First, we propose an approach for automatically generating a dataset for both extractive and abstractive summarization, and we publicly release a version of it. Second, we design a neural model, SIBERT, for extractive summarization that exploits the hierarchical nature of the input. It also infuses queries to extract query-specific summaries. We evaluate this model on the CoMSum dataset and show a significant improvement in performance. This should provide a baseline and enable the use of CoMSum in future research on qMDS.
Pages: 84-98
Number of pages: 15
Related papers
50 in total
  • [1] Query-Based Automatic Multi-document Summarization Extraction Method for Web Pages
    He, Qi
    Hao, Hong-Wei
    Yin, Xu-Cheng
    PROCEEDINGS OF THE 2011 2ND INTERNATIONAL CONGRESS ON COMPUTER APPLICATIONS AND COMPUTATIONAL SCIENCE, VOL 1, 2012, 144 : 107 - 112
  • [2] A Graph Based Query Focused Multi-Document Summarization
    Balaji, J.
    Geetha, T.
    Parthasarathi, Ranjani
    INTERNATIONAL JOURNAL OF INTELLIGENT INFORMATION TECHNOLOGIES, 2014, 10 (01) : 16 - 41
  • [3] Query-based Multi-document Summarization using Non-negative Semantic Feature and NMF Clustering
    Park, Sun
    Cha, ByungRae
    NCM 2008: 4TH INTERNATIONAL CONFERENCE ON NETWORKED COMPUTING AND ADVANCED INFORMATION MANAGEMENT, VOL 2, PROCEEDINGS, 2008 : 609 - 614
  • [4] QuerySum: A Multi-Document Query-Focused Summarization Dataset Augmented with Similar Query Clusters
    Liu, Yushan
    Wang, Zili
    Yuan, Ruifeng
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024 : 18725 - 18732
  • [5] Query-focused Multi-document Summarization Using Cloud Model
    Chen, Jinguang
    He, Tingting
    INFORMATION-AN INTERNATIONAL INTERDISCIPLINARY JOURNAL, 2011, 14 (03) : 951 - 956
  • [6] Document-Based HITS Model for Multi-document Summarization
    Wan, Xiaojun
    PRICAI 2008: TRENDS IN ARTIFICIAL INTELLIGENCE, 2008, 5351 : 454 - 465
  • [7] Query-Focused Multi-document Summarization Based on Concept Importance
    Zheng, Hai-Tao
    Guo, Ji-Min
    Jiang, Yong
    Xia, Shu-Tao
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT II, 2016, 9652 : 443 - 453
  • [8] Query-Focused Multi-document Summarization Survey
    Alanzi, Entesar
    Alballaa, Safa
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (06) : 822 - 833
  • [9] GameWikiSum: a Novel Large Multi-Document Summarization Dataset
    Antognini, Diego
    Faltings, Boi
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020 : 6645 - 6650
  • [10] Enhancing web search by using query-based clusters and multi-document summaries
    Qumsiyeh, Rani
    Ng, Yiu-Kai
    KNOWLEDGE AND INFORMATION SYSTEMS, 2016, 47 (02) : 355 - 380