CoMSum and SIBERT: A Dataset and Neural Model for Query-Based Multi-document Summarization

Cited by: 6
Authors
Kulkarni, Sayali [1]
Chammas, Sheide [1]
Zhu, Wan [1]
Sha, Fei [1]
Ie, Eugene [1]
Affiliation
[1] Google Res, Mountain View, CA 94043 USA
Keywords
Extractive summarization; Abstractive summarization; Neural models; Transformers; Summarization dataset;
DOI
10.1007/978-3-030-86331-9_6
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Document summarization compresses source document(s) into succinct and information-preserving text. A variant of this is query-based multi-document summarization (qMDS), which targets summaries at specific informational needs, contextualized to the query. However, progress on this task is hindered by the limited availability of large-scale datasets. In this work, we make two contributions. First, we propose an approach for automatically generating a dataset for both extractive and abstractive summarization and release a version publicly. Second, we design a neural model, SIBERT, for extractive summarization that exploits the hierarchical nature of the input. It also infuses queries to extract query-specific summaries. We evaluate this model on the CoMSum dataset, showing significant improvement in performance. This should provide a baseline and enable the use of CoMSum for future research on qMDS.
Pages: 84-98
Number of pages: 15