Comparative Document Summarisation via Classification

Times cited: 0
Authors
Bista, Umanga [1, 2]
Mathews, Alexander [1, 2]
Shin, Minjeong [1, 2]
Menon, Aditya Krishna [1, 3]
Xie, Lexing [1, 2]
Affiliations
[1] Australian National University, Canberra, ACT, Australia
[2] Data to Decisions CRC, Canberra, ACT, Australia
[3] Google Research, Canberra, ACT, Australia
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group and also maximally distinguishable from the other groups. We formulate a set of new objective functions for this problem that connect recent literature on document summarisation, interpretable machine learning, and data subset selection. In particular, by casting the problem as binary classification amongst different groups, we derive objectives based on the notion of maximum mean discrepancy, as well as a simple yet effective gradient-based optimisation strategy. Our new formulation allows comparative summarisation to be evaluated at scale as a classification task, both automatically and via crowd-sourcing. To this end, we evaluate comparative summarisation methods on a newly curated collection of controversial news topics spanning 13 months. We observe that gradient-based optimisation outperforms discrete and baseline approaches in 15 out of 24 different automatic evaluation settings. In crowd-sourced evaluations, summaries from gradient optimisation elicit 7% more accurate classification from human workers than summaries from discrete optimisation. This result contrasts with recent literature on submodular data subset selection that favours discrete optimisation. We posit that our formulation of comparative summarisation will prove useful in a diverse range of use cases such as comparing content sources, authors, related topics, or distinct viewpoints.
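The abstract names the ingredients (an MMD-based objective, gradient-based optimisation, and evaluation by classification) but not their exact form, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes documents are already embedded as vectors; it uses an RBF kernel with a hypothetical bandwidth `gamma`; it scores a candidate summary S of group A with the hypothetical objective lam * MMD^2(S, B) - MMD^2(S, A) (close to its own group, far from the other); it runs plain gradient ascent on continuous prototype vectors and then snaps each prototype to its nearest unused document in A. The helper `classify` is a hypothetical stand-in for a classification-based evaluation: it assigns a document to the group whose summary it is most similar to on average. All function and parameter names are illustrative.

```python
import numpy as np


def rbf_kernel(X, Y, gamma):
    """RBF kernel matrix with entries exp(-gamma * ||x_i - y_j||^2)."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def mmd2(S, Y, gamma):
    """Biased (V-statistic) estimate of the squared MMD between S and Y."""
    return (rbf_kernel(S, S, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(S, Y, gamma).mean())


def _kernel_pull(S, Z, K, gamma):
    # Row i equals sum_j d k(s_i, z_j) / d s_i = -2*gamma * sum_j K[i, j] * (s_i - z_j).
    return -2.0 * gamma * (S * K.sum(1, keepdims=True) - K @ Z)


def mmd2_grad(S, Y, gamma):
    """Gradient of mmd2(S, Y) with respect to every row of S."""
    m, n = len(S), len(Y)
    # The k(Y, Y) term does not depend on S, so it drops out of the gradient.
    self_term = (2.0 / m ** 2) * _kernel_pull(S, S, rbf_kernel(S, S, gamma), gamma)
    cross_term = (2.0 / (m * n)) * _kernel_pull(S, Y, rbf_kernel(S, Y, gamma), gamma)
    return self_term - cross_term


def objective(S, A, B, gamma, lam):
    """Hypothetical comparative score: S should match group A and differ from group B."""
    return lam * mmd2(S, B, gamma) - mmd2(S, A, gamma)


def objective_grad(S, A, B, gamma, lam):
    """Analytic gradient of objective() with respect to the prototypes S."""
    return lam * mmd2_grad(S, B, gamma) - mmd2_grad(S, A, gamma)


def summarise(A, B, m, gamma=1.0, lam=1.0, lr=0.5, steps=200, seed=0):
    """Return indices of m distinct documents of A, chosen by gradient ascent
    on continuous prototypes followed by nearest-document snapping."""
    rng = np.random.default_rng(seed)
    S = A[rng.choice(len(A), size=m, replace=False)].astype(float)
    for _ in range(steps):
        S = S + lr * objective_grad(S, A, B, gamma, lam)
    chosen = []
    for s in S:
        dist = ((A - s) ** 2).sum(1)
        dist[np.array(chosen, dtype=int)] = np.inf  # never reuse a document
        chosen.append(int(np.argmin(dist)))
    return chosen


def classify(x, summaries, gamma):
    """Assign x to the group whose summary has the highest mean kernel similarity."""
    scores = [rbf_kernel(x[None, :], S, gamma).mean() for S in summaries]
    return int(np.argmax(scores))
```

For example, with two embedded groups `A` and `B`, `summarise(A, B, m=4, gamma=0.5)` returns four row indices into `A`, and `summarise(B, A, m=4, gamma=0.5)` does the same for `B`; passing both summaries to `classify` then labels held-out documents. One design consequence worth noting: with `lam = 1` the k(S, S) terms cancel, so the score no longer penalises redundant picks, and the distinct-index constraint in the snapping step is then the only guard against selecting near-duplicates.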
Pages: 20-28
Number of pages: 9