Comparative Document Summarisation via Classification

Cited by: 0

Authors:

Bista, Umanga [1,2]

Mathews, Alexander [1,2]

Shin, Minjeong [1,2]

Menon, Aditya Krishna [1,3]

Xie, Lexing [1,2]

Affiliations:

[1] Australian Natl Univ, Canberra, ACT, Australia

[2] Data Decis CRC, Canberra, ACT, Australia

[3] Google Res, Canberra, ACT, Australia

Source:

THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2019

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper considers extractive summarisation in a comparative setting: given two or more document groups (e.g., separated by publication time), the goal is to select a small number of documents that are representative of each group, and also maximally distinguishable from other groups. We formulate a set of new objective functions for this problem that connect recent literature on document summarisation, interpretable machine learning, and data subset selection. In particular, by casting the problem as a binary classification amongst different groups, we derive objectives based on the notion of maximum mean discrepancy, as well as a simple yet effective gradient-based optimisation strategy. Our new formulation allows scalable evaluations of comparative summarisation as a classification task, both automatically and via crowd-sourcing. To this end, we evaluate comparative summarisation methods on a newly curated collection of controversial news topics over 13 months. We observe that gradient-based optimisation outperforms discrete and baseline approaches in 15 out of 24 different automatic evaluation settings. In crowd-sourced evaluations, summaries from gradient optimisation elicit 7% more accurate classification from human workers than discrete optimisation. Our result contrasts with recent literature on submodular data subset selection that favours discrete optimisation. We posit that our formulation of comparative summarisation will prove useful in a diverse range of use cases such as comparing content sources, authors, related topics, or distinct viewpoints.
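To make the idea of a maximum mean discrepancy (MMD) objective more concrete, the following is a minimal illustrative sketch, not the paper's actual objectives or optimiser. It assumes documents are already embedded as fixed-length vectors, uses a Gaussian kernel, and swaps in a simple greedy discrete selection in place of the gradient-based strategy the abstract describes. All names and parameters (`rbf_kernel`, `mmd2`, `greedy_comparative_summary`, `lam`, `gamma`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd2(x, y, gamma=1.0):
    """Biased estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )

def greedy_comparative_summary(group, others, k, lam=0.5, gamma=1.0):
    """Greedily pick k row indices of `group` (assumed illustrative objective).

    Each step adds the document that keeps the summary close to its own
    group (low MMD^2) while staying far from the other groups (high MMD^2),
    with `lam` trading off the two terms.
    """
    chosen = []
    for _ in range(k):
        best_idx, best_score = None, np.inf
        for i in range(len(group)):
            if i in chosen:
                continue
            candidate = group[chosen + [i]]
            score = mmd2(candidate, group, gamma) - lam * mmd2(candidate, others, gamma)
            if score < best_score:
                best_idx, best_score = i, score
        chosen.append(best_idx)
    return chosen

# Toy usage with synthetic "document embeddings" for two groups.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(40, 5))
group_b = rng.normal(3.0, 1.0, size=(40, 5))
summary_a = greedy_comparative_summary(group_a, group_b, k=3)
```

Here `summary_a` holds the indices of three documents selected to summarise `group_a` in contrast to `group_b`. A classifier trained only on such summaries could then be scored on held-out documents, which is the spirit of the classification-based evaluation mentioned in the abstract.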


Pages: 20-28

Number of pages: 9
