Neural Abstractive Summarization for Long Text and Multiple Tables

Authors
Liu, Shuaiqi [1]
Cao, Jiannong [1]
Deng, Zhongfen [2]
Zhao, Wenting [2]
Yang, Ruosong [1]
Wen, Zhiyuan [1]
Yu, Philip S. [2]
Affiliations
[1] Hong Kong Polytechnic University, Department of Computing, Hong Kong, People's Republic of China
[2] University of Illinois, Chicago, IL 60607, USA
Keywords
Document summarization; natural language generation; natural language processing; text summarization
DOI
10.1109/TKDE.2023.3324012
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Abstractive summarization aims to generate a concise summary covering the salient information of the input document. Within a report document, salient information can be scattered across both textual and non-textual content. However, existing document summarization datasets and methods usually focus on the text and filter out non-textual content. Omitting tabular data can limit the informativeness of the produced summaries, especially when the summaries must cover quantitative descriptions of critical metrics in tables. Existing datasets and methods cannot meet the requirements of summarizing the long text and dozens of tables found in each report document. To address the scarcity of suitable datasets, we propose FINDSum, the first large-scale dataset for long text and multi-table summarization. Built on 21,125 annual reports from 3,794 companies, FINDSum has two subsets, for summarizing each company's results of operations and its liquidity, respectively. In addition, we present four types of summarization methods that jointly consider text and table content when summarizing reports. We also propose a set of evaluation metrics to assess the usage of numerical information in the produced summaries. Our summarization methods significantly outperform advanced baselines, which verifies the necessity of incorporating both textual and tabular data when summarizing report documents. Finally, we conduct extensive comparative experiments to identify key model components and configurations that can improve summarization results.
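The abstract does not define the proposed numerical-information metrics. Purely as an illustration of the general idea, and not as the authors' metrics, the sketch below scores number usage with two hypothetical measures: the fraction of numbers in a generated summary that can be found in the source (text plus linearized tables), and the fraction of numbers in a reference summary that the generated summary reproduces. The names `extract_numbers`, `number_precision`, and `number_coverage`, and the simple regex-based number matching, are all assumptions made for this example.

```python
from __future__ import annotations

import re

# Hypothetical number pattern: integers, decimals, and comma-grouped values
# such as "1,234.5". Signs, percent signs, currency symbols, and units are
# ignored, which is a simplifying assumption.
NUMBER_PATTERN = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def extract_numbers(text: str) -> set[str]:
    """Return the set of numbers in text, with thousands separators removed."""
    return {m.group().replace(",", "") for m in NUMBER_PATTERN.finditer(text)}


def number_precision(summary: str, source: str) -> float:
    """Fraction of distinct summary numbers that also occur in the source."""
    predicted = extract_numbers(summary)
    if not predicted:
        return 0.0
    return len(predicted & extract_numbers(source)) / len(predicted)


def number_coverage(summary: str, reference: str) -> float:
    """Fraction of distinct reference numbers that the summary reproduces."""
    gold = extract_numbers(reference)
    if not gold:
        return 0.0
    return len(gold & extract_numbers(summary)) / len(gold)


if __name__ == "__main__":
    source = "Revenue rose 12.5% to $1,234 million in 2022."
    summary = "Revenue increased 12.5% to $1,234 million, up from 980 million."
    reference = "Revenue grew 12.5% to $1,234 million in 2022."
    print(number_precision(summary, source))
    print(number_coverage(summary, reference))
```

In this toy example, "980" does not appear in the source, so it counts against precision, while the reference year "2022" is missing from the summary and lowers coverage. Exact string matching is a deliberate simplification; a fuller metric would likely need to normalize units, rounding, and scale words such as "million".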
Pages: 2572-2586
Page count: 15