Extractive summarization of documents with images based on multi-modal RNN

Cited by: 20
Authors
Chen, Jingqiang [1]
Hai Zhuge [2, 3, 4]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp Sci, Nanjing, Jiangsu, Peoples R China
[2] Guangzhou Univ, Guangzhou, Guangdong, Peoples R China
[3] Aston Univ, Int Res Network Cyber Phys Social Intelligence Co, Birmingham, W Midlands, England
[4] Chinese Acad Sci, Univ Chinese Acad Sci, ICT, KLIIP, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Summarization; Extractive summarization; Multi-modal summarization; RNN; Document summarization
DOI
10.1016/j.future.2019.04.045
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Rapid growth of multi-modal documents containing images on the Internet expresses strong demand on multi-modal summarization. The challenge is to create a computing method that can uniformly process text and image. Deep learning provides basic models for meeting this challenge. This paper treats extractive multi-modal summarization as a classification problem and proposes a sentence-image classification method based on the multi-modal RNN model. Our method encodes words and sentences with the hierarchical RNN models and encodes the ordered image set with the CNN model and the RNN model, and then calculates the selection probability of sentences and the sentence-image alignment probability through a logistic classifier taking text coverage, text redundancy, image set coverage, and image set redundancy as features. Two methods are proposed to compute the image set redundancy feature by combining the important scores of sentences and the hidden sentence-image alignment. Experiments on the extended DailyMail corpora constructed by collecting images and captions from the Web show that our method outperforms 11 baseline text summarization methods and that adopting the two image-related features in the classification method can improve text summarization. Our method is able to mine the hidden sentence-image alignments and to create informative well-aligned multi-modal summaries. © 2019 Published by Elsevier B.V.
Pages: 186-196
Number of pages: 11
Related papers
50 in total
  • [1] Extractive Text-Image Summarization Using Multi-Modal RNN
    Chen, Jingqiang
    Hai Zhuge
    [J]. 2018 14TH INTERNATIONAL CONFERENCE ON SEMANTICS, KNOWLEDGE AND GRIDS (SKG), 2018, : 245 - 248
  • [2] Multi-modal browsing of images in Web documents
    Chen, F
    Gargi, U
    Niles, L
    Schütze, H
    [J]. DOCUMENT RECOGNITION AND RETRIEVAL VI, 1999, 3651 : 122 - 133
  • [3] Multi-modal anchor adaptation learning for multi-modal summarization
    Chen, Zhongfeng
    Lu, Zhenyu
    Rong, Huan
    Zhao, Chuanjun
    Xu, Fan
    [J]. NEUROCOMPUTING, 2024, 570
  • [4] Multi-modal Video Summarization
    Huang, Jia-Hong
    [J]. ICMR 2024 - Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024, : 1214 - 1218
  • [5] Multi-modal Video Summarization
    Huang, Jia-Hong
    [J]. PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 1214 - 1218
  • [6] A Survey on Multi-modal Summarization
    Jangra, Anubhav
    Mukherjee, Sourajit
    Jatowt, Adam
    Saha, Sriparna
    Hasanuzzaman, Mohammad
    [J]. ACM COMPUTING SURVEYS, 2023, 55 (13S)
  • [7] Abstractive Text-Image Summarization Using Multi-Modal Attentional Hierarchical RNN
    Chen, Jingqiang
    Hai Zhuge
    [J]. 2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4046 - 4056
  • [8] Metaknowledge Extraction Based on Multi-Modal Documents
    Liu, Shu-Kan
    Xu, Rui-Lin
    Geng, Bo-Ying
    Sun, Qiao
    Duan, Li
    Liu, Yi-Ming
    [J]. IEEE ACCESS, 2021, 9 : 50050 - 50060
  • [9] Multi-Modal Code Summarization with Retrieved Summary
    Lin, Lile
    Huang, Zhiqiu
    Yu, Yaoshen
    Liu, Yapeng
    [J]. 2022 IEEE 22ND INTERNATIONAL WORKING CONFERENCE ON SOURCE CODE ANALYSIS AND MANIPULATION (SCAM 2022), 2022, : 132 - 142
  • [10] Fostering multi-modal summarization for trend information
    Kato, Tsuneaki
    Matsushita, Mitsunori
    Kando, Noriko
    [J]. KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS: KES 2007 - WIRN 2007, PT II, PROCEEDINGS, 2007, 4693 : 377 - 386