Multimodal Summarization with Guidance of Multimodal Reference

Times cited: 0
Authors
Zhu, Junnan [1, 2]
Zhou, Yu [1, 2]
Zhang, Jiajun [1, 2]
Li, Haoran [4]
Zong, Chengqing [1, 2, 3]
Li, Changliang [5]
Affiliations
[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] CAS Ctr Excellence Brain Sci & Intelligence Techn, Beijing, Peoples R China
[4] JD AI Res, Beijing, Peoples R China
[5] Kingsoft AI Lab, Beijing, Peoples R China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal summarization with multimodal output (MSMO) aims to generate a multimodal summary for a multimodal news report; such summaries have been shown to effectively improve user satisfaction. Existing MSMO methods are trained only on the text-modality target, which leads to a modality-bias problem: the quality of the images selected by the model is ignored during training. To alleviate this problem, we propose a multimodal objective function, guided by a multimodal reference, that combines the losses from summary generation and image selection. Because multimodal reference data are lacking, we present two strategies, ROUGE-ranking and Order-ranking, that construct the multimodal reference by extending the text reference. Meanwhile, to better evaluate multimodal outputs, we propose a novel evaluation metric based on joint multimodal representation, which projects the model output and the multimodal reference into a joint semantic space during evaluation. Experimental results show that our proposed model achieves new state-of-the-art performance on both automatic and manual evaluation metrics. In addition, our proposed evaluation metric effectively improves correlation with human judgments.
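The abstract names two reference-construction strategies and a combined text-plus-image objective without giving their details. The following is a minimal, hypothetical sketch of how such pieces could fit together; it is not the authors' implementation. Assumptions: ROUGE-ranking scores each image by a simplified ROUGE-1 F1 between its caption and the text reference; Order-ranking simply keeps images in article order; the multimodal objective adds a weighted cross-entropy on selecting the top-ranked reference image to the summary negative log-likelihood. The function names (rouge1_f1, rouge_ranking, order_ranking, multimodal_loss) and the weight lam are illustrative only.

```python
import math
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap with lowercase whitespace tokenization (assumption)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Multiset intersection counts each shared token at most min(count_cand, count_ref) times.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def rouge_ranking(captions: list[str], text_reference: str) -> list[int]:
    """Hypothetical ROUGE-ranking: image indices sorted by caption score against the text reference."""
    scores = [rouge1_f1(c, text_reference) for c in captions]
    # sorted() is stable, so tied images keep their original article order.
    return sorted(range(len(captions)), key=lambda i: -scores[i])


def order_ranking(num_images: int) -> list[int]:
    """Hypothetical Order-ranking: keep images in the order they appear in the article."""
    return list(range(num_images))


def multimodal_loss(text_nll: float, image_probs: list[float], ref_image: int, lam: float = 1.0) -> float:
    """Summary-generation loss plus a lam-weighted cross-entropy for picking the reference image."""
    image_nll = -math.log(image_probs[ref_image])
    return text_nll + lam * image_nll


if __name__ == "__main__":
    captions = [
        "a crowd gathers at the stadium",
        "the president speaks at the summit",
        "weather map",
    ]
    reference = "the president gave a speech at the summit"
    ranking = rouge_ranking(captions, reference)
    print(ranking)
    # Treat the top-ranked image as the pseudo image reference.
    print(multimodal_loss(2.0, [0.25, 0.5, 0.25], ranking[0]))
```

In this toy example the caption about the president overlaps most with the text reference, so it ranks first and becomes the pseudo image reference; the image term then penalizes the model in proportion to how little probability it assigns to that image, which is one way to make image quality count during training.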
Pages: 9749-9756
Number of pages: 8
Related papers
  • [1] MSMO: Multimodal Summarization with Multimodal Output
    Zhu, Junnan
    Li, Haoran
    Liu, Tianshang
    Zhou, Yu
    Zhang, Jiajun
    Zong, Chengqing
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, pp. 4154-4164
  • [2] On Multimodal Microblog Summarization
    Saini, Naveen
    Saha, Sriparna
    Bhattacharyya, Pushpak
    Mrinal, Shubhankar
    Mishra, Santosh Kumar
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2022, 9(5), pp. 1317-1329
  • [3] Multimodal summarization of meeting recordings
    Erol, B
    Lee, DS
    Hull, J
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL III, PROCEEDINGS, 2003, pp. 25-28
  • [4] Multimodal news document summarization
    Javed, Hira
    Akhtar, Nadeem
    Beg, M. M. Sufyan
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2024, 45(4), pp. 959-968
  • [5] Topic-guided abstractive multimodal summarization with multimodal output
    Rafi, Shaik
    Das, Ranjita
    NEURAL COMPUTING & APPLICATIONS, 2023
  • [6] Graph-based Multimodal Ranking Models for Multimodal Summarization
    Zhu, Junnan
    Xiang, Lu
    Zhou, Yu
    Zhang, Jiajun
    Zong, Chengqing
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2021, 20(4)
  • [7] Multimodal text summarization with evaluation approaches
    Khilji, Abdullah Faiz Ur Rahman
    Sinha, Utkarsh
    Singh, Pintu
    Ali, Adnan
    Laskar, Sahinur Rahman
    Dadure, Pankaj
    Manna, Riyanka
    Pakray, Partha
    Favre, Benoit
    Bandyopadhyay, Sivaji
    SADHANA-ACADEMY PROCEEDINGS IN ENGINEERING SCIENCES, 2023, 48(4)
  • [8] Video Summarization Based on Multimodal Features
    Zhang, Yu
    Liu, Ju
    Liu, Xiaoxi
    Gao, Xuesong
    INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT, 2020, 11(4), pp. 60-76
  • [9] Leveraging multimodal content for podcast summarization
    Vaiani, Lorenzo
    La Quatra, Moreno
    Cagliero, Luca
    Garza, Paolo
    37TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2022, pp. 863-870