Align and Attend: Multimodal Summarization with Dual Contrastive Losses

Cited by: 12
Authors
He, Bo [1]
Wang, Jun [1]
Qiu, Jielin [2]
Bui, Trung [3]
Shrivastava, Abhinav [1]
Wang, Zhaowen [3]
Affiliations
[1] University of Maryland, College Park, MD 20742, USA
[2] Carnegie Mellon University, Pittsburgh, PA, USA
[3] Adobe Research, San Francisco, CA, USA
DOI
10.1109/CVPR52729.2023.01428
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The goal of multimodal summarization is to extract the most important information from different modalities to form summaries. Unlike unimodal summarization, the multimodal summarization task explicitly leverages cross-modal information to help generate more reliable and high-quality summaries. However, existing methods fail to leverage the temporal correspondence between different modalities and ignore the intrinsic correlation between different samples. To address these issues, we introduce Align and Attend Multimodal Summarization (A2Summ), a unified multimodal transformer-based model which can effectively align and attend to the multimodal input. In addition, we propose two novel contrastive losses to model both inter-sample and intra-sample correlations. Extensive experiments on two standard video summarization datasets (TVSum and SumMe) and two multimodal summarization datasets (Daily Mail and CNN) demonstrate the superiority of A2Summ, which achieves state-of-the-art performance on all datasets. Moreover, we collected a large-scale multimodal summarization dataset, BLiSS, which contains livestream videos and transcribed texts with annotated summaries. Our code and dataset are publicly available at https://boheumd.github.io/A2Summ/.
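
The abstract describes the two loss terms only at a high level. For concreteness, below is a minimal PyTorch sketch of what an inter-sample contrastive loss (pulling paired video/text features together across a batch) and an intra-sample contrastive loss (separating key from non-key segments within one sample) typically look like. The function names, pooling choices, and temperature value are illustrative assumptions and not the released A2Summ implementation; see the project page linked above for the official code.

import torch
import torch.nn.functional as F

def inter_sample_loss(video_feat, text_feat, temperature=0.07):
    # Symmetric InfoNCE over a batch of paired (video, text) embeddings.
    # video_feat, text_feat: (B, D) pooled features, one row per sample.
    v = F.normalize(video_feat, dim=-1)
    t = F.normalize(text_feat, dim=-1)
    logits = v @ t.t() / temperature                   # (B, B) similarity matrix
    labels = torch.arange(v.size(0), device=v.device)  # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

def intra_sample_loss(seg_feat, key_mask, temperature=0.07):
    # Contrast key vs. non-key segments within a single sample.
    # seg_feat: (N, D) per-segment features; key_mask: (N,) bool, True = key segment.
    f = F.normalize(seg_feat, dim=-1)
    anchor = f[key_mask].mean(dim=0, keepdim=True)     # (1, D) centroid of key segments
    sims = (anchor @ f.t()).squeeze(0) / temperature   # (N,) anchor-to-segment similarities
    # Multi-positive InfoNCE: -log( sum_pos exp(sim) / sum_all exp(sim) )
    return torch.logsumexp(sims, dim=0) - torch.logsumexp(sims[key_mask], dim=0)

if __name__ == "__main__":
    B, N, D = 4, 16, 128
    key_mask = torch.zeros(N, dtype=torch.bool)
    key_mask[:4] = True  # pretend the first 4 segments are annotated as key
    print(inter_sample_loss(torch.randn(B, D), torch.randn(B, D)).item(),
          intra_sample_loss(torch.randn(N, D), key_mask).item())
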
Pages: 14867-14878
Page count: 12
Related Papers
50 records in total
  • [1] Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization
    Pang, Zongshang
    Nakashima, Yuta
    Otani, Mayu
    Nagahara, Hajime
    2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023, : 2009 - 2018
  • [2] ICAF: Iterative Contrastive Alignment Framework for Multimodal Abstractive Summarization
    Zhang, Zijian
    Shu, Chang
    Chen, Youxin
    Xiao, Jing
    Zhang, Qian
    Zheng, Lu
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [3] Dual-Level Contrastive Learning for Improving Conciseness of Summarization
    Peng, Wei
    Zhang, Han
    Jiang, Dan
    Xiao, Kejing
    Li, Yuxuan
    IEEE ACCESS, 2024, 12 : 65630 - 65639
  • [4] Inter- and Intra-Modal Contrastive Hybrid Learning Framework for Multimodal Abstractive Summarization
    Li, Jiangfeng
    Zhang, Zijian
    Wang, Bowen
    Zhao, Qinpei
    Zhang, Chenxi
    ENTROPY, 2022, 24 (06)
  • [5] Contrastive text summarization: a survey
    Stroehle, Thomas
    Campos, Ricardo
    Jatowt, Adam
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2023, 18 (04) : 353 - 367
  • [6] MSMO: Multimodal Summarization with Multimodal Output
    Zhu, Junnan
    Li, Haoran
    Liu, Tianshang
    Zhou, Yu
    Zhang, Jiajun
    Zong, Chengqing
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 4154 - 4164
  • [7] Multimodal Summarization with Guidance of Multimodal Reference
    Zhu, Junnan
    Zhou, Yu
    Zhang, Jiajun
    Li, Haoran
    Zong, Chengqing
    Li, Changliang
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9749 - 9756
  • [8] On Multimodal Microblog Summarization
    Saini, Naveen
    Saha, Sriparna
    Bhattacharyya, Pushpak
    Mrinal, Shubhankar
    Mishra, Santosh Kumar
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2022, 9 (05) : 1317 - 1329
  • [9] Multimodal summarization of meeting recordings
    Erol, B
    Lee, DS
    Hull, J
    2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL III, PROCEEDINGS, 2003, : 25 - 28
  • [10] Multimodal news document summarization
    Javed, Hira
    Akhtar, Nadeem
    Beg, M. M. Sufyan
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2024, 45 (04): : 959 - 968