ICAF: Iterative Contrastive Alignment Framework for Multimodal Abstractive Summarization

Cited by: 1
Authors
Zhang, Zijian [1]
Shu, Chang [2,3]
Chen, Youxin [2]
Xiao, Jing [2]
Zhang, Qian [3]
Zheng, Lu [3]
Affiliations
[1] Meituan Dianping Grp, Shanghai, Peoples R China
[2] Ping An Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
[3] Univ Nottingham Ningbo China, Ningbo, Peoples R China
Keywords
multimodal abstractive summarization; recurrent alignment; contrastive learning
DOI
10.1109/IJCNN55064.2022.9892884
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Integrating multimodal knowledge for the abstractive summarization task is a work-in-progress research area, with present techniques inheriting the fusion-then-generation paradigm. Due to semantic gaps between computer vision and natural language processing, current methods often treat multiple data points as separate objects and rely on attention mechanisms to search for connections in order to fuse them. In addition, many frameworks lack awareness of cross-modal matching, which reduces performance. To address these two drawbacks, we propose an Iterative Contrastive Alignment Framework (ICAF) that uses recurrent alignment and contrast to capture the coherence between images and texts. Specifically, we design a recurrent alignment (RA) layer to gradually investigate fine-grained semantic relationships between image patches and text tokens. At each step during the encoding process, cross-modal contrastive losses are applied to directly optimize the embedding space. According to ROUGE, relevance scores, and human evaluation, our model outperforms the state-of-the-art baselines on the MSMO dataset. Experiments on the applicability of our proposed framework and on hyperparameter settings have also been conducted.
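The abstract does not give the form of the cross-modal contrastive loss, so the following is only a minimal sketch of one common choice, a symmetric InfoNCE-style loss over paired image and text embeddings. The function name `symmetric_contrastive_loss`, the `temperature` value, and the toy vectors are all assumptions made for illustration and are not taken from the paper.

```python
import math


def _normalize(vec):
    # Scale a vector to unit length (assumes the vector is non-zero).
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _log_softmax_at(scores, target):
    # Numerically stable log-softmax value at index `target`.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[target] - log_z


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.1):
    # Hypothetical helper: pair k of image_embs and text_embs is the positive,
    # and every other pairing in the batch serves as a negative.
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    n = len(imgs)
    # Temperature-scaled cosine similarity between every image and every text.
    sim = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts]
           for i in imgs]
    # Image-to-text direction: each row should peak on its diagonal entry.
    i2t = -sum(_log_softmax_at(sim[k], k) for k in range(n)) / n
    # Text-to-image direction: each column should peak on its diagonal entry.
    t2i = -sum(_log_softmax_at([sim[r][k] for r in range(n)], k)
               for k in range(n)) / n
    return 0.5 * (i2t + t2i)


# Toy embeddings: texts[k] points roughly the same way as images[k].
images = [[1.0, 0.0], [0.0, 1.0]]
texts = [[0.9, 0.1], [0.2, 0.8]]
aligned = symmetric_contrastive_loss(images, texts)
swapped = symmetric_contrastive_loss(images, texts[::-1])
```

In this toy setup the loss for the correctly paired batch (`aligned`) is lower than for the batch with the texts swapped (`swapped`), which is the behavior a contrastive objective relies on when it pulls matched image and text representations together in the embedding space.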
Pages: 8
Related papers
50 in total
  • [21] A Novel Framework for Semantic Oriented Abstractive Text Summarization
    Moratanch, N.
    Chitrakala, S.
    JOURNAL OF WEB ENGINEERING, 2018, 17 (08): 675-716
  • [22] Gtpsum: guided tensor product framework for abstractive summarization
    Jingan Lu
    Zhenfang Zhu
    Kefeng Li
    Shuai Gong
    Hongli Pei
    Wenling Wang
    The Journal of Supercomputing, 2024, 80: 4972-4995
  • [23] An Extractive-and-Abstractive Framework for Source Code Summarization
    Sun, Weisong
    Fang, Chunrong
    Chen, Yuchen
    Zhang, Quanjun
    Tao, Guanhong
    You, Yudu
    Han, Tingxu
    Ge, Yifei
    Hu, Yuling
    Luo, Bin
    Chen, Zhenyu
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2024, 33 (03)
  • [24] A novel abstractive summarization model based on topic-aware and contrastive learning
    Tang, Huanling
    Li, Ruiquan
    Duan, Wenhao
    Dou, Quansheng
    Lu, Mingyu
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2024: 5563-5577
  • [25] Align and Attend: Multimodal Summarization with Dual Contrastive Losses
    He, Bo
    Wang, Jun
    Qiu, Jielin
    Bui, Trung
    Shrivastava, Abhinav
    Wang, Zhaowen
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023: 14867-14878
  • [26] A Multi-Task Learning Framework for Abstractive Text Summarization
    Lu, Yao
    Liu, Linqing
    Jiang, Zhile
    Yang, Min
    Goebel, Randy
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019: 9987-9988
  • [27] Enhanced Seq2Seq Autoencoder via Contrastive Learning for Abstractive Text Summarization
    Zheng, Chujie
    Zhang, Kunpeng
    Wang, Harry Jiannan
    Fan, Ling
    Wang, Zhe
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021: 1764-1771
  • [28] Cl2sum: abstractive summarization via contrastive prompt constructed by LLMs hallucination
    Xiang Huang
    Qiong Nong
    Xiaobo Wang
    Hongcheng Zhang
    Kunpeng Du
    Chunlin Yin
    Li Yang
    Bin Yan
    Xuan Zhang
    Complex & Intelligent Systems, 2025, 11 (3)
  • [29] Multimodal summarization with modality features alignment and features filtering
    Tang, Binghao
    Lin, Boda
    Chang, Zheng
    Li, Si
    NEUROCOMPUTING, 2024, 603
  • [30] A Context-Aware BERT Retrieval Framework Utilizing Abstractive Summarization
    Pan, Min
    Li, Teng
    Yang, Chenghao
    Zhou, Shuting
    Feng, Shaoxiong
    Fang, Youbin
    Li, Xingyu
    2022 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WI-IAT, 2022: 873-878