ICAF: Iterative Contrastive Alignment Framework for Multimodal Abstractive Summarization

Cited by: 1
Authors
Zhang, Zijian [1]
Shu, Chang [2, 3]
Chen, Youxin [2]
Xiao, Jing [2]
Zhang, Qian [3]
Zheng, Lu [3]
Affiliations
[1] Meituan Dianping Grp, Shanghai, Peoples R China
[2] Ping An Technol Shenzhen Co Ltd, Shenzhen, Peoples R China
[3] Univ Nottingham Ningbo China, Ningbo, Peoples R China
Keywords
multimodal abstractive summarization; recurrent alignment; contrastive learning
DOI
10.1109/IJCNN55064.2022.9892884
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Integrating multimodal knowledge for abstractive summarization is still a developing research area, and present techniques inherit the fusion-then-generation paradigm. Because of the semantic gap between computer vision and natural language processing, current methods often treat the multiple data points as separate objects and rely on attention mechanisms to search for connections in order to fuse them. In addition, many frameworks lack awareness of cross-modal matching, which reduces performance. To address these two drawbacks, we propose an Iterative Contrastive Alignment Framework (ICAF) that uses recurrent alignment and contrast to capture the coherence between images and texts. Specifically, we design a recurrent alignment (RA) layer to gradually investigate fine-grained semantic relationships between image patches and text tokens. At each step of the encoding process, cross-modal contrastive losses are applied to directly optimize the embedding space. According to ROUGE, relevance scores, and human evaluation, our model outperforms the state-of-the-art baselines on the MSMO dataset. Experiments on the applicability of the proposed framework and on hyperparameter settings have also been conducted.
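The abstract does not give the exact form of the cross-modal contrastive losses, so the following is only a minimal sketch of one common choice, a symmetric InfoNCE-style loss over paired image and text embeddings. The function names, the cosine similarity, and the `temperature` value are illustrative assumptions, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def log_softmax_at(scores, index):
    """Log-probability of scores[index] under a softmax (numerically stable)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[index] - log_z


def contrastive_loss(image_embs, text_embs, temperature=0.1):
    """Symmetric InfoNCE-style loss (an assumed form, not taken from the paper).

    image_embs[k] and text_embs[k] are treated as a matched pair; every
    other combination in the batch serves as a negative.
    """
    n = len(image_embs)
    sim = [[cosine(i, t) / temperature for t in text_embs] for i in image_embs]
    # Image-to-text direction: each row should peak on its diagonal entry.
    loss_i2t = -sum(log_softmax_at(sim[k], k) for k in range(n)) / n
    # Text-to-image direction: each column should peak on its diagonal entry.
    loss_t2i = -sum(
        log_softmax_at([sim[r][k] for r in range(n)], k) for k in range(n)
    ) / n
    return 0.5 * (loss_i2t + loss_t2i)


# Toy embeddings (hypothetical values chosen only for illustration).
images = [[1.0, 0.0], [0.0, 1.0]]
texts_aligned = [[0.9, 0.1], [0.1, 0.9]]
texts_swapped = [[0.1, 0.9], [0.9, 0.1]]

aligned = contrastive_loss(images, texts_aligned)
swapped = contrastive_loss(images, texts_swapped)
```

With these toy inputs, the loss for the correctly paired embeddings is lower than the loss for the swapped pairing, which is the behavior a contrastive objective is meant to reward. In a framework like the one described, such a loss would presumably be evaluated at each alignment step and added to the generation objective; how the terms are weighted is not stated in the abstract.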
Pages: 8