DAS-CL: Towards Multimodal Machine Translation via Dual-Level Asymmetric Contrastive Learning

Cited by: 1
Authors
Cheng, Xuxin [1 ]
Zhu, Zhihong [1 ]
Li, Yaowei [1 ]
Li, Hongxiang [1 ]
Zou, Yuexian [1 ]
Affiliations
[1] Peking Univ, Sch ECE, Beijing, Peoples R China
Keywords
Multimodal Machine Translation; Asymmetric Contrastive Learning; Image Captioning; Object Detection;
DOI
10.1145/3583780.3614832
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal machine translation (MMT) aims to exploit visual information to improve neural machine translation (NMT). It has been demonstrated that image captioning and object detection can further improve MMT. In this paper, to leverage image captioning and object detection more effectively, we propose a Dual-level ASymmetric Contrastive Learning (DAS-CL) framework. Specifically, we leverage image captioning and object detection to generate more pairs of visual inputs and textual inputs. At the utterance level, we introduce an image captioning model to generate more coarse-grained pairs. At the word level, we introduce an object detection model to generate more fine-grained pairs. To mitigate the negative impact of noise in generated pairs, we apply asymmetric contrastive learning at these two levels. Experiments across three translation directions of the Multi30K dataset demonstrate that DAS-CL significantly outperforms existing MMT frameworks and achieves new state-of-the-art performance. More encouragingly, further analysis shows that DAS-CL is more robust to irrelevant visual information.
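The abstract outlines the dual-level design but gives no equations, so the following is only a minimal PyTorch sketch of how such an objective could look. The function names (info_nce, dual_level_asymmetric_loss), the weight alpha, and the choice to realize the "asymmetry" by detaching gradients on the generated captions and detected objects are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F


def info_nce(anchor, positive, temperature=0.07):
    """InfoNCE over a batch: the positive for row i of `anchor` is row i of `positive`."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature       # (B, B) cosine-similarity logits
    targets = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, targets)


def dual_level_asymmetric_loss(text_repr, image_repr, caption_repr,
                               word_repr, object_repr, alpha=0.5):
    """Illustrative dual-level objective; `alpha` down-weights the generated pairs (assumed)."""
    # Coarse-grained (utterance-level) term on the original image-text pairs.
    loss = info_nce(text_repr, image_repr)
    # Generated pairs are handled asymmetrically: gradients are blocked on the
    # generated side (captions / detected objects) so their noise is contained.
    loss = loss + alpha * info_nce(text_repr, caption_repr.detach())
    # Fine-grained (word-level) term against object-detection features.
    loss = loss + alpha * info_nce(word_repr, object_repr.detach())
    return loss


if __name__ == "__main__":
    B, d = 8, 256
    reps = [torch.randn(B, d) for _ in range(5)]
    print(dual_level_asymmetric_loss(*reps).item())
```

The sketch assumes all representations are already projected into a shared d-dimensional space; the paper does not specify how the asymmetry is implemented, and a stop-gradient is only one plausible choice.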
Pages: 337-347
Number of pages: 11
Related Papers
29 records in total
  • [11] Contrastive Learning Based Visual Representation Enhancement for Multimodal Machine Translation
    Wang, Shike
    Zhang, Wen
    Guo, Wenyu
    Yu, Dong
    Liu, Pengyuan
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [12] Question-response representation with dual-level contrastive learning for improving knowledge tracing
    Zhao, Yan
    Ma, Huifang
    Wang, Jing
    He, Xiangchun
    Chang, Liang
    INFORMATION SCIENCES, 2024, 658
  • [13] Towards multimodal sarcasm detection via label-aware graph contrastive learning with back-translation augmentation
    Wei, Yiwei
    Duan, Maomao
    Zhou, Hengyang
    Jia, Zhiyang
    Gao, Zengwei
    Wang, Longbiao
    KNOWLEDGE-BASED SYSTEMS, 2024, 300
  • [14] Unpaired Multimodal Neural Machine Translation via Reinforcement Learning
    Wang, Yijun
    Wei, Tianxin
    Liu, Qi
    Chen, Enhong
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 168 - 185
  • [15] DSDCLNet: Dual-stream encoder and dual-level contrastive learning network for supervised multivariate time series classification
    Liu, Min
    Sheng, Hui
    Zhang, Ningyi
    Zhao, Panpan
    Yi, Yugen
    Jiang, Yirui
    Dai, Jiangyan
    KNOWLEDGE-BASED SYSTEMS, 2024, 292
  • [16] Link Prediction via Ranking Metric Dual-Level Attention Network Learning
    Zhao, Zhou
    Gao, Ben
    Zheng, Vincent W.
    Cai, Deng
    He, Xiaofei
    Zhuang, Yueting
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3525 - 3531
  • [17] Video Question Answering via Hierarchical Dual-Level Attention Network Learning
    Zhao, Zhou
    Lin, Jinghao
    Jiang, Xinghua
    Cai, Deng
    He, Xiaofei
    Zhuang, Yueting
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1050 - 1058
  • [18] Language-Enhanced Dual-Level Contrastive Learning Network for Open-Set Hyperspectral Image Classification
    Qin, Boao
    Feng, Shou
    Zhao, Chunhui
    Li, Wei
    Tao, Ran
    Zhou, Jun
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
  • [19] Unsupervised Visual Representation Learning via Dual-Level Progressive Similar Instance Selection
    Fan, Hehe
    Liu, Ping
    Xu, Mingliang
    Yang, Yi
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (09) : 8851 - 8861
  • [20] Trident: Change Point Detection for Multivariate Time Series via Dual-Level Attention Learning
    Duan, Ziyi
    Du, Haizhou
    Zheng, Yang
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2021, 2021, 12672 : 799 - 810