DAS-CL: Towards Multimodal Machine Translation via Dual-Level Asymmetric Contrastive Learning

Cited by: 1
Authors
Cheng, Xuxin [1]
Zhu, Zhihong [1]
Li, Yaowei [1]
Li, Hongxiang [1]
Zou, Yuexian [1]
Affiliation
[1] Peking Univ, Sch ECE, Beijing, Peoples R China
Keywords
Multimodal Machine Translation; Asymmetric Contrastive Learning; Image Captioning; Object Detection
DOI
10.1145/3583780.3614832
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal machine translation (MMT) aims to exploit visual information to improve neural machine translation (NMT). It has been demonstrated that image captioning and object detection can further improve MMT. In this paper, to leverage image captioning and object detection more effectively, we propose a Dual-level ASymmetric Contrastive Learning (DAS-CL) framework. Specifically, we use image captioning and object detection to generate additional pairs of visual and textual inputs. At the utterance level, we introduce an image captioning model to generate additional coarse-grained pairs. At the word level, we introduce an object detection model to generate additional fine-grained pairs. To mitigate the negative impact of noise in the generated pairs, we apply asymmetric contrastive learning at both levels. Experiments on the Multi30K dataset across three translation directions demonstrate that DAS-CL significantly outperforms existing MMT frameworks and achieves new state-of-the-art performance. More encouragingly, further analysis shows that DAS-CL is more robust to irrelevant visual information.
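The abstract does not spell out the asymmetric loss itself. As a point of reference only, here is a minimal sketch of a standard one-directional InfoNCE contrastive loss between textual and visual embeddings, the generic building block that contrastive MMT objectives start from. It is not DAS-CL's asymmetric formulation; the function names `cosine` and `info_nce`, the temperature `tau`, and the use of in-batch negatives are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]

# Hypothetical sketch: a generic text-to-visual InfoNCE loss,
# NOT the DAS-CL asymmetric loss (which this record does not specify).

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(text: List[Vector], visual: List[Vector], tau: float = 0.1) -> float:
    """Mean InfoNCE loss: text[i] and visual[i] form the positive pair,
    and every other visual[j] in the batch acts as an in-batch negative."""
    assert text and len(text) == len(visual), "need aligned, non-empty batches"
    total = 0.0
    for i, t in enumerate(text):
        logits = [cosine(t, v) / tau for v in visual]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / len(text)

aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

With two correctly aligned pairs the loss is close to zero; swapping the visual inputs makes it large, because each text embedding is then most similar to a negative. Any asymmetric treatment of noisy generated pairs, as described in the abstract, would have to be layered on top of a loss of this kind.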
Pages: 337-347
Number of pages: 11