DAS-CL: Towards Multimodal Machine Translation via Dual-Level Asymmetric Contrastive Learning

Cited by: 1
Authors
Cheng, Xuxin [1]
Zhu, Zhihong [1]
Li, Yaowei [1]
Li, Hongxiang [1]
Zou, Yuexian [1]
Affiliations
[1] Peking Univ, Sch ECE, Beijing, Peoples R China
Keywords
Multimodal Machine Translation; Asymmetric Contrastive Learning; Image Captioning; Object Detection;
DOI
10.1145/3583780.3614832
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multimodal machine translation (MMT) aims to exploit visual information to improve neural machine translation (NMT). It has been demonstrated that image captioning and object detection can further improve MMT. In this paper, to leverage image captioning and object detection more effectively, we propose a Dual-level ASymmetric Contrastive Learning (DAS-CL) framework. Specifically, we leverage image captioning and object detection to generate more pairs of visual inputs and textual inputs. At the utterance level, we introduce an image captioning model to generate more coarse-grained pairs. At the word level, we introduce an object detection model to generate more fine-grained pairs. To mitigate the negative impact of noise in the generated pairs, we apply asymmetric contrastive learning at these two levels. Experiments on the Multi30K dataset in three translation directions demonstrate that DAS-CL significantly outperforms existing MMT frameworks and achieves new state-of-the-art performance. More encouragingly, further analysis shows that DAS-CL is more robust to irrelevant visual information.
Pages: 337-347
Page count: 11
Related papers
29 in total
  • [21] Towards Continual Learning for Multilingual Machine Translation via Vocabulary Substitution
    Garcia, Xavier
    Constant, Noah
    Parikh, Ankur P.
    Firat, Orhan
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1184 - 1192
  • [22] Dual-level Deep Evidential Fusion: Integrating multimodal information for enhanced reliable decision-making in deep learning
    Shao, Zhimin
    Dou, Weibei
    Pan, Yu
    INFORMATION FUSION, 2024, 103
  • [23] Enhancement of lower limb motor imagery ability via dual-level multimodal stimulation and sparse spatial pattern decoding method
    Hou, Yao
    Gu, Zhenghui
    Yu, Zhu Liang
    Xie, Xiaofeng
    Tang, Rongnian
    Xu, Jinghan
    Qi, Feifei
    FRONTIERS IN HUMAN NEUROSCIENCE, 2022, 16
  • [24] Towards Spoken Language Understanding via Multi-level Multi-grained Contrastive Learning
    Cheng, Xuxin
    Xu, Wanshi
    Zhu, Zhihong
    Li, Hongxiang
    Zou, Yuexian
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 326 - 336
  • [25] Towards semantically continuous unpaired image-to-image translation via margin adaptive contrastive learning and wavelet transform
    Zhang, Heng
    Yang, Yi-Jun
    Zeng, Wei
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 252
  • [26] Dynamic Context Selection for Document-level Neural Machine Translation via Reinforcement Learning
    Kang, Xiaomian
    Zhao, Yang
    Zhang, Jiajun
    Zong, Chengqing
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 2242 - 2254
  • [27] Continual Learning for Multilingual Neural Machine Translation via Dual Importance-based Model Division
    Liu, Junpeng
    Huang, Kaiyu
    Yu, Hao
    Li, Jiuyi
    Su, Jinsong
    Huang, Degen
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 12011 - 12027
  • [28] Towards Optimizing Multi-Level Selective Maintenance via Machine Learning Predictive Models
    Achour, Amal
    Kammoun, Mohamed Ali
    Hajej, Zied
    APPLIED SCIENCES-BASEL, 2024, 14 (01):
  • [29] Cross-Domain Compound Fault Diagnosis of Machine-Level Motors via Time-Frequency Self-Contrastive Learning
    He, Yiming
    Zhao, Chao
    Shen, Weiming
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (07) : 9692 - 9701