RetrievalMMT: Retrieval-Constrained Multi-Modal Prompt Learning for Multi-Modal Machine Translation

Cited by: 0
Authors
Wang, Yan [1 ]
Zeng, Yawen [2 ]
Liang, Junjie [1 ]
Xing, Xiaofen [1 ]
Xu, Jin [1 ]
Xu, Xiangmin [1 ]
Affiliations
[1] South China University of Technology, Guangzhou, China
[2] ByteDance AI Lab, Beijing, China
Funding
National Natural Science Foundation of China
Keywords
multi-modal machine translation; multi-modal prompt learning; multi-modal dictionary;
DOI
10.1145/3652583.3658018
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
As an extension of machine translation, multi-modal machine translation aims above all to make the best use of visual information. Technically, image information is integrated as an auxiliary modality into multi-modal fusion and alignment through concepts or latent semantics, typically within a Transformer-based framework. However, current approaches often neglect one modality while designing numerous handcrafted features (e.g., visual concept extraction), and they require training all parameters of their framework. It is therefore worthwhile to explore multi-modal concepts or features that enhance performance, as well as an efficient way to incorporate visual information at minimal cost. Meanwhile, although multi-modal large language models (MLLMs) have become increasingly capable, they suffer from visual hallucination, which compromises performance. Inspired by pioneering techniques in the multi-modal field such as prompt learning and MLLMs, this paper explores applying multi-modal prompt learning to the multi-modal machine translation task. Our framework offers three key advantages: it establishes a robust connection between visual concepts and the translation process, requires training as few as 1.46M parameters, and can be seamlessly integrated into any existing framework by retrieving from a multi-modal dictionary. Specifically, we propose two prompt-guided strategies: a learnable prompt-refined module and a heuristic prompt-refined module. The learnable strategy utilizes off-the-shelf pre-trained models, while the heuristic strategy constrains the hallucination problem via concept retrieval. Experiments on two real-world benchmark datasets demonstrate that our proposed method outperforms all competitors.
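The retrieval-constrained, heuristic idea sketched in the abstract can be pictured with a minimal example (not the authors' implementation): an off-the-shelf vision-language model scores a small multi-modal dictionary of concept words against the input image, and the top-scoring concepts are prepended to the source sentence as a textual prompt for the translation model. The concept list, the CLIP checkpoint, and the prompt format below are illustrative assumptions.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical multi-modal "dictionary" of visual concept words;
# the paper's actual dictionary construction is not reproduced here.
CONCEPT_DICTIONARY = ["dog", "ball", "grass", "child", "bicycle", "beach"]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def retrieve_concepts(image: Image.Image, top_k: int = 3) -> list[str]:
    """Score each dictionary entry against the image and keep the top-k concepts."""
    inputs = processor(text=CONCEPT_DICTIONARY, images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image: (1, num_concepts) image-text similarity scores
    scores = outputs.logits_per_image.squeeze(0)
    top = torch.topk(scores, k=top_k).indices.tolist()
    return [CONCEPT_DICTIONARY[i] for i in top]

def build_prompted_source(src_sentence: str, image: Image.Image) -> str:
    """Prepend retrieved concepts as a textual prompt to the source sentence."""
    concepts = retrieve_concepts(image)
    return "concepts: " + ", ".join(concepts) + " | " + src_sentence
```

Constraining the prompt to dictionary entries retrieved for the specific image is what keeps the added context grounded, in contrast to free-form MLLM captions that may hallucinate objects absent from the scene.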
Pages: 860-868
Page count: 9