RetrievalMMT: Retrieval-Constrained Multi-Modal Prompt Learning for Multi-Modal Machine Translation

Cited by: 0
Authors
Wang, Yan [1 ]
Zeng, Yawen [2 ]
Liang, Junjie [1 ]
Xing, Xiaofen [1 ]
Xu, Jin [1 ]
Xu, Xiangmin [1 ]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] ByteDance AI Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
multi-modal machine translation; multi-modal prompt learning; multi-modal dictionary;
DOI
10.1145/3652583.3658018
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
As an extension of machine translation, multi-modal machine translation aims to make the best possible use of visual information. Technically, image information is integrated as an auxiliary modality through concepts or latent semantics for multi-modal fusion and alignment, typically within a Transformer framework. However, current approaches often neglect one modality in favor of numerous handcrafted features (e.g., visual concept extraction) and require training all parameters of their framework. It is therefore worthwhile to explore multi-modal concepts or features that enhance performance, together with an efficient way to incorporate visual information at minimal cost. Meanwhile, despite their powerful capabilities, multi-modal large language models (MLLMs) suffer from visual hallucination, which compromises performance. Inspired by pioneering techniques in the multi-modal field, such as prompt learning and MLLMs, this paper explores the possibility of applying multi-modal prompt learning to the multi-modal machine translation task. Our framework offers three key advantages: it establishes a robust connection between visual concepts and the translation process, requires as few as 1.46M trainable parameters, and can be seamlessly integrated into any existing framework by retrieving from a multi-modal dictionary. Specifically, we propose two prompt-guided strategies: a learnable prompt-refined module and a heuristic prompt-refined module. The learnable strategy utilizes off-the-shelf pre-trained models, while the heuristic strategy constrains the hallucination problem via concept retrieval. Experiments on two real-world benchmark datasets demonstrate that our proposed method outperforms all competitors.
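To make the retrieval-constrained idea concrete, the following is a minimal sketch (not the authors' implementation) of how prompting via a multi-modal dictionary could work: embed the input image with an off-the-shelf encoder, retrieve the nearest concepts from a concept dictionary by cosine similarity, and prepend them to the source sentence as a textual prompt for a frozen translation model. All names (retrieve_concepts, build_prompted_source, the </sep> separator) and the random embeddings are illustrative assumptions, not details from the paper.

# Hypothetical sketch of retrieval-constrained prompt construction.
# Names, shapes, and the separator token are illustrative only.
import numpy as np

def retrieve_concepts(image_embedding, concept_embeddings, concept_words, top_k=5):
    """Return the top-k dictionary concepts closest to the image embedding
    (cosine similarity), which grounds the prompt in visual evidence."""
    img = image_embedding / np.linalg.norm(image_embedding)
    dic = concept_embeddings / np.linalg.norm(concept_embeddings, axis=1, keepdims=True)
    scores = dic @ img                      # cosine similarity to every dictionary concept
    top = np.argsort(-scores)[:top_k]       # indices of the best-matching concepts
    return [concept_words[i] for i in top]

def build_prompted_source(source_sentence, concepts):
    """Prepend retrieved concepts to the source sentence as a textual prompt,
    so a frozen translation model can condition on retrieved visual cues."""
    prompt = " ".join(concepts)
    return f"{prompt} </sep> {source_sentence}"

# Toy usage with random vectors standing in for a real multi-modal dictionary.
rng = np.random.default_rng(0)
dict_words = ["dog", "ball", "park", "girl", "bicycle"]
dict_embs = rng.normal(size=(len(dict_words), 512))
img_emb = rng.normal(size=512)
concepts = retrieve_concepts(img_emb, dict_embs, dict_words, top_k=2)
print(build_prompted_source("a dog plays with a ball in the park .", concepts))

Because retrieval only adds tokens to the model input, a sketch like this can sit in front of any existing translation backbone; in the paper, the retrieved concepts are what constrain the hallucination problem in the heuristic strategy.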
Pages: 860-868
Number of pages: 9