RetrievalMMT: Retrieval-Constrained Multi-Modal Prompt Learning for Multi-Modal Machine Translation

Cited by: 0
Authors
Wang, Yan [1 ]
Zeng, Yawen [2 ]
Liang, Junjie [1 ]
Xing, Xiaofen [1 ]
Xu, Jin [1 ]
Xu, Xiangmin [1 ]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] ByteDance AI Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
multi-modal machine translation; multi-modal prompt learning; multi-modal dictionary;
DOI
10.1145/3652583.3658018
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
As an extension of machine translation, multi-modal machine translation aims to make the best possible use of visual information. Technically, image information is integrated as an auxiliary modality through concepts or latent semantics for multi-modal fusion and alignment, typically within a Transformer framework. However, current approaches often neglect one modality in favor of numerous handcrafted features (e.g., visual concept extraction) and require training all parameters of their framework. It is therefore worthwhile to explore multi-modal concepts or features that enhance performance, together with an efficient way to incorporate visual information at minimal cost. Meanwhile, despite their powerful capabilities, multi-modal large language models (MLLMs) suffer from visual hallucination, which compromises performance. Inspired by pioneering techniques in the multi-modal field, such as prompt learning and MLLMs, this paper explores the possibility of applying multi-modal prompt learning to the multi-modal machine translation task. Our framework offers three key advantages: it establishes a robust connection between visual concepts and the translation process, requires as few as 1.46M trainable parameters, and can be seamlessly integrated into any existing framework by retrieving from a multi-modal dictionary. Specifically, we propose two prompt-guided strategies: a learnable prompt-refined module and a heuristic prompt-refined module. The learnable strategy utilizes off-the-shelf pre-trained models, while the heuristic strategy constrains the hallucination problem via concept retrieval. Experiments on two real-world benchmark datasets demonstrate that our proposed method outperforms all competitors.
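To make the retrieval-constrained idea concrete, the following is a minimal sketch (not the authors' implementation) of how prompting via a multi-modal dictionary could work: embed the input image with an off-the-shelf encoder, retrieve the nearest concepts from a concept dictionary by cosine similarity, and prepend them to the source sentence as a textual prompt for a frozen translation model. All names (retrieve_concepts, build_prompted_source, the </sep> separator) and the random embeddings are illustrative assumptions, not details from the paper.

# Hypothetical sketch of retrieval-constrained prompt construction.
# Names, shapes, and the separator token are illustrative only.
import numpy as np

def retrieve_concepts(image_embedding, concept_embeddings, concept_words, top_k=5):
    """Return the top-k dictionary concepts closest to the image embedding
    (cosine similarity), which grounds the prompt in visual evidence."""
    img = image_embedding / np.linalg.norm(image_embedding)
    dic = concept_embeddings / np.linalg.norm(concept_embeddings, axis=1, keepdims=True)
    scores = dic @ img                      # cosine similarity to every dictionary concept
    top = np.argsort(-scores)[:top_k]       # indices of the best-matching concepts
    return [concept_words[i] for i in top]

def build_prompted_source(source_sentence, concepts):
    """Prepend retrieved concepts to the source sentence as a textual prompt,
    so a frozen translation model can condition on retrieved visual cues."""
    prompt = " ".join(concepts)
    return f"{prompt} </sep> {source_sentence}"

# Toy usage with random vectors standing in for a real multi-modal dictionary.
rng = np.random.default_rng(0)
dict_words = ["dog", "ball", "park", "girl", "bicycle"]
dict_embs = rng.normal(size=(len(dict_words), 512))
img_emb = rng.normal(size=512)
concepts = retrieve_concepts(img_emb, dict_embs, dict_words, top_k=2)
print(build_prompted_source("a dog plays with a ball in the park .", concepts))

Because retrieval only adds tokens to the model input, a sketch like this can sit in front of any existing translation backbone; in the paper, the retrieved concepts are what constrain the hallucination problem in the heuristic strategy.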
Pages: 860-868
Number of pages: 9