A multimodal embedding transfer approach for consistent and selective learning processes in cross-modal retrieval

Cited by: 0

Authors:

Zeng, Zhixiong [1,2]

He, Shuyi [1,3]

Zhang, Yuhao [4]

Mao, Wenji [1,3]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China

[2] Meituan, Beijing, Peoples R China

[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

[4] Tencent, Shenzhen, Peoples R China

Source:

INFORMATION SCIENCES | 2025 / Vol. 704

Funding:

National Natural Science Foundation of China;

Keywords:

Cross-modal retrieval; Multimodal embedding transfer; Selective optimization; Soft contrastive loss;

DOI:

10.1016/j.ins.2025.121974

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Cross-modal retrieval (CMR) aims to retrieve semantically relevant samples based on the query of a different modality. Previous work usually employs pair-wise and class-wise learning to learn a shared embedding space, so that modality invariance and semantic discrimination can be preserved. However, pair-wise and class-wise learning are conventionally considered separately in previous methods, which often brings about the inconsistent combination of learning objectives and unselective optimization of multimodal pairs, leading to insufficient/ineffective information utilization that degrades model performance. To tackle these issues, in this paper, we propose a novel multimodal embedding transfer approach to enable consistent and selective learning processes for CMR. To support consistent combination and maximize information utilization, our proposed framework leverages a source embedding model generated by class-wise learning and a target embedding model generated by pair-wise learning. We then develop the embedding transfer strategy to transfer multimodal embeddings from the source model to the target model, which provides the relaxed margins and relaxed labels simultaneously for the selective optimization of multimodal pairs. We finally design a soft contrastive loss to realize the multimodal embedding transfer strategy. Extensive experiments on the benchmark multimodal datasets verify the effectiveness of our approach for cross-modal retrieval.
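The abstract does not give the form of the soft contrastive loss, so the following is only a generic sketch of the idea it names: similarities computed in a source embedding space act as relaxed (soft) labels that weight the attraction and repulsion of cross-modal pairs in a target embedding space. Everything here is an assumption made for illustration and not the authors' formulation, including the function names, the Gaussian kernel, and the `sigma` and `margin` parameters.

```python
import numpy as np


def pairwise_dist(a, b):
    """Euclidean distance between every row of a and every row of b."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def soft_contrastive_loss(src_img, src_txt, tgt_img, tgt_txt,
                          sigma=1.0, margin=1.0):
    """Illustrative soft contrastive loss (hypothetical, not the paper's).

    src_* : image/text embeddings from the source (class-wise) model
    tgt_* : image/text embeddings from the target (pair-wise) model
    """
    # Soft labels in [0, 1]: close to 1 when the source model places an
    # image-text pair near each other, close to 0 when it places them far apart.
    w = np.exp(-pairwise_dist(src_img, src_txt) ** 2 / sigma)

    # Cross-modal distances in the target embedding space.
    d = pairwise_dist(tgt_img, tgt_txt)

    # Pairs the source model considers similar are pulled together;
    # pairs it considers dissimilar are pushed apart up to the margin.
    attract = w * d ** 2
    repel = (1.0 - w) * np.maximum(0.0, margin - d) ** 2
    return float((attract + repel).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    src_img, src_txt = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    tgt_img, tgt_txt = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    print(soft_contrastive_loss(src_img, src_txt, tgt_img, tgt_txt))
```

In this sketch the soft weight replaces the hard 0/1 pair label of an ordinary contrastive loss, so a pair that the source model rates as only partly similar contributes a mix of both terms rather than being treated as strictly positive or strictly negative.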


Pages: 14
