A multimodal embedding transfer approach for consistent and selective learning processes in cross-modal retrieval

Cited by: 0
Authors
Zeng, Zhixiong [1, 2]
He, Shuyi [1, 3]
Zhang, Yuhao [4]
Mao, Wenji [1, 3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Meituan, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[4] Tencent, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; Multimodal embedding transfer; Selective optimization; Soft contrastive loss
DOI
10.1016/j.ins.2025.121974
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Cross-modal retrieval (CMR) aims to retrieve semantically relevant samples given a query of a different modality. Previous work usually employs pair-wise and class-wise learning to learn a shared embedding space, so that modality invariance and semantic discrimination can be preserved. However, previous methods conventionally treat pair-wise and class-wise learning separately, which often leads to an inconsistent combination of learning objectives and unselective optimization of multimodal pairs. The resulting insufficient or ineffective use of information degrades model performance. To tackle these issues, we propose a novel multimodal embedding transfer approach that enables consistent and selective learning processes for CMR. To support consistent combination and maximize information utilization, our proposed framework leverages a source embedding model generated by class-wise learning and a target embedding model generated by pair-wise learning. We then develop an embedding transfer strategy that transfers multimodal embeddings from the source model to the target model, providing relaxed margins and relaxed labels simultaneously for the selective optimization of multimodal pairs. Finally, we design a soft contrastive loss to realize the multimodal embedding transfer strategy. Extensive experiments on benchmark multimodal datasets verify the effectiveness of our approach for cross-modal retrieval.
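The abstract does not give the form of the soft contrastive loss, so the following is only an illustrative sketch of the general idea of transferring pairwise structure from a source embedding model to a target one. It is not the paper's formulation. It assumes a generic soft-label contrastive loss in which a Gaussian kernel on source-model distances yields a soft label for each image-text pair, and that soft label weights an attracting term and a margin-based repelling term on the target-model distance. The function names, the kernel, `margin`, and `sigma` are all assumptions made for illustration, and the margin is kept fixed here for simplicity rather than relaxed.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def soft_contrastive_loss(src_img, src_txt, tgt_img, tgt_txt,
                          margin=1.0, sigma=1.0):
    """Illustrative soft-label contrastive loss (NOT the paper's exact loss).

    src_* : embeddings from the source model (assumed: class-wise trained)
    tgt_* : embeddings from the target model (assumed: pair-wise trained)
    Each argument is a list of n vectors; index i of *_img and *_txt
    refers to the same sample.
    """
    n = len(tgt_img)
    total = 0.0
    for i in range(n):
        for j in range(n):
            # Soft label in (0, 1]: close to 1 when the source model places
            # image i and text j near each other (assumed Gaussian kernel).
            w = math.exp(-sq_dist(src_img[i], src_txt[j]) / sigma)
            # Distance between the same pair in the target space.
            d = math.sqrt(sq_dist(tgt_img[i], tgt_txt[j]))
            # Pairs the source model considers similar are pulled together;
            # dissimilar pairs are pushed apart up to the margin.
            attract = w * d ** 2
            repel = (1.0 - w) * max(0.0, margin - d) ** 2
            total += attract + repel
    return total / n


if __name__ == "__main__":
    src_img = [[0.0, 0.0], [10.0, 0.0]]
    src_txt = [[0.0, 0.0], [10.0, 0.0]]
    tgt_img = [[0.0, 0.0], [5.0, 0.0]]
    aligned_txt = [[0.0, 0.0], [5.0, 0.0]]
    swapped_txt = [[5.0, 0.0], [0.0, 0.0]]
    print(soft_contrastive_loss(src_img, src_txt, tgt_img, aligned_txt))
    print(soft_contrastive_loss(src_img, src_txt, tgt_img, swapped_txt))
```

In this toy setup the loss is near zero when the target embeddings agree with the source model's pairwise structure, and it grows when the target text embeddings are swapped. Because the weights are continuous rather than binary, each pair contributes in proportion to how similar the source model judges it to be, which is one plausible reading of "selective optimization" under the assumptions above.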
Pages: 14
Related papers
50 in total
  • [1] Deep Multimodal Transfer Learning for Cross-Modal Retrieval
    Zhen, Liangli
    Hu, Peng
    Peng, Xi
    Goh, Rick Siow Mong
    Zhou, Joey Tianyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (02) : 798 - 810
  • [2] Multimodal Graph Learning for Cross-Modal Retrieval
    Xie, Jingyou
    Zhao, Zishuo
    Lin, Zhenzhou
    Shen, Ying
    PROCEEDINGS OF THE 2023 SIAM INTERNATIONAL CONFERENCE ON DATA MINING, SDM, 2023, : 145 - 153
  • [3] Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval
    Mithun, Niluthpol Chowdhury
    Li, Juncheng
    Metze, Florian
    Roy-Chowdhury, Amit K.
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 19 - 27
  • [4] Graph Embedding Learning for Cross-Modal Information Retrieval
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 594 - 601
  • [5] Scalable Deep Multimodal Learning for Cross-Modal Retrieval
    Hu, Peng
    Zhen, Liangli
    Peng, Dezhong
    Liu, Pei
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 635 - 644
  • [6] Learning Consistent Feature Representation for Cross-Modal Multimedia Retrieval
    Kang, Cuicui
    Xiang, Shiming
    Liao, Shengcai
    Xu, Changsheng
    Pan, Chunhong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (03) : 370 - 381
  • [7] Cross-Modal Retrieval using Random Multimodal Deep Learning
    Somasekar, Hemanth
    Naveen, Kavya
    JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES, 2019, 14 (02) : 185 - 200
  • [8] Multimodal Discriminative Binary Embedding for Large-Scale Cross-Modal Retrieval
    Wang, Di
    Gao, Xinbo
    Wang, Xiumei
    He, Lihuo
    Yuan, Bo
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (10) : 4540 - 4554
  • [9] Deep Relation Embedding for Cross-Modal Retrieval
    Zhang, Yifan
    Zhou, Wengang
    Wang, Min
    Tian, Qi
    Li, Houqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 617 - 627
  • [10] Cross-Modal Retrieval with Heterogeneous Graph Embedding
    Chen, Dapeng
    Wang, Min
    Chen, Haobin
    Wu, Lin
    Qin, Jing
    Peng, Wei
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3291 - 3300