A multimodal embedding transfer approach for consistent and selective learning processes in cross-modal retrieval

Cited by: 0
Authors
Zeng, Zhixiong [1 ,2 ]
He, Shuyi [1 ,3 ]
Zhang, Yuhao [4 ]
Mao, Wenji [1 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Meituan, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[4] Tencent, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; Multimodal embedding transfer; Selective optimization; Soft contrastive loss;
DOI
10.1016/j.ins.2025.121974
CLC number (Chinese Library Classification)
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Cross-modal retrieval (CMR) aims to retrieve semantically relevant samples given a query from a different modality. Previous work usually employs pair-wise and class-wise learning to learn a shared embedding space, so that modality invariance and semantic discrimination are preserved. However, pair-wise and class-wise learning are conventionally treated separately in previous methods, which often leads to an inconsistent combination of learning objectives and unselective optimization of multimodal pairs, resulting in insufficient and ineffective information utilization that degrades model performance. To tackle these issues, we propose a novel multimodal embedding transfer approach that enables consistent and selective learning processes for CMR. To support consistent combination and maximize information utilization, our framework leverages a source embedding model generated by class-wise learning and a target embedding model generated by pair-wise learning. We then develop an embedding transfer strategy that transfers multimodal embeddings from the source model to the target model, providing relaxed margins and relaxed labels simultaneously for the selective optimization of multimodal pairs. Finally, we design a soft contrastive loss to realize the multimodal embedding transfer strategy. Extensive experiments on benchmark multimodal datasets verify the effectiveness of our approach for cross-modal retrieval.
Pages: 14
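
The record gives no equations, so the sketch below illustrates one plausible reading of the soft contrastive loss described in the abstract: relaxed labels are obtained by temperature-softening the cross-modal similarities of a frozen source (class-wise) model, and these soft targets supervise the similarity distribution of the target (pair-wise) model in both retrieval directions. This is a minimal sketch in PyTorch, not the paper's exact formulation; the function name soft_contrastive_loss, the temperature tau, and the KL-based soft cross-entropy are assumptions for illustration, and the paper's relaxed margins are omitted here.

```python
# Minimal, illustrative sketch of a soft contrastive transfer loss.
# NOT the paper's exact method: soft targets come from a frozen
# class-wise "source" model and supervise the pair-wise "target" model.
import torch
import torch.nn.functional as F


def soft_contrastive_loss(tgt_img, tgt_txt, src_img, src_txt, tau=0.07):
    """Distill source-model similarity structure into the target model.

    tgt_img, tgt_txt: target-model embeddings, shape (B, D)
    src_img, src_txt: frozen source-model embeddings, shape (B, D')
    """
    # Cosine-similarity logits of the target (pair-wise) model.
    logits = F.normalize(tgt_img, dim=-1) @ F.normalize(tgt_txt, dim=-1).t() / tau

    # Relaxed labels: softened cross-modal similarities from the frozen
    # source (class-wise) model replace hard one-hot pair labels.
    with torch.no_grad():
        src_sim = F.normalize(src_img, dim=-1) @ F.normalize(src_txt, dim=-1).t()
        soft_i2t = F.softmax(src_sim / tau, dim=1)      # image -> text targets
        soft_t2i = F.softmax(src_sim.t() / tau, dim=1)  # text -> image targets

    # Soft cross-entropy (KL divergence) in both retrieval directions.
    loss_i2t = F.kl_div(F.log_softmax(logits, dim=1), soft_i2t,
                        reduction="batchmean")
    loss_t2i = F.kl_div(F.log_softmax(logits.t(), dim=1), soft_t2i,
                        reduction="batchmean")
    return 0.5 * (loss_i2t + loss_t2i)


if __name__ == "__main__":
    B, D = 8, 256  # toy batch of random image/text embeddings
    loss = soft_contrastive_loss(torch.randn(B, D), torch.randn(B, D),
                                 torch.randn(B, D), torch.randn(B, D))
    print(loss.item())
```

With hard one-hot targets this reduces to the standard InfoNCE contrastive loss; softening the targets with the source model is what allows selective, graded weighting of multimodal pairs instead of treating all negatives equally.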