A multimodal embedding transfer approach for consistent and selective learning processes in cross-modal retrieval

Cited by: 0
Authors
Zeng, Zhixiong [1, 2]
He, Shuyi [1, 3]
Zhang, Yuhao [4]
Mao, Wenji [1, 3]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Meituan, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[4] Tencent, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; Multimodal embedding transfer; Selective optimization; Soft contrastive loss
DOI
10.1016/j.ins.2025.121974
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Cross-modal retrieval (CMR) aims to retrieve semantically relevant samples based on the query of a different modality. Previous work usually employs pair-wise and class-wise learning to learn a shared embedding space, so that modality invariance and semantic discrimination can be preserved. However, pair-wise and class-wise learning are conventionally considered separately in previous methods, which often brings about the inconsistent combination of learning objectives and unselective optimization of multimodal pairs, leading to insufficient/ineffective information utilization that degrades model performance. To tackle these issues, in this paper, we propose a novel multimodal embedding transfer approach to enable consistent and selective learning processes for CMR. To support consistent combination and maximize information utilization, our proposed framework leverages a source embedding model generated by class-wise learning and a target embedding model generated by pair-wise learning. We then develop the embedding transfer strategy to transfer multimodal embeddings from the source model to the target model, which provides the relaxed margins and relaxed labels simultaneously for the selective optimization of multimodal pairs. We finally design a soft contrastive loss to realize the multimodal embedding transfer strategy. Extensive experiments on the benchmark multimodal datasets verify the effectiveness of our approach for cross-modal retrieval.
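The abstract does not give the loss in closed form, so the following is only a rough sketch of what a soft contrastive loss with source-derived soft labels could look like, not the authors' formulation. It assumes that cross-modal distances in the source (class-wise) embedding space are turned into soft labels with a Gaussian kernel, and that those labels weight an attraction term and a margin-based repulsion term on the target (pair-wise) embeddings. The function names, the kernel width `sigma`, and the fixed `margin` are all hypothetical; in particular, the fixed margin is a simplification of the relaxed margins the abstract mentions.

```python
import math


def pairwise_dist(a, b):
    """Euclidean distance between every vector in a and every vector in b."""
    return [[math.dist(x, y) for y in b] for x in a]


def soft_contrastive_loss(src_img, src_txt, tgt_img, tgt_txt,
                          sigma=1.0, margin=1.0):
    """Illustrative soft contrastive loss (hypothetical, not the paper's).

    src_*: embeddings from the source model, used only to build soft labels.
    tgt_*: embeddings from the target model, the ones being optimized.
    """
    src_d = pairwise_dist(src_img, src_txt)
    tgt_d = pairwise_dist(tgt_img, tgt_txt)
    total, count = 0.0, 0
    for i in range(len(tgt_d)):
        for j in range(len(tgt_d[i])):
            # Soft label in (0, 1]: near 1 when the source model places
            # the image-text pair close together (assumed Gaussian kernel).
            w = math.exp(-src_d[i][j] ** 2 / sigma)
            # Pairs with a high soft label are pulled together ...
            attract = w * tgt_d[i][j] ** 2
            # ... and pairs with a low soft label are pushed beyond the margin.
            repel = (1.0 - w) * max(margin - tgt_d[i][j], 0.0) ** 2
            total += attract + repel
            count += 1
    return total / count


# Toy usage: two image-text pairs with 2-D embeddings.
source = [[1.0, 0.0], [0.0, 1.0]]
aligned = soft_contrastive_loss(source, source, source, source)
collapsed = soft_contrastive_loss(source, source,
                                  [[0.0, 0.0], [0.0, 0.0]],
                                  [[0.0, 0.0], [0.0, 0.0]])
print(aligned < collapsed)
```

In this toy setting, target embeddings that keep the source geometry incur a lower loss than target embeddings collapsed to a single point, because the collapsed pairs with low soft labels violate the margin.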
Pages: 14