Self-Supervised Correlation Learning for Cross-Modal Retrieval

Cited by: 24
Authors
Liu, Yaxin [1]
Wu, Jianlong [1]
Qu, Leigang [1]
Gan, Tian [1]
Yin, Jianhua [1]
Nie, Liqiang [1]
Affiliations
[1] Shandong University, School of Computer Science & Technology, Qingdao 266237, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; self-supervised contrastive learning; mutual information estimation
DOI
10.1109/TMM.2022.3152086
CLC classification
TP [Automation Technology; Computer Technology]
Discipline code
0812
Abstract
Cross-modal retrieval aims to retrieve relevant data in one modality given a query in another modality. Although most existing methods that rely on the label information of multimedia data achieve promising results, this performance comes at a high cost: labeling data requires enormous labor, especially on large-scale multimedia datasets. Unsupervised cross-modal learning is therefore of crucial importance in real-world applications. In this paper, we propose a novel unsupervised cross-modal retrieval method, named Self-supervised Correlation Learning (SCL), which takes full advantage of large amounts of unlabeled data to learn discriminative and modality-invariant representations. Since unsupervised learning lacks the supervision of category labels, we derive a supervisory signal from the input itself by maximizing the mutual information between the input and the output of each modality-specific projector. In addition, to learn discriminative representations, we exploit unsupervised contrastive learning to model the relationships among intra- and inter-modality instances, pulling similar samples closer and pushing dissimilar samples apart. Moreover, to further narrow the modality gap, we use a weight-sharing scheme and minimize a modality-invariant loss in the joint representation space. We also extend the proposed method to the semi-supervised setting. Extensive experiments on three widely used benchmark datasets demonstrate that our method achieves competitive results compared with current state-of-the-art cross-modal retrieval approaches.
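To make the ideas in the abstract concrete, below is a minimal, illustrative PyTorch-style sketch of the general kind of architecture and objective it describes: modality-specific projectors, a weight-shared head in the joint space, and a contrastive loss that pulls paired image-text instances together and pushes unpaired ones apart. All names, feature dimensions, and the symmetric InfoNCE-style loss are hypothetical choices for illustration; this is not the paper's exact SCL formulation or its mutual-information estimator.

# Illustrative sketch only; module names, dimensions, and loss are assumptions,
# not the exact method proposed in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalEncoder(nn.Module):
    """Two modality-specific projectors followed by a weight-shared head."""

    def __init__(self, img_dim=4096, txt_dim=300, hidden=1024, joint=256):
        super().__init__()
        # Modality-specific projectors map each modality into a common space.
        self.img_proj = nn.Sequential(nn.Linear(img_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, joint))
        self.txt_proj = nn.Sequential(nn.Linear(txt_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, joint))
        # Weight-sharing: the same head is applied to both modalities to
        # encourage modality-invariant joint representations.
        self.shared_head = nn.Linear(joint, joint)

    def forward(self, img_feat, txt_feat):
        v = F.normalize(self.shared_head(self.img_proj(img_feat)), dim=-1)
        t = F.normalize(self.shared_head(self.txt_proj(txt_feat)), dim=-1)
        return v, t


def symmetric_info_nce(v, t, temperature=0.07):
    """Symmetric InfoNCE-style contrastive loss: matched (image, text) pairs
    on the diagonal are pulled together, all other pairs in the batch are
    pushed apart."""
    logits = v @ t.t() / temperature                  # (B, B) cosine similarities
    targets = torch.arange(v.size(0), device=v.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


# Toy usage with random tensors standing in for pre-extracted features.
model = CrossModalEncoder()
img = torch.randn(32, 4096)   # e.g. CNN image features (hypothetical dims)
txt = torch.randn(32, 300)    # e.g. averaged word embeddings (hypothetical dims)
loss = symmetric_info_nce(*model(img, txt))
loss.backward()

An InfoNCE-style contrastive objective is a commonly used lower-bound surrogate for mutual information maximization, which is how this sketch loosely relates to the mutual-information term in the abstract; the paper's actual estimator, intra-modality terms, and loss weighting may differ.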
Pages: 2851-2863
Page count: 13
Related papers (50 records in total)
  • [1] Self-supervised incomplete cross-modal hashing retrieval
    Peng, Shouyong
    Yao, Tao
    Li, Ying
    Wang, Gang
    Wang, Lili
    Yan, Zhiming
    Expert Systems with Applications, 2025, 262
  • [2] Self-Supervised Visual Representations for Cross-Modal Retrieval
    Patel, Yash
    Gomez, Lluis
    Rusinol, Marcal
    Karatzas, Dimosthenis
    Jawahar, C. V.
    ICMR'19: Proceedings of the 2019 ACM International Conference on Multimedia Retrieval, 2019: 182-186
  • [3] Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning
    Salvador, Amaia
    Gundogdu, Erhan
    Bazzani, Loris
    Donoser, Michael
    2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021: 15470-15479
  • [4] Perfect Match: Self-Supervised Embeddings for Cross-Modal Retrieval
    Chung, Soo-Whan
    Chung, Joon Son
    Kang, Hong-Goo
    IEEE Journal of Selected Topics in Signal Processing, 2020, 14(3): 568-576
  • [5] Self-Supervised Adversarial Hashing Networks for Cross-Modal Retrieval
    Li, Chao
    Deng, Cheng
    Li, Ning
    Liu, Wei
    Gao, Xinbo
    Tao, Dacheng
    2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018: 4242-4251
  • [6] Autoencoder-based self-supervised hashing for cross-modal retrieval
    Li, Yifan
    Wang, Xuan
    Cui, Lei
    Zhang, Jiajia
    Huang, Chengkai
    Luo, Xuan
    Qi, Shuhan
    Multimedia Tools and Applications, 2021, 80(11): 17257-17274
  • [7] Self-supervised cross-modal visual retrieval from brain activities
    Ye, Zesheng
    Yao, Lina
    Zhang, Yu
    Gustin, Sylvia
    Pattern Recognition, 2024, 145
  • [8] Self-Supervised Learning with Cross-Modal Transformers for Emotion Recognition
    Khare, Aparna
    Parthasarathy, Srinivas
    Sundaram, Shiva
    2021 IEEE Spoken Language Technology Workshop (SLT), 2021: 381-388
  • [9] Self-supervised learning-based weight adaptive hashing for fast cross-modal retrieval
    Li, Yifan
    Wang, Xuan
    Qi, Shuhan
    Huang, Chengkai
    Jiang, Zoe L.
    Liao, Qing
    Guan, Jian
    Zhang, Jiajia
    Signal, Image and Video Processing, 2021, 15: 673-680