Self-Supervised Correlation Learning for Cross-Modal Retrieval

Cited by: 24
Authors
Liu, Yaxin [1]
Wu, Jianlong [1]
Qu, Leigang [1]
Gan, Tian [1]
Yin, Jianhua [1]
Nie, Liqiang [1]
Affiliation
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266237, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; self-supervised contrastive learning; mutual information estimation;
DOI
10.1109/TMM.2022.3152086
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Cross-modal retrieval aims to retrieve relevant data from another modality when given a query of one modality. Although most existing methods that rely on the label information of multimedia data have achieved promising results, the performance gained from labeled data comes at a high cost, since labeling often requires enormous labor, especially on large-scale multimedia datasets. Therefore, unsupervised cross-modal learning is of crucial importance in real-world applications. In this paper, we propose a novel unsupervised cross-modal retrieval method, named Self-supervised Correlation Learning (SCL), which takes full advantage of large amounts of unlabeled data to learn discriminative and modality-invariant representations. Since unsupervised learning lacks the supervision of category labels, we incorporate knowledge from the input as a supervisory signal by maximizing the mutual information between the input and the output of different modality-specific projectors. In addition, to learn discriminative representations, we exploit unsupervised contrastive learning to model the relationships among intra- and inter-modality instances, which pulls similar samples closer and pushes dissimilar samples apart. Moreover, to further reduce the modality gap, we adopt a weight-sharing scheme and minimize the modality-invariant loss in the joint representation space. We also extend the proposed method to the semi-supervised setting. Extensive experiments conducted on three widely used benchmark datasets demonstrate that our method achieves competitive results compared with current state-of-the-art cross-modal retrieval approaches.
Pages: 2851 - 2863
Page count: 13
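
A minimal PyTorch sketch of two of the objectives described in the abstract above: an intra-/inter-modality contrastive (InfoNCE-style) loss and a modality-invariant loss computed on a weight-sharing joint space. This is not the authors' released code; the module names, feature dimensions, and temperature below are illustrative assumptions, and the mutual-information estimation term is omitted.

```python
# Illustrative sketch only: cross-modal contrastive (InfoNCE-style) loss plus a
# simple modality-invariant loss, loosely following the ideas in the abstract.
# Names, dimensions, and the temperature are assumptions, not the paper's values.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedProjector(nn.Module):
    """Modality-specific encoders followed by a weight-sharing projection layer."""

    def __init__(self, img_dim=2048, txt_dim=300, hid_dim=1024, out_dim=256):
        super().__init__()
        self.img_net = nn.Sequential(nn.Linear(img_dim, hid_dim), nn.ReLU())
        self.txt_net = nn.Sequential(nn.Linear(txt_dim, hid_dim), nn.ReLU())
        # Weight-sharing scheme: both modalities pass through the same final layer.
        self.shared = nn.Linear(hid_dim, out_dim)

    def forward(self, img_feat, txt_feat):
        z_img = F.normalize(self.shared(self.img_net(img_feat)), dim=-1)
        z_txt = F.normalize(self.shared(self.txt_net(txt_feat)), dim=-1)
        return z_img, z_txt


def cross_modal_contrastive_loss(z_img, z_txt, temperature=0.07):
    """InfoNCE over image-text pairs: the i-th image and i-th text are positives,
    all other pairings in the batch act as negatives."""
    logits = z_img @ z_txt.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(z_img.size(0), device=z_img.device)
    # Symmetric loss covering both retrieval directions (image->text, text->image).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


def modality_invariant_loss(z_img, z_txt):
    """Pull paired representations together in the joint space to shrink the modality gap."""
    return F.mse_loss(z_img, z_txt)


if __name__ == "__main__":
    model = SharedProjector()
    img = torch.randn(32, 2048)   # e.g. CNN image features
    txt = torch.randn(32, 300)    # e.g. averaged word embeddings
    z_img, z_txt = model(img, txt)
    loss = cross_modal_contrastive_loss(z_img, z_txt) + modality_invariant_loss(z_img, z_txt)
    loss.backward()
    print(float(loss))
```

In this sketch the two losses are simply summed; how the full method weights them, and how the mutual-information term between projector inputs and outputs is estimated, is described in the paper itself.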