Self-Supervised Correlation Learning for Cross-Modal Retrieval

Cited by: 24
Authors
Liu, Yaxin [1 ]
Wu, Jianlong [1 ]
Qu, Leigang [1 ]
Gan, Tian [1 ]
Yin, Jianhua [1 ]
Nie, Liqiang [1 ]
Affiliations
[1] Shandong Univ, Sch Comp Sci & Technol, Qingdao 266237, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cross-modal retrieval; self-supervised contrastive learning; mutual information estimation;
DOI
10.1109/TMM.2022.3152086
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Cross-modal retrieval aims to retrieve relevant data from another modality when given a query of one modality. Although most existing methods that rely on the label information of multimedia data have achieved promising results, the performance gained from labeled data comes at a high cost, since labeling often requires enormous labor, especially on large-scale multimedia datasets. Therefore, unsupervised cross-modal learning is of crucial importance in real-world applications. In this paper, we propose a novel unsupervised cross-modal retrieval method, named Self-supervised Correlation Learning (SCL), which takes full advantage of large amounts of unlabeled data to learn discriminative and modality-invariant representations. Since unsupervised learning lacks the supervision of category labels, we incorporate knowledge from the input as a supervisory signal by maximizing the mutual information between the input and the output of different modality-specific projectors. In addition, to learn discriminative representations, we exploit unsupervised contrastive learning to model the relationships among intra- and inter-modality instances, which draws similar samples closer and pushes dissimilar samples apart. Moreover, to further eliminate the modality gap, we use a weight-sharing scheme and minimize the modality-invariant loss in the joint representation space. We also extend the proposed method to the semi-supervised setting. Extensive experiments on three widely used benchmark datasets demonstrate that our method achieves competitive results compared with current state-of-the-art cross-modal retrieval approaches.
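The cross-modal contrastive objective the abstract describes (pulling matched image-text pairs together and pushing mismatched ones apart) can be illustrated with a generic symmetric InfoNCE loss. This is a minimal NumPy sketch of that family of losses, not the authors' SCL implementation; the function name, temperature value, and embedding shapes are illustrative assumptions.

```python
import numpy as np

def info_nce_cross_modal(img_emb, txt_emb, temperature=0.1):
    """Symmetric InfoNCE loss between paired image/text embeddings.

    Rows of img_emb and txt_emb are assumed to be aligned pairs; the
    other rows in the batch act as negatives. This is a generic
    cross-modal contrastive loss, not the paper's exact objective.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature   # (B, B) similarity matrix
    labels = np.arange(len(logits))      # diagonal entries are the positives

    def cross_entropy(lg, lb):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(lb)), lb].mean()

    # Symmetric: image-to-text and text-to-image retrieval directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 64))
# Aligned pairs (text = image + small noise) vs. random, unrelated pairs.
loss_aligned = info_nce_cross_modal(img, img + 0.01 * rng.normal(size=(8, 64)))
loss_random = info_nce_cross_modal(img, rng.normal(size=(8, 64)))
print(loss_aligned, loss_random)  # aligned pairs incur a much lower loss
```

Minimizing this loss is one common way to realize the "similar samples closer, dissimilar samples apart" behavior across modalities; the paper additionally uses mutual-information estimation and a weight-sharing scheme not shown here.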
Pages: 2851 - 2863
Page count: 13
Related papers
50 in total
  • [41] Adaptively Unified Semi-supervised Learning for Cross-Modal Retrieval
    Zhang, Liang
    Ma, Bingpeng
    He, Jianfeng
    Li, Guorong
    Huang, Qingming
    Tian, Qi
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3406 - 3412
  • [42] Supervised Contrastive Learning for 3D Cross-Modal Retrieval
    Choo, Yeon-Seung
    Kim, Boeun
    Kim, Hyun-Sik
    Park, Yong-Suk
    APPLIED SCIENCES (SWITZERLAND), 2024, 14 (22)
  • [43] Two-stage deep learning for supervised cross-modal retrieval
    Shao, Jie
    Zhao, Zhicheng
    Su, Fei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (12) : 16615 - 16631
  • [44] Multimedia Feature Mapping and Correlation Learning for Cross-Modal Retrieval
    Yuan, Xu
    Zhong, Hua
    Chen, Zhikui
    Zhong, Fangming
    Hu, Yueming
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2018, 10 (03) : 29 - 45
  • [45] Deep Semantic Correlation with Adversarial Learning for Cross-Modal Retrieval
    Hua, Yan
    Du, Jianhe
    PROCEEDINGS OF 2019 IEEE 9TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2019), 2019, : 252 - 255
  • [46] Self-supervised Exclusive Learning for 3D Segmentation with Cross-modal Unsupervised Domain Adaptation
    Zhang, Yachao
    Li, Miaoyu
    Xie, Yuan
    Li, Cuihua
    Wang, Cong
    Zhang, Zhizhong
    Qu, Yanyun
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 3338 - 3346
  • [47] CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
    Afham, Mohamed
    Dissanayake, Isuru
    Dissanayake, Dinithi
    Dharmasiri, Amaya
    Thilakarathna, Kanchana
    Rodrigo, Ranga
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 9892 - 9902
  • [48] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [49] Self-Supervised Multi-Modal Knowledge Graph Contrastive Hashing for Cross-Modal Search
    Liang, Meiyu
    Du, Junping
    Liang, Zhengyang
    Xing, Yongwang
    Huang, Wei
    Xue, Zhe
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 13744 - 13753
  • [50] CMD: Self-supervised 3D Action Representation Learning with Cross-Modal Mutual Distillation
    Mao, Yunyao
    Zhou, Wengang
    Lu, Zhenbo
    Deng, Jiajun
    Li, Houqiang
    COMPUTER VISION - ECCV 2022, PT III, 2022, 13663 : 734 - 752