Self-Supervised Visual Representations for Cross-Modal Retrieval

Cited by: 7
Authors
Patel, Yash [1 ]
Gómez, Lluís [2 ]
Rusiñol, Marçal [2 ]
Karatzas, Dimosthenis [2 ]
Jawahar, C. V. [3 ]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] Univ Autonoma Barcelona, Comp Vis Ctr, Barcelona, Spain
[3] IIIT Hyderabad, CVIT, KCIS, Hyderabad, India
Keywords
Self-Supervised Learning; Visual Representations; Cross-Modal Retrieval
DOI
10.1145/3323873.3325035
Chinese Library Classification (CLC)
TP31 [Computer Software]
Subject Classification Codes
081202; 0835
Abstract
Cross-modal retrieval methods have improved significantly in recent years through the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places. However, collecting and annotating such datasets requires a tremendous amount of human effort, and their annotations are limited to discrete sets of popular visual classes that may not be representative of the richer semantics found in large-scale cross-modal retrieval datasets. In this paper, we present a self-supervised cross-modal retrieval framework that leverages as training data the correlations between images and text across the entire set of Wikipedia articles. Our method consists of training a CNN to predict: (1) the semantic context of the article in which an image is most likely to appear as an illustration, and (2) the semantic context of its caption. Our experiments demonstrate that the proposed method not only learns discriminative visual representations for solving vision tasks such as classification, but that the learned representations are better for cross-modal retrieval than those obtained by supervised pre-training of the network on the ImageNet dataset.
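The training objective described in the abstract lends itself to a simple two-head network: one CNN trunk with separate output heads for the article-context and caption-context targets. The following PyTorch sketch shows one way such a setup could look; it is not the authors' implementation, and the ResNet-18 backbone, the number of topics, and the soft cross-entropy loss against topic distributions (e.g., from a topic model fit on Wikipedia text) are all assumptions made for illustration.

```python
# Minimal sketch (not the authors' released code) of the self-supervised
# setup described in the abstract: a CNN predicts two "semantic context"
# targets per image -- the topic distribution of the surrounding article
# and that of the image caption. Backbone, topic count, and loss are
# illustrative assumptions.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_TOPICS = 40  # hypothetical size of the topic model


class TwoHeadTopicCNN(nn.Module):
    def __init__(self, num_topics=NUM_TOPICS):
        super().__init__()
        backbone = models.resnet18(weights=None)  # any CNN trunk would do
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()  # keep the pooled visual features
        self.backbone = backbone
        # One head per self-supervised target.
        self.article_head = nn.Linear(feat_dim, num_topics)
        self.caption_head = nn.Linear(feat_dim, num_topics)

    def forward(self, images):
        feats = self.backbone(images)
        return self.article_head(feats), self.caption_head(feats)


def topic_loss(logits, target_topics):
    # Soft cross-entropy between predicted log-probabilities and a
    # target topic distribution (each target row sums to 1).
    log_probs = torch.log_softmax(logits, dim=1)
    return -(target_topics * log_probs).sum(dim=1).mean()


# One hypothetical training step with stand-in data.
model = TwoHeadTopicCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

images = torch.randn(8, 3, 224, 224)  # batch of article images
article_topics = torch.softmax(torch.randn(8, NUM_TOPICS), dim=1)
caption_topics = torch.softmax(torch.randn(8, NUM_TOPICS), dim=1)

article_logits, caption_logits = model(images)
loss = topic_loss(article_logits, article_topics) + \
       topic_loss(caption_logits, caption_topics)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In the paper's setting, the two targets would be derived from the Wikipedia article text and the image caption; the random tensors above merely stand in for those topic distributions.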
Pages
182-186 (5 pages)