Unsupervised Visual Representation Learning by Context Prediction

Times cited: 1521
Authors
Doersch, Carl [1, 2]
Gupta, Abhinav [1]
Efros, Alexei A. [2]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
DOI
10.1109/ICCV.2015.167
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This work explores the use of spatial context as a source of free and plentiful supervisory signal for training a rich visual representation. Given only a large, unlabeled image collection, we extract random pairs of patches from each image and train a convolutional neural net to predict the position of the second patch relative to the first. We argue that doing well on this task requires the model to learn to recognize objects and their parts. We demonstrate that the feature representation learned using this within-image context indeed captures visual similarity across images. For example, this representation allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset. Furthermore, we show that the learned ConvNet can be used in the R-CNN framework [19] and provides a significant boost over a randomly initialized ConvNet, resulting in state-of-the-art performance among algorithms which use only Pascal-provided training set annotations.
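The pretext task described in the abstract can be sketched as a sampling routine that picks a first patch, chooses one of its neighboring positions for the second patch, and returns that position as the classification label. This is a minimal sketch under stated assumptions, not the authors' implementation: the 3x3 grid with 8 possible relative positions, the 96-pixel patch size, the 48-pixel gap, and the names `sample_patch_pair`, `crop`, and `NEIGHBOR_OFFSETS` are illustrative choices that the abstract does not specify, and refinements such as randomly jittering patch locations are omitted.

```python
import random

# Assumed layout (hypothetical): the second patch occupies one of the 8 cells
# surrounding the first patch in a 3x3 grid. The label is the index into this list.
NEIGHBOR_OFFSETS = [
    (-1, -1), (-1, 0), (-1, 1),
    (0, -1),           (0, 1),
    (1, -1),  (1, 0),  (1, 1),
]


def sample_patch_pair(height, width, patch=96, gap=48, rng=random):
    """Sample one training example for relative-position prediction.

    Returns (first_box, second_box, label), where each box is a
    (top, left, size) tuple and label in 0..7 encodes where the second
    patch lies relative to the first. patch and gap are assumed values.
    """
    step = patch + gap  # distance between the origins of adjacent grid cells
    # The first patch must leave room for a neighbor on every side.
    min_origin = step
    max_top = height - patch - step
    max_left = width - patch - step
    if max_top < min_origin or max_left < min_origin:
        raise ValueError("image too small for this patch/gap setting")
    top = rng.randint(min_origin, max_top)
    left = rng.randint(min_origin, max_left)
    label = rng.randrange(len(NEIGHBOR_OFFSETS))
    dy, dx = NEIGHBOR_OFFSETS[label]
    first = (top, left, patch)
    second = (top + dy * step, left + dx * step, patch)
    return first, second, label


def crop(image, box):
    """Cut a square patch out of an image stored as a list of rows."""
    top, left, size = box
    return [row[left:left + size] for row in image[top:top + size]]


if __name__ == "__main__":
    rng = random.Random(0)
    first, second, label = sample_patch_pair(480, 640, rng=rng)
    print(first, second, label)
```

A ConvNet would then receive `crop(image, first)` and `crop(image, second)` as its two inputs and be trained with an 8-way classification loss against `label`; the network itself is outside the scope of this sketch. The gap between patches is included on the assumption that it discourages the model from solving the task with trivial low-level cues such as continuing edges across a shared boundary.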
Pages: 1422-1430
Number of pages: 9