Unsupervised Pre-Training of Image Features on Non-Curated Data

Cited by: 143
Authors
Caron, Mathilde [1 ,2 ]
Bojanowski, Piotr [1 ]
Mairal, Julien [2 ]
Joulin, Armand [1 ]
Affiliations
[1] Facebook AI Research, Menlo Park, CA 94025, USA
[2] Univ Grenoble Alpes, CNRS, INRIA, Grenoble INP, LJK, F-38000 Grenoble, France
Funding
European Research Council
Keywords
DOI
10.1109/ICCV.2019.00305
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Pre-training general-purpose visual features with convolutional neural networks without relying on annotations is a challenging and important task. Most recent efforts in unsupervised feature learning have focused on either small or highly curated datasets like ImageNet, whereas using non-curated raw datasets was found to decrease the feature quality when evaluated on a transfer task. Our goal is to bridge the performance gap between unsupervised methods trained on curated data, which are costly to obtain, and massive raw datasets that are easily available. To that effect, we propose a new unsupervised approach which leverages self-supervision and clustering to capture complementary statistics from large-scale data. We validate our approach on 96 million images from YFCC100M [42], achieving state-of-the-art results among unsupervised methods on standard benchmarks, which confirms the potential of unsupervised learning when only non-curated raw data are available. We also show that pre-training a supervised VGG-16 with our method achieves 74.9% top-1 classification accuracy on the validation set of ImageNet, which is an improvement of +0.8% over the same network trained from scratch. Our code is available at https://github.com/facebookresearch/DeeperCluster.
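The abstract describes the method only at a high level. Below is a minimal, illustrative Python/PyTorch sketch of the core idea as stated there: alternate between clustering the network's features and training the network to jointly predict a self-supervised signal (image rotation) and the cluster assignment. This is an assumption-laden toy, not the paper's implementation: SmallConvNet, NUM_CLUSTERS, and all hyperparameters are hypothetical, and plain k-means on a tiny random batch stands in for the distributed hierarchical clustering over 96 million images; see the official repository linked above for the actual code.

# Hedged sketch: joint self-supervision (rotation) + clustering targets.
# All names and sizes here are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

NUM_ROTATIONS = 4   # 0/90/180/270 degrees, RotNet-style self-supervision
NUM_CLUSTERS = 16   # the paper uses vastly more clusters at scale

class SmallConvNet(nn.Module):
    """Toy feature extractor standing in for the paper's VGG-16."""
    def __init__(self, feat_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # one logit per (rotation, cluster) combination: a joint target
        self.classifier = nn.Linear(feat_dim, NUM_ROTATIONS * NUM_CLUSTERS)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return f, self.classifier(f)

def rotate_batch(x):
    """Return the batch under all four rotations plus rotation labels."""
    rots = [torch.rot90(x, k, dims=(2, 3)) for k in range(NUM_ROTATIONS)]
    labels = torch.arange(NUM_ROTATIONS).repeat_interleave(x.size(0))
    return torch.cat(rots, 0), labels

model = SmallConvNet()
opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
images = torch.randn(32, 3, 32, 32)  # stand-in for unlabeled, non-curated data

for epoch in range(3):
    # (1) cluster current features with k-means (the paper does this
    #     hierarchically and in a distributed fashion)
    with torch.no_grad():
        feats, _ = model(images)
    assignments = KMeans(n_clusters=NUM_CLUSTERS, n_init=10).fit_predict(feats.numpy())
    clusters = torch.from_numpy(assignments).long()

    # (2) train to predict the joint (rotation, cluster) target
    x_rot, rot_labels = rotate_batch(images)
    joint_target = rot_labels * NUM_CLUSTERS + clusters.repeat(NUM_ROTATIONS)
    _, logits = model(x_rot)
    loss = F.cross_entropy(logits, joint_target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: loss={loss.item():.3f}")

The Cartesian-product head is the key design choice this sketch tries to convey: rotation prediction and cluster prediction act as complementary training signals, which is how the abstract frames the combination of self-supervision and clustering.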
Pages: 2959 - 2968
Page count: 10
Related Papers
50 items in total
  • [1] Non-curated distributed databases for experimental data and models in neuroscience
    Cannon, RC
    Howell, FW
    Goddard, NH
    De Schutter, E
    [J]. NETWORK-COMPUTATION IN NEURAL SYSTEMS, 2002, 13 (03) : 415 - 428
  • [2] Unsupervised Pre-Training for Detection Transformers
    Dai, Zhigang
    Cai, Bolun
    Lin, Yugeng
    Chen, Junying
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (11) : 12772 - 12782
  • [3] Unsupervised Pre-Training for Voice Activation
    Kolesau, Aliaksei
    Sesok, Dmitrij
[J]. APPLIED SCIENCES-BASEL, 2020, 10 (23) : 1 - 13
  • [4] Unsupervised Pre-training Across Image Domains Improves Lung Tissue Classification
    Schlegl, Thomas
    Ofner, Joachim
    Langs, Georg
    [J]. MEDICAL COMPUTER VISION: ALGORITHMS FOR BIG DATA, 2014, 8848 : 82 - 93
  • [5] Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data
    Grundkiewicz, Roman
    Junczys-Dowmunt, Marcin
    Heafield, Kenneth
[J]. INNOVATIVE USE OF NLP FOR BUILDING EDUCATIONAL APPLICATIONS, 2019 : 252 - 263
  • [6] Unsupervised Pre-training Classifier Based on Restricted Boltzmann Machine with Imbalanced Data
    Fu, Xiaoyang
    [J]. SMART COMPUTING AND COMMUNICATION, SMARTCOM 2016, 2017, 10135 : 102 - 110
  • [7] Neural speech enhancement with unsupervised pre-training and mixture training
    Hao, Xiang
    Xu, Chenglin
    Xie, Lei
    [J]. NEURAL NETWORKS, 2023, 158 : 216 - 227
  • [8] Unsupervised Pre-Training of Imbalanced Data for Identification of Wafer Map Defect Patterns
    Shon, Ho Sun
    Batbaatar, Erdenebileg
    Cho, Wan-Sup
    Choi, Seong Gon
    [J]. IEEE ACCESS, 2021, 9 : 52352 - 52363
  • [9] Seasonal Contrast: Unsupervised Pre-Training from Uncurated Remote Sensing Data
    Manas, Oscar
    Lacoste, Alexandre
    Giro-i-Nieto, Xavier
    Vazquez, David
    Rodriguez, Pau
[J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021 : 9394 - 9403
  • [10] Image Representations Learned With Unsupervised Pre-Training Contain Human-like Biases
    Steed, Ryan
    Caliskan, Aylin
[J]. PROCEEDINGS OF THE 2021 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, FACCT 2021, 2021 : 701 - 713