MimCo: Masked Image Modeling Pre-training with Contrastive Teacher

Cited: 5
Authors
Zhou, Qiang [1 ]
Yu, Chaohui [1 ]
Luo, Hao [1 ]
Wang, Zhibin [1 ]
Li, Hao [1 ]
Affiliations
[1] Alibaba Grp, Hangzhou, Peoples R China
Keywords
self-supervised learning; pre-training; contrastive learning; masked image modeling
DOI
10.1145/3503161.3548173
CLC Number
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
Recent masked image modeling (MIM) has received much attention in self-supervised learning (SSL); it requires the target model to recover the masked part of the input image. Although MIM-based pre-training methods achieve new state-of-the-art performance when transferred to many downstream tasks, visualizations show that the learned representations are less separable, especially compared to those from contrastive-learning pre-training. This inspires us to ask whether the linear separability of MIM pre-trained representations can be further improved, thereby improving pre-training performance. Since MIM and contrastive learning tend to use different data augmentations and training strategies, combining these two pretext tasks is not trivial. In this work, we propose a novel and flexible pre-training framework, named MimCo, which combines MIM and contrastive learning through two-stage pre-training. Specifically, MimCo takes a pre-trained contrastive learning model as the teacher model and is pre-trained with two types of learning targets: patch-level and image-level reconstruction losses. Extensive transfer experiments on downstream tasks demonstrate the superior performance of our MimCo pre-training framework. Taking ViT-S as an example, when using the pre-trained MoCov3-ViT-S as the teacher model, MimCo needs only 100 epochs of pre-training to achieve 82.53% top-1 fine-tuning accuracy on ImageNet-1K, which outperforms state-of-the-art self-supervised learning counterparts.
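The two learning targets described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the exact loss forms (MSE over masked patches for the patch-level target, negative cosine similarity between global embeddings for the image-level target), the tensor shapes, and all function names are assumptions made for the sketch.

```python
import numpy as np

def patch_level_loss(pred_patches, teacher_patches, mask):
    """MSE between student predictions and teacher patch features,
    averaged over masked patches only (the MIM-style target).
    pred_patches, teacher_patches: (N, P, D); mask: (N, P) in {0, 1}."""
    per_patch = ((pred_patches - teacher_patches) ** 2).mean(axis=-1)
    return float((per_patch * mask).sum() / mask.sum())

def image_level_loss(student_feat, teacher_feat):
    """Negative cosine similarity between the student's and the teacher's
    global image embeddings. student_feat, teacher_feat: (N, D)."""
    s = student_feat / np.linalg.norm(student_feat, axis=1, keepdims=True)
    t = teacher_feat / np.linalg.norm(teacher_feat, axis=1, keepdims=True)
    return float(1.0 - (s * t).sum(axis=1).mean())

# Toy batch: 4 images, 16 patches, 32-dim features, ~75% of patches masked.
rng = np.random.default_rng(0)
teacher_p = rng.normal(size=(4, 16, 32))
student_p = teacher_p + 0.1 * rng.normal(size=(4, 16, 32))
mask = (rng.random((4, 16)) < 0.75).astype(float)
feat_t = rng.normal(size=(4, 32))
feat_s = feat_t + 0.1 * rng.normal(size=(4, 32))

total = patch_level_loss(student_p, teacher_p, mask) + image_level_loss(feat_s, feat_t)
print(f"total MimCo-style loss: {total:.4f}")
```

In this sketch, the frozen contrastive teacher supplies both targets, so the student's training signal pulls its representations toward the teacher's more linearly separable feature space while still solving the masked-reconstruction task.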
Pages: 4487-4495 (9 pages)