UniVIP: A Unified Framework for Self-Supervised Visual Pre-training

Cited by: 5
Authors
Li, Zhaowen [1 ,2 ]
Zhu, Yousong [1 ]
Yang, Fan [3 ]
Li, Wei [3 ]
Zhao, Chaoyang [1 ,4 ]
Chen, Yingying [1 ]
Chen, Zhiyang [1 ,2 ]
Xie, Jiahao [5 ]
Wu, Liwei [3 ]
Zhao, Rui [3 ,7 ]
Tang, Ming [1 ]
Wang, Jinqiao [1 ,2 ,6 ]
Affiliations
[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] SenseTime Res, Beijing, Peoples R China
[4] Dev Res Inst Guangzhou Smart City, Guangzhou, Peoples R China
[5] Nanyang Technol Univ, S Lab, Singapore, Singapore
[6] Peng Cheng Lab, Shenzhen, Peoples R China
[7] Shanghai Jiao Tong Univ, Qing Yuan Res Inst, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52688.2022.01422
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Self-supervised learning (SSL) holds promise in leveraging large amounts of unlabeled data. However, the success of popular SSL methods has been limited to single-centric-object images such as those in ImageNet, and these methods ignore the correlation between the scene and its instances, as well as the semantic differences among instances in the scene. To address these problems, we propose Unified Self-supervised Visual Pre-training (UniVIP), a novel self-supervised framework for learning versatile visual representations on either single-centric-object or non-iconic datasets. The framework accounts for representation learning at three levels: 1) the similarity of scene-scene, 2) the correlation of scene-instance, and 3) the discrimination of instance-instance. During learning, we adopt an optimal transport algorithm to automatically measure the discrimination among instances. Extensive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance on a variety of downstream tasks, such as image classification, semi-supervised learning, object detection and segmentation. Furthermore, our method can also exploit single-centric-object datasets such as ImageNet: it outperforms BYOL by 2.5% in linear probing with the same number of pre-training epochs, and surpasses current self-supervised object detection methods on the COCO dataset, demonstrating its universality and potential.
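The abstract outlines a three-level objective (scene-scene similarity, scene-instance correlation, instance-instance discrimination) with optimal transport used for the instance-level term. Below is a minimal, hypothetical sketch of how such an objective could be combined, assuming paired scene-level and instance-level embeddings from two augmented views; the function names, tensor shapes, equal loss weighting, and the simplified Sinkhorn routine are illustrative assumptions, not the paper's exact formulation.

# Hypothetical sketch (not the paper's code): combining UniVIP-style
# scene-scene, scene-instance, and instance-instance objectives.
# Shapes, names, loss weights, and the simplified Sinkhorn routine are assumptions.
import torch
import torch.nn.functional as F

def cosine_loss(p, z):
    # BYOL-style loss: negative cosine similarity against a stop-gradient target.
    p = F.normalize(p, dim=-1)
    z = F.normalize(z.detach(), dim=-1)
    return 2.0 - 2.0 * (p * z).sum(dim=-1).mean()

def sinkhorn(scores, n_iters=3, eps=0.05):
    # Simplified entropy-regularised optimal transport: alternately normalise
    # columns and rows so each row ends up a soft assignment distribution.
    q = torch.exp(scores / eps)
    for _ in range(n_iters):
        q = q / q.sum(dim=0, keepdim=True)  # columns sum to 1
        q = q / q.sum(dim=1, keepdim=True)  # rows sum to 1
    return q

def three_level_loss(scene_a, scene_b, inst_a, inst_b, temperature=0.1):
    # scene_a, scene_b: (B, D) scene embeddings of two augmented views.
    # inst_a, inst_b:   (B, K, D) embeddings of K instances shared by both views.

    # 1) scene-scene similarity
    l_ss = cosine_loss(scene_a, scene_b)

    # 2) scene-instance correlation: scene embedding vs. aggregated instance embeddings
    l_si = cosine_loss(scene_a, inst_b.mean(dim=1))

    # 3) instance-instance discrimination: optimal-transport targets over
    #    cross-view instance similarities, trained with a soft cross-entropy.
    a = F.normalize(inst_a, dim=-1)
    b = F.normalize(inst_b, dim=-1)
    sim = torch.einsum('bkd,bld->bkl', a, b)                 # (B, K, K)
    with torch.no_grad():
        targets = torch.stack([sinkhorn(s) for s in sim])    # soft instance matchings
    l_ii = -(targets * F.log_softmax(sim / temperature, dim=-1)).sum(dim=-1).mean()

    return l_ss + l_si + l_ii

A full implementation would additionally produce the stop-gradient branch with a momentum (EMA) target network in the style of BYOL, which the abstract uses as its baseline.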
Pages: 14607 - 14616
Number of pages: 10
Related Papers
50 in total
  • [1] A Unified Visual Information Preservation Framework for Self-supervised Pre-Training in Medical Image Analysis
    Zhou, Hong-Yu
    Lu, Chixiang
    Chen, Chaoqi
    Yang, Sibei
    Yu, Yizhou
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (07) : 8020 - 8035
  • [2] DenseCL: A simple framework for self-supervised dense visual pre-training
    Wang, Xinlong
    Zhang, Rufeng
    Shen, Chunhua
    Kong, Tao
    [J]. VISUAL INFORMATICS, 2023, 7 (01) : 30 - 40
  • [3] Dense Contrastive Learning for Self-Supervised Visual Pre-Training
    Wang, Xinlong
    Zhang, Rufeng
    Shen, Chunhua
    Kong, Tao
    Li, Lei
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 3023 - 3032
  • [4] Masked Feature Prediction for Self-Supervised Visual Pre-Training
    Wei, Chen
    Fan, Haoqi
    Xie, Saining
    Wu, Chao-Yuan
    Yuille, Alan
    Feichtenhofer, Christoph
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 14648 - 14658
  • [5] Correlational Image Modeling for Self-Supervised Visual Pre-Training
    Li, Wei
    Xie, Jiahao
    Loy, Chen Change
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 15105 - 15115
  • [6] Self-supervised ECG pre-training
    Liu, Han
    Zhao, Zhenbo
    She, Qiang
    [J]. BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2021, 70
  • [7] Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
    Li, Tianjiao
    Foo, Lin Geng
    Hu, Ping
    Shang, Xindi
    Rahmani, Hossein
    Yuan, Zehuan
    Liu, Jun
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 24027 - 24038
  • [8] SslTransT: Self-supervised pre-training visual object tracking with Transformers
    Cai, Yannan
    Tan, Ke
    Wei, Zhenzhong
    [J]. OPTICS COMMUNICATIONS, 2024, 557
  • [9] Self-supervised Pre-training for Mirror Detection
    Lin, Jiaying
    Lau, Rynson W. H.
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 12193 - 12202
  • [10] Self-supervised Pre-training for Nuclei Segmentation
    Haq, Mohammad Minhazul
    Huang, Junzhou
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION, MICCAI 2022, PT II, 2022, 13432 : 303 - 313