The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models

Cited by: 30
Authors
Chen, Tianlong [1]
Frankle, Jonathan [2]
Chang, Shiyu [3]
Liu, Sijia [3,4]
Zhang, Yang [3]
Carbin, Michael [2]
Wang, Zhangyang [1]
Affiliations
[1] Univ Texas Austin, Austin, TX 78712 USA
[2] MIT CSAIL, Cambridge, MA USA
[3] MIT IBM Watson AI Lab, Cambridge, MA USA
[4] Michigan State Univ, E Lansing, MI 48824 USA
Funding
U.S. National Science Foundation (NSF);
Keywords
DOI
10.1109/CVPR46437.2021.01604
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
The computer vision community has been regaining enthusiasm for various pre-trained models, including both classical ImageNet supervised pre-training and recently emerged self-supervised pre-training such as simCLR [10] and MoCo [40]. Pre-trained weights often boost a wide range of downstream tasks, including classification, detection, and segmentation. The latest studies suggest that pre-training benefits from gigantic model capacity [11]. This makes us curious to ask: after pre-training, does a pre-trained model indeed have to stay large for its downstream transferability? In this paper, we examine supervised and self-supervised pre-trained models through the lens of the lottery ticket hypothesis (LTH) [31]. LTH identifies highly sparse matching subnetworks that can be trained in isolation from (nearly) scratch yet still reach the full models' performance. We extend the scope of LTH and question whether matching subnetworks still exist in pre-trained computer vision models that enjoy the same downstream transfer performance. Our extensive experiments convey an overall positive message: from all pre-trained weights obtained by ImageNet classification, simCLR, and MoCo, we are consistently able to locate such matching subnetworks at 59.04% to 96.48% sparsity that transfer universally to multiple downstream tasks, whose performance sees no degradation compared to using the full pre-trained weights. Further analyses reveal that subnetworks found from different pre-training schemes tend to yield diverse mask structures and perturbation sensitivities. We conclude that the core LTH observations remain generally relevant in the pre-training paradigm of computer vision, but more delicate discussions are needed in some cases. Code and pre-trained models will be made available at: https://github.com/VITA-Group/CV_LTH_Pre-training.
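Matching subnetworks of the kind referenced in the abstract are typically located with iterative magnitude pruning (IMP): train, prune the smallest-magnitude weights, rewind the survivors to the pre-trained values, and repeat. The quoted sparsity levels are consistent with removing 20% of the remaining weights per round (1 - 0.8^4 ≈ 59.04%, 1 - 0.8^15 ≈ 96.48%). The sketch below, assuming a PyTorch workflow, illustrates that loop; it is not the authors' released implementation (see the repository linked in the abstract), and `finetune` and `evaluate` are hypothetical downstream-task routines.

```python
# Minimal sketch of iterative magnitude pruning (IMP) with rewinding to the
# pre-trained weights. Assumptions: a PyTorch model whose pre-trained weights
# are already loaded, plus hypothetical `finetune` and `evaluate` callables for
# the downstream task (classification, detection, segmentation, ...).
import torch
import torch.nn.utils.prune as prune

def find_matching_subnetwork(model, finetune, evaluate, rounds=10, prune_ratio=0.2):
    """Per round: fine-tune, globally prune 20% of the remaining conv/fc
    weights by magnitude, then rewind surviving weights to pre-trained values."""
    targets = [(m, "weight") for m in model.modules()
               if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    # Snapshot the pre-trained weights so survivors can be rewound to them.
    pretrained = {id(m): m.weight.detach().clone() for m, _ in targets}

    for r in range(rounds):
        finetune(model)  # train on the downstream task
        # Global magnitude pruning: drop the lowest-|w| fraction of the
        # still-unpruned weights across all selected layers.
        prune.global_unstructured(targets,
                                  pruning_method=prune.L1Unstructured,
                                  amount=prune_ratio)
        # Rewind: pruning reparameterizes `weight` as `weight_orig * weight_mask`,
        # so restoring `weight_orig` resets survivors to pre-trained values
        # while the accumulated mask stays in place.
        with torch.no_grad():
            for module, _ in targets:
                module.weight_orig.copy_(pretrained[id(module)])
        sparsity = 1.0 - (1.0 - prune_ratio) ** (r + 1)
        print(f"round {r + 1}: ~{100 * sparsity:.1f}% sparsity, "
              f"downstream score {evaluate(model):.3f}")
    return model
```

Under this 20%-per-round schedule, round 4 would reach the 59.04% sparsity quoted above and round 15 would reach 96.48%; the mask found at a chosen round, together with the rewound pre-trained weights, would then define the candidate matching subnetwork that is transferred to the downstream tasks.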
Pages: 16301-16311
Page count: 11