Data-Efficient Double-Win Lottery Tickets from Robust Pre-training

Cited by: 0
Authors
Chen, Tianlong [1 ]
Zhang, Zhenyu [1 ]
Liu, Sijia [2 ,3 ]
Zhang, Yang [3 ]
Chang, Shiyu [4 ]
Wang, Zhangyang [1 ]
Affiliations
[1] Univ Texas Austin, Dept Elect & Comp Engn, Austin, TX 78712 USA
[2] Michigan State Univ, E Lansing, MI 48824 USA
[3] MIT, IBM Watson AI Lab, Cambridge, MA 02139 USA
[4] Univ Calif Santa Barbara, Santa Barbara, CA 93106 USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Pre-training serves as a broadly adopted starting point for transfer learning on various downstream tasks. Recent investigations of the lottery ticket hypothesis (LTH) demonstrate that such enormous pre-trained models can be replaced by extremely sparse subnetworks (a.k.a. matching subnetworks) without sacrificing transferability. However, practical security-critical applications usually pose more challenging requirements beyond standard transfer, and also demand these subnetworks to overcome adversarial vulnerability. In this paper, we formulate a more rigorous concept, Double-Win Lottery Tickets, in which a located subnetwork from a pre-trained model can be independently transferred to diverse downstream tasks and reach BOTH the same standard and robust generalization, under BOTH standard and adversarial training regimes, as the full pre-trained model can do. We comprehensively examine various pre-training mechanisms and find that robust pre-training tends to craft sparser double-win lottery tickets with superior performance over their standard counterparts. For example, on downstream CIFAR-10/100 datasets, we identify double-win matching subnetworks with standard, fast adversarial, and adversarial pre-training from ImageNet, at 89.26%/73.79%, 89.26%/79.03%, and 91.41%/83.22% sparsity, respectively. Furthermore, we observe that the obtained double-win lottery tickets are more data-efficient to transfer under practical data-limited (e.g., 1% and 10%) downstream schemes. Our results show that the benefits of robust pre-training are amplified by the lottery ticket scheme as well as the data-limited transfer setting. Code is available at https://github.com/VITA-Group/Double-Win-LTH.
Pages: 13
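The following is a minimal, illustrative PyTorch sketch of the basic lottery-ticket extraction step referenced in the abstract: one-shot global magnitude pruning of a pre-trained model to obtain a sparse subnetwork mask at a chosen sparsity level. This is not the authors' exact pipeline (the paper's procedure, e.g. iterative magnitude pruning and the fast-adversarial/adversarial pre-training variants, is not reproduced here), and the helper names below are hypothetical.

# Illustrative sketch only, not the paper's implementation.
import torch
import torchvision.models as models

def magnitude_prune_masks(model, sparsity):
    # Collect weights of prunable layers (conv + linear), as in common LTH practice.
    weights = [m.weight.data for m in model.modules()
               if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
    scores = torch.cat([w.abs().flatten() for w in weights])
    k = int(sparsity * scores.numel())            # number of weights to remove
    threshold = torch.kthvalue(scores, k).values  # k-th smallest magnitude globally
    # Keep only weights whose magnitude exceeds the global threshold.
    return [(w.abs() > threshold).float() for w in weights]

def apply_masks(model, masks):
    # Zero out pruned weights; keeping them fixed at zero during fine-tuning
    # (gradient masking) is omitted in this sketch.
    it = iter(masks)
    for m in model.modules():
        if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear)):
            m.weight.data.mul_(next(it))

# Example: a 90%-sparse subnetwork from an ImageNet-pre-trained ResNet-50
# (torchvision >= 0.13 weights API).
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
masks = magnitude_prune_masks(model, sparsity=0.90)
apply_masks(model, masks)

In the paper's setting, such a masked subnetwork would then be fine-tuned, under either standard or adversarial training, on a downstream task such as CIFAR-10/100 and compared against the full dense pre-trained model for both standard and robust generalization.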