Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training

Cited by: 0
Authors
Liu, Yuanxin [1 ,2 ]
Meng, Fandong [3 ]
Lin, Zheng [1 ,2 ]
Fu, Peng [1 ]
Cao, Yanan [1 ,2 ]
Wang, Weiping [1 ]
Zhou, Jie [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks with transfer learning performance similar to that of the original PLM. These subnetworks are found using magnitude-based pruning. In this paper, we find that BERT subnetworks have even more potential than these studies have shown. First, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with downstream transferability. Inspired by this, we propose to directly optimize the subnetwork structure towards the pre-training objectives, which better preserves the pre-training performance. Specifically, we train binary masks over the model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, agnostic to any specific downstream task. We then fine-tune the subnetworks on the GLUE benchmark and the SQuAD dataset. The results show that, compared with magnitude pruning, mask training can effectively find BERT subnetworks with improved overall performance on downstream tasks. Moreover, our method is more efficient at searching for subnetworks and more advantageous when the fine-tuning data is scarce. Our code is available at https://github.com/llyx97/TAMT.
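The core idea of the abstract, training binary masks over frozen weights with gradient descent, can be sketched in a few lines. The snippet below is a hypothetical, minimal NumPy illustration (not the paper's implementation, which is at the GitHub link above): each frozen weight is paired with a trainable real-valued score, the binary mask is obtained by thresholding the scores, and gradients reach the discrete mask through the straight-through estimator. All names, values, and the toy "match the dense model" objective are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of mask training with a straight-through estimator (STE).
# Each frozen weight W[i, j] is paired with a trainable score s[i, j]; the
# binary mask is m = 1[s > 0]. During backprop, dm/ds is treated as the
# identity (STE), so the discrete mask can be tuned by ordinary gradient steps.

W = np.array([[ 1.0, -0.5,  0.8],       # frozen "pre-trained" weights
              [-1.2,  0.7, -0.9],
              [ 0.6, -1.1,  1.3],
              [-0.8,  1.4, -0.6]])
scores = np.array([[-1.0,  0.5, -0.3],  # trainable mask scores
                   [ 0.2, -0.8,  1.0],
                   [-0.5,  0.9, -1.2],
                   [ 1.1, -0.4,  0.6]])

def forward(x, W, scores):
    mask = (scores > 0).astype(W.dtype)  # binarize: m = 1[s > 0]
    return x @ (W * mask).T, mask

# Toy stand-in for the pre-training objective: make the masked model match
# the dense model's outputs on probe inputs (identity, for determinism),
# i.e. push the mask to preserve the original function.
x = np.eye(3)
y_target = x @ W.T

lr = 0.5
for _ in range(100):
    y, mask = forward(x, W, scores)
    err = y - y_target                # dL/dy for L = 0.5 * ||y - y_target||^2
    grad_wm = err.T @ x               # dL/d(W * m)
    scores -= lr * grad_wm * W        # STE: dL/ds ~= dL/dm = dL/d(W*m) * W
                                      # only the scores move; W stays frozen

y_final, final_mask = forward(x, W, scores)
print("final mask:\n", final_mask.astype(int))
```

On this toy objective the mask opens fully, since the target is exactly the dense model's output; in an actual mask-training setup a sparsity budget would additionally be enforced (for example, by keeping only the top-scoring weights per layer), so the subnetwork stays sparse while the STE update above steers which weights are kept.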
Pages: 5840-5857
Page count: 18
Related Papers
50 items in total (items [31]-[40] shown)
  • [31] Task-agnostic representation learning of multimodal twitter data for downstream applications
    Rivas, Ryan
    Paul, Sudipta
    Hristidis, Vagelis
    Papalexakis, Evangelos E.
    Roy-Chowdhury, Amit K.
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [32] Using Winning Lottery Tickets in Transfer Learning for Convolutional Neural Networks
    Van Soelen, Ryan
    Sheppard, John W.
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [33] Enhancing road surface recognition via optimal transport and metric learning in task-agnostic intelligent driving environments
    Chen, Yuyi
    Yang, Shichun
    Wang, Rui
    Li, Zhuoyang
    Li, Qiuyue
    Tong, Zexiang
    Cao, Yaoguang
    Zhou, Fan
    EXPERT SYSTEMS WITH APPLICATIONS, 2025, 266
  • [34] Data-Efficient Double-Win Lottery Tickets from Robust Pre-training
    Chen, Tianlong
    Zhang, Zhenyu
    Liu, Sijia
    Zhang, Yang
    Chang, Shiyu
    Wang, Zhangyang
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [35] VariGrow: Variational Architecture Growing for Task-Agnostic Continual Learning based on Bayesian Novelty
    Ardywibowo, Randy
    Huo, Zepeng
    Wang, Zhangyang
    Mortazavi, Bobak
    Huang, Shuai
    Qian, Xiaoning
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 865 - 877
  • [36] Discover-then-Name: Task-Agnostic Concept Bottlenecks via Automated Concept Discovery
    Rao, Sukrut
    Mahajan, Sweta
    Boehle, Moritz
    Schiele, Bernt
    COMPUTER VISION - ECCV 2024, PT LXXVII, 2024, 15135 : 444 - 461
  • [37] DexBERT: Effective, Task-Agnostic and Fine-Grained Representation Learning of Android Bytecode
    Sun T.
    Allix K.
    Kim K.
    Zhou X.
    Kim D.
    Lo D.
    Bissyande T.F.
    Klein J.
    IEEE Transactions on Software Engineering, 2023, 49 (10) : 4691 - 4706
  • [38] VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding
    Xu, Hu
    Ghosh, Gargi
    Huang, Po-Yao
    Arora, Prahal
    Aminzadeh, Masoumeh
    Feichtenhofer, Christoph
    Metze, Florian
    Zettlemoyer, Luke
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4227 - 4239
  • [39] Towards Learning Generalizable Code Embeddings Using Task-agnostic Graph Convolutional Networks
    Ding, Zishuo
    Li, Heng
    Shang, Weiyi
    Chen, Tse-Hsun
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2023, 32 (02)
  • [40] Downstream Task-agnostic Transferable Attacks on Language-Image Pre-training Models
    Lv, Yiqiang
    Chen, Jingjing
    Wei, Zhipeng
    Chen, Kai
    Wu, Zuxuan
    Jiang, Yu-Gang
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2831 - 2836