Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training

Cited: 0
Authors
Liu, Yuanxin [1 ,2 ]
Meng, Fandong [3 ]
Lin, Zheng [1 ,2 ]
Fu, Peng [1 ]
Cao, Yanan [1 ,2 ]
Wang, Weiping [1 ]
Zhou, Jie [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks that have transfer learning performance similar to that of the original PLM. These subnetworks are found using magnitude-based pruning. In this paper, we find that the BERT subnetworks have even more potential than these studies have shown. Firstly, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with the downstream transferability. Inspired by this, we propose to directly optimize the subnetwork structure towards the pre-training objectives, which can better preserve the pre-training performance. Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream task. We then fine-tune the subnetworks on the GLUE benchmark and the SQuAD dataset. The results show that, compared with magnitude pruning, mask training can effectively find BERT subnetworks with improved overall performance on downstream tasks. Moreover, our method is also more efficient in searching subnetworks and more advantageous when fine-tuning within a certain range of data scarcity. Our code is available at https://github.com/llyx97/TAMT.
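
The mask-training idea in the abstract can be made concrete with a small sketch: learn a real-valued score for every weight in BERT's encoder, binarize the scores with a straight-through estimator, and optimize only the scores on the masked-language-modelling pre-training objective while the pre-trained weights stay frozen. The snippet below is a minimal illustration of that recipe, not the authors' implementation (see the TAMT repository linked above); the MaskedLinear wrapper, the per-weight score initialization, and the single toy MLM step are assumptions made for brevity.

# Minimal sketch of task-agnostic mask training (illustrative only; the authors'
# implementation is at https://github.com/llyx97/TAMT).
import torch
import torch.nn as nn
from transformers import BertForMaskedLM, BertTokenizerFast

class MaskedLinear(nn.Module):
    """Frozen nn.Linear whose weight is multiplied by a learned binary mask."""
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad = False                      # keep pre-trained weights fixed
        self.scores = nn.Parameter(torch.ones_like(linear.weight))

    def forward(self, x):
        hard = (self.scores > 0).float()                 # binarize in the forward pass
        mask = hard + self.scores - self.scores.detach() # straight-through gradient
        return nn.functional.linear(x, self.linear.weight * mask, self.linear.bias)

def wrap_linears(module: nn.Module):
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, MaskedLinear(child))
        else:
            wrap_linears(child)

model = BertForMaskedLM.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
wrap_linears(model.bert.encoder)                         # mask only the encoder weights

# One illustrative MLM step on a toy batch; real mask training iterates over a
# pre-training corpus and then obtains the subnetwork by thresholding the scores.
inputs = tokenizer("Pre-trained language models contain [MASK] subnetworks.",
                   return_tensors="pt")
labels = inputs["input_ids"].clone()
optimizer = torch.optim.Adam(
    [p for n, p in model.named_parameters() if n.endswith("scores")], lr=1e-2)

loss = model(**inputs, labels=labels).loss               # pre-training (MLM) objective
loss.backward()
optimizer.step()
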
Pages: 5840-5857
Number of pages: 18
Related Papers
50 items in total
  • [1] SuperTickets: Drawing Task-Agnostic Lottery Tickets from Supernets via Jointly Architecture Searching and Parameter Pruning
    You, Haoran
    Li, Baopu
    Sun, Zhanyi
    Ouyang, Xu
    Lin, Yingyan
    COMPUTER VISION, ECCV 2022, PT XI, 2022, 13671 : 674 - 690
  • [2] EarlyBERT: Efficient BERT Training via Early-bird Lottery Tickets
    Chen, Xiaohan
    Cheng, Yu
    Wang, Shuohang
    Gan, Zhe
    Wang, Zhangyang
    Liu, Jingjing
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2195 - 2207
  • [3] Task-Agnostic Safety for Reinforcement Learning
    Rahman, Md Asifur
    Alqahtani, Sarra
    PROCEEDINGS OF THE 16TH ACM WORKSHOP ON ARTIFICIAL INTELLIGENCE AND SECURITY, AISEC 2023, 2023, : 139 - 148
  • [4] Extract then Distill: Efficient and Effective Task-Agnostic BERT Distillation
    Chen, Cheng
    Yin, Yichun
    Shang, Lifeng
    Wang, Zhi
    Jiang, Xin
    Chen, Xiao
    Liu, Qun
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT III, 2021, 12893 : 570 - 581
  • [5] Task-agnostic Exploration in Reinforcement Learning
    Zhang, Xuezhou
    Ma, Yuzhe
    Singla, Adish
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [6] MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
    Sun, Zhiqing
    Yu, Hongkun
    Song, Xiaodan
    Liu, Renjie
    Yang, Yiming
    Zhou, Denny
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 2158 - 2170
  • [7] To BERT or Not to BERT: Comparing Task-specific and Task-agnostic Semi-Supervised Approaches for Sequence Tagging
    Bhattacharjee, Kasturi
    Ballesteros, Miguel
    Anubhai, Rishita
    Muresan, Smaranda
    Ma, Jie
    Ladhak, Faisal
    Al-Onaizan, Yaser
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 7927 - 7934
  • [8] Improving task-agnostic BERT distillation with layer mapping search
    Jiao, Xiaoqi
    Chang, Huating
    Yin, Yichun
    Shang, Lifeng
    Jiang, Xin
    Chen, Xiao
    Li, Linlin
    Wang, Fang
    Liu, Qun
    NEUROCOMPUTING, 2021, 461 : 194 - 203
  • [9] Loss Decoupling for Task-Agnostic Continual Learning
    Liang, Yan-Shuo
    Li, Wu-Jun
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [10] Hierarchically structured task-agnostic continual learning
    Hihn, Heinke
    Braun, Daniel A.
    Machine Learning, 2023, 112 : 655 - 686