Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training

Cited by: 0
Authors
Liu, Yuanxin [1, 2]
Meng, Fandong [3]
Lin, Zheng [1, 2]
Fu, Peng [1]
Cao, Yanan [1, 2]
Wang, Weiping [1]
Zhou, Jie [3]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks whose transfer learning performance is similar to that of the original PLM. These subnetworks are found using magnitude-based pruning. In this paper, we find that the BERT subnetworks have even more potential than these studies have shown. First, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with the downstream transferability. Inspired by this, we propose to directly optimize the subnetwork structure towards the pre-training objectives, which can better preserve the pre-training performance. Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream task. We then fine-tune the subnetworks on the GLUE benchmark and the SQuAD dataset. The results show that, compared with magnitude pruning, mask training can effectively find BERT subnetworks with improved overall performance on downstream tasks. Moreover, our method is also more efficient in searching for subnetworks and more advantageous when fine-tuning within a certain range of data scarcity. Our code is available at https://github.com/llyx97/TAMT.
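The contrast between magnitude pruning and mask training described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the repository linked in the abstract is the place to look for that). The function names `topk_mask` and `apply_mask`, the toy weights, and the `learned_scores` values are all hypothetical. The sketch also assumes that mask training yields one real-valued score per weight, binarized by keeping the top-scoring fraction; the abstract does not state this detail.

```python
def topk_mask(scores, sparsity):
    """Return a 0/1 mask keeping the highest-scoring (1 - sparsity) fraction."""
    n_keep = round(len(scores) * (1.0 - sparsity))
    # Indices in descending score order; ties keep their original order
    # because Python's sort is stable.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    keep = set(order[:n_keep])
    return [1 if i in keep else 0 for i in range(len(scores))]


def apply_mask(weights, mask):
    """Zero out the weights whose mask entry is 0."""
    return [w * m for w, m in zip(weights, mask)]


# Toy weight vector standing in for one layer of a pre-trained model.
weights = [0.9, -0.05, 0.4, -0.7, 0.01, 0.2]

# Magnitude pruning: the score of each weight is its absolute value.
magnitude_mask = topk_mask([abs(w) for w in weights], sparsity=0.5)

# Mask training (assumed formulation): the scores are free parameters that
# would be optimized on the pre-training objective. The values below are
# made up for illustration, not learned.
learned_scores = [0.3, 0.8, -0.1, 0.6, 0.7, -0.4]
trained_mask = topk_mask(learned_scores, sparsity=0.5)

print(magnitude_mask)
print(trained_mask)
print(apply_mask(weights, magnitude_mask))
```

Both masks keep the same number of weights. They differ only in where the scores come from, so a trained mask is free to keep a small-magnitude weight that magnitude pruning would discard.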
Pages: 5840-5857
Number of pages: 18