Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training

Authors
Liu, Yuanxin [1, 2]
Meng, Fandong [3]
Lin, Zheng [1, 2]
Fu, Peng [1]
Cao, Yanan [1, 2]
Wang, Weiping [1]
Zhou, Jie [3]
Affiliations
[1] Chinese Academy of Sciences, Institute of Information Engineering, Beijing, People's Republic of China
[2] University of Chinese Academy of Sciences, School of Cyber Security, Beijing, People's Republic of China
[3] Tencent Inc, WeChat AI, Pattern Recognition Center, Shenzhen, People's Republic of China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Recent studies on the lottery ticket hypothesis (LTH) show that pre-trained language models (PLMs) like BERT contain matching subnetworks whose transfer learning performance is similar to that of the original PLM. These subnetworks are found using magnitude-based pruning. In this paper, we find that the BERT subnetworks have even more potential than these studies have shown. First, we discover that the success of magnitude pruning can be attributed to the preserved pre-training performance, which correlates with downstream transferability. Inspired by this, we propose to directly optimize the subnetwork structure towards the pre-training objectives, which better preserves the pre-training performance. Specifically, we train binary masks over model weights on the pre-training tasks, with the aim of preserving the universal transferability of the subnetwork, which is agnostic to any specific downstream task. We then fine-tune the subnetworks on the GLUE benchmark and the SQuAD dataset. The results show that, compared with magnitude pruning, mask training can effectively find BERT subnetworks with improved overall performance on downstream tasks. Moreover, our method is also more efficient in searching for subnetworks and more advantageous when fine-tuning within a certain range of data scarcity. Our code is available at https://github.com/llyx97/TAMT.
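The contrast the abstract draws between magnitude pruning and mask training can be illustrated with a toy sketch. This is not the authors' implementation (that is in the linked repository); it assumes a frozen linear model, a squared-error objective standing in for the pre-training loss, real-valued mask scores initialized from weight magnitudes, and a straight-through gradient through the binarization step. The function names `magnitude_mask`, `topk_mask`, and `train_mask`, and the toy data, are hypothetical.

```python
def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that prunes the smallest-magnitude weights."""
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    mask = [1] * len(weights)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def topk_mask(scores, sparsity):
    """Binarize real-valued scores: prune the lowest-scoring entries."""
    n_prune = int(len(scores) * sparsity)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    mask = [1] * len(scores)
    for i in order[:n_prune]:
        mask[i] = 0
    return mask


def train_mask(weights, data, sparsity, lr=0.1, epochs=50):
    """Learn a binary mask over frozen weights by gradient descent on scores.

    The model is pred = sum(w_i * m_i * x_i). The weights never change;
    only the scores that decide the mask are updated.
    """
    # Assumption: scores start from weight magnitudes, so the first mask
    # equals the magnitude-pruning mask.
    scores = [abs(w) for w in weights]
    for _ in range(epochs):
        for x, y in data:
            mask = topk_mask(scores, sparsity)
            pred = sum(w * m * xi for w, m, xi in zip(weights, mask, x))
            err = pred - y
            # Straight-through estimator: treat d(mask_i)/d(score_i) as 1,
            # so the gradient of 0.5 * err**2 w.r.t. score_i is err * w_i * x_i.
            for i in range(len(scores)):
                scores[i] -= lr * err * weights[i] * x[i]
    return topk_mask(scores, sparsity)


# Toy setup: the largest weight (index 0) never receives input, so keeping
# it does nothing for the objective, while index 2 matters.
weights = [3.0, 2.0, 1.0, 0.5]
data = [([0.0, 1.0, 1.0, 0.0], 3.0), ([0.0, 1.0, 2.0, 0.0], 4.0)]

pruned_by_magnitude = magnitude_mask(weights, 0.5)
pruned_by_training = train_mask(weights, data, 0.5)
```

In this toy case magnitude pruning keeps the two largest weights regardless of the data, whereas the trained mask moves toward the weights that actually reduce the objective. The same idea, applied to BERT weights with the pre-training tasks as the objective, is what the abstract describes at a high level.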
Pages: 5840 - 5857
Page count: 18
Related papers
50 in total
  • [41] Task-Agnostic Privacy-Preserving Representation Learning for Federated Learning against Attribute Inference Attacks
    Arevalo, Caridad Arroyo
    Noorbakhsh, Sayedeh Leila
    Dong, Yun
    Hong, Yuan
    Wang, Binghui
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 10, 2024, : 10909 - 10917
  • [42] Pivotal Role of Language Modeling in Recommender Systems: Enriching Task-specific and Task-agnostic Representation Learning
    Shin, Kyuyong
    Kwak, Hanock
    Kim, Wonjae
    Jeong, Jisu
    Jung, Seungjae
    Kim, Kyung-Min
    Ha, Jung-Woo
    Lee, Sang-Woo
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 1146 - 1161
  • [43] Learning Task-Agnostic Embedding of Multiple Black-Box Experts for Multi-Task Model Fusion
    Trong Nghia Hoang
    Chi Thanh Lam
    Low, Bryan Kian Hsiang
    Jaillet, Patrick
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [44] Federated Split Vision Transformer for COVID-19 CXR Diagnosis using Task-Agnostic Training
    Park, Sangjoon
    Kim, Gwanghyun
    Kim, Jeongsol
    Kim, Boah
    Ye, Jong Chul
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [45] Task-Agnostic Exploration via Policy Gradient of a Non-Parametric State Entropy Estimate
    Mutti, Mirco
    Pratissoli, Lorenzo
    Restelli, Marcello
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9028 - 9036
  • [46] Masked Token Enabled Pre-Training: A Task-Agnostic Approach for Understanding Complex Traffic Flow
    Hou, Lu
    Geng, Yunxing
    Han, Lingyi
    Yang, Haojun
    Zheng, Kan
    Wang, Xianbin
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (12) : 11121 - 11132
  • [47] Episodic task agnostic contrastive training for multi-task learning
    Zhou, Fan
    Chen, Yuyi
    Wen, Jun
    Zeng, Qiuhao
    Shui, Changjian
    Ling, Charles X.
    Yang, Shichun
    Wang, Boyu
    NEURAL NETWORKS, 2023, 162 : 34 - 45
  • [48] Task-Agnostic Continual Learning Using Online Variational Bayes With Fixed-Point Updates
    Zeno, Chen
    Golan, Itay
    Hoffer, Elad
    Soudry, Daniel
    NEURAL COMPUTATION, 2021, 33 (11) : 3139 - 3177
  • [49] Unsupervised Representation Learning with Task-Agnostic Feature Masking for Robust End-to-End Speech Recognition
    Kim, June-Woo
    Chung, Hoon
    Jung, Ho-Young
    MATHEMATICS, 2023, 11 (03)
  • [50] Accelerating Reinforcement Learning for Autonomous Driving using Task-Agnostic and Ego-Centric Motion Skills
    Zhou, Tong
    Wang, Letian
    Chen, Ruobing
    Wang, Wenshuo
    Liu, Yu
    2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 11289 - 11296