A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models

Cited by: 0
Authors
Liu, Yuanxin [1 ,2 ,3 ,6 ]
Meng, Fandong [5 ]
Lin, Zheng [1 ,4 ,5 ]
Li, Jiangnan [1 ,4 ]
Fu, Peng [1 ]
Cao, Yanan [1 ,4 ]
Wang, Weiping [1 ]
Zhou, Jie [5 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Peking Univ, MOE Key Lab Computat Linguist, Beijing, Peoples R China
[3] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[5] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Shenzhen, Peoples R China
[6] Chinese Acad Sci, IIE, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the remarkable success of pre-trained language models (PLMs), they still face two challenges: First, large-scale PLMs are inefficient in terms of memory footprint and computation. Second, on downstream tasks, PLMs tend to rely on dataset bias and struggle to generalize to out-of-distribution (OOD) data. In response to the efficiency problem, recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting performance. Such subnetworks can be found in three scenarios: 1) in fine-tuned PLMs, 2) in raw PLMs, with the subnetworks then fine-tuned in isolation, and even 3) inside PLMs without any parameter fine-tuning. However, these results are obtained only in the in-distribution (ID) setting. In this paper, we extend the study of PLM subnetworks to the OOD setting, investigating whether sparsity and robustness to dataset bias can be achieved simultaneously. To this end, we conduct extensive experiments with the pre-trained BERT model on three natural language understanding (NLU) tasks. Our results demonstrate that sparse and robust subnetworks (SRNets) can consistently be found in BERT, across the aforementioned three scenarios, using different training and compression methods. Furthermore, we explore the upper bound of SRNets using OOD information and show that there exist sparse and almost unbiased BERT subnetworks. Finally, we present 1) an analytical study that provides insights on how to improve the efficiency of the SRNet search process and 2) a solution to improve subnetwork performance at high sparsity. The code is available at https://github.com/llyx97/sparse-and-robust-PLM.
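For orientation only, the following is a minimal sketch of one common way to obtain a sparse BERT subnetwork: unstructured magnitude pruning with PyTorch and Hugging Face Transformers. It is not the authors' actual pipeline (which lives in the linked repository); the model name, the number of labels, and the 50% sparsity level are illustrative assumptions, and the resulting subnetwork would still need ID/OOD evaluation as described in the abstract.

# Illustrative sketch (not the paper's exact method): prune the BERT encoder
# to 50% sparsity by removing the smallest-magnitude weights globally.
import torch
import torch.nn.utils.prune as prune
from transformers import BertForSequenceClassification

# Hypothetical setup: bert-base-uncased with 3 output labels (e.g., an NLI task).
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)

# Collect the weight matrices of all linear layers in the encoder.
params_to_prune = [
    (module, "weight")
    for module in model.bert.encoder.modules()
    if isinstance(module, torch.nn.Linear)
]

# Zero out the 50% of encoder weights with the smallest absolute value, globally.
prune.global_unstructured(
    params_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.5,
)

# Make the binary masks permanent; the result is a sparse subnetwork that can
# then be fine-tuned or evaluated on in-distribution and OOD data.
for module, name in params_to_prune:
    prune.remove(module, name)

zeros = sum(int((m.weight == 0).sum()) for m, _ in params_to_prune)
total = sum(m.weight.numel() for m, _ in params_to_prune)
print(f"Encoder sparsity: {zeros / total:.1%}")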
Pages: 14