A Win-win Deal: Towards Sparse and Robust Pre-trained Language Models

Cited by: 0
Authors
Liu, Yuanxin [1 ,2 ,3 ,6 ]
Meng, Fandong [5 ]
Lin, Zheng [1 ,4 ,5 ]
Li, Jiangnan [1 ,4 ]
Fu, Peng [1 ]
Cao, Yanan [1 ,4 ]
Wang, Weiping [1 ]
Zhou, Jie [5 ]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Peking Univ, MOE Key Lab Computat Linguist, Beijing, Peoples R China
[3] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[4] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[5] Tencent Inc, WeChat AI, Pattern Recognit Ctr, Shenzhen, Peoples R China
[6] Chinese Acad Sci, IIE, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the remarkable success of pre-trained language models (PLMs), they still face two challenges: First, large-scale PLMs are inefficient in terms of memory footprint and computation. Second, on downstream tasks, PLMs tend to rely on dataset bias and struggle to generalize to out-of-distribution (OOD) data. In response to the efficiency problem, recent studies show that dense PLMs can be replaced with sparse subnetworks without hurting performance. Such subnetworks can be found in three scenarios: 1) in fine-tuned PLMs, 2) in raw PLMs, with the subnetworks then fine-tuned in isolation, and even 3) inside PLMs without any parameter fine-tuning. However, these results are obtained only in the in-distribution (ID) setting. In this paper, we extend the study of PLM subnetworks to the OOD setting, investigating whether sparsity and robustness to dataset bias can be achieved simultaneously. To this end, we conduct extensive experiments with the pre-trained BERT model on three natural language understanding (NLU) tasks. Our results demonstrate that sparse and robust subnetworks (SRNets) can consistently be found in BERT, across the aforementioned three scenarios, using different training and compression methods. Furthermore, we explore the upper bound of SRNets using OOD information and show that there exist sparse and almost unbiased BERT subnetworks. Finally, we present 1) an analytical study that provides insights on how to improve the efficiency of the SRNet search process and 2) a solution to improve subnetwork performance at high sparsity. The code is available at https://github.com/llyx97/sparse-and-robust-PLM.
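For orientation only, the following is a minimal sketch of one common way to obtain a sparse BERT subnetwork: unstructured magnitude pruning with PyTorch and Hugging Face Transformers. It is not the authors' actual pipeline (which lives in the linked repository); the model name, the number of labels, and the 50% sparsity level are illustrative assumptions, and the resulting subnetwork would still need ID/OOD evaluation as described in the abstract.

# Illustrative sketch (not the paper's exact method): prune the BERT encoder
# to 50% sparsity by removing the smallest-magnitude weights globally.
import torch
import torch.nn.utils.prune as prune
from transformers import BertForSequenceClassification

# Hypothetical setup: bert-base-uncased with 3 output labels (e.g., an NLI task).
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3
)

# Collect the weight matrices of all linear layers in the encoder.
params_to_prune = [
    (module, "weight")
    for module in model.bert.encoder.modules()
    if isinstance(module, torch.nn.Linear)
]

# Zero out the 50% of encoder weights with the smallest absolute value, globally.
prune.global_unstructured(
    params_to_prune,
    pruning_method=prune.L1Unstructured,
    amount=0.5,
)

# Make the binary masks permanent; the result is a sparse subnetwork that can
# then be fine-tuned or evaluated on in-distribution and OOD data.
for module, name in params_to_prune:
    prune.remove(module, name)

zeros = sum(int((m.weight == 0).sum()) for m, _ in params_to_prune)
total = sum(m.weight.numel() for m, _ in params_to_prune)
print(f"Encoder sparsity: {zeros / total:.1%}")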
Pages: 14