One is Not Enough: Parameter-Efficient Fine-Tuning With Multiplicative Sparse Factorization

Cited by: 0
Authors
Chen, Xuxi [1]
Chen, Tianlong [2]
Cheng, Yu [3]
Chen, Weizhu [4]
Awadallah, Ahmed Hassan [4]
Wang, Zhangyang [1]
Affiliations
[1] UT Austin, Dept Elect & Comp Engn, Austin, TX 78705 USA
[2] Univ North Carolina Chapel Hill, Dept Comp Sci, Chapel Hill, NC 27599 USA
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res, Redmond, WA 98052 USA
Keywords
Sparse matrices; Task analysis; Matrix decomposition; Adaptation models; Tuning; Training; Optimization; Parameter-efficient fine-tuning; factorization; MATRIX FACTORIZATION
DOI
10.1109/JSTSP.2024.3431927
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Fine-tuning gigantic pre-trained models has become a canonical paradigm in natural language processing. Unfortunately, as pre-trained models grow larger, even conventional fine-tuning becomes prohibitively resource-consuming. This has motivated the recent surge of parameter-efficient fine-tuning methods, which selectively update only a small portion of model parameters. Existing methods either customize add-on modules (e.g., adapters, prompters) or resort to weight parameter decomposition, which relies on strong structural assumptions (e.g., sparse or low-rank updates) and ad-hoc, pre-defined structural parameters (e.g., layerwise sparsities or the intrinsic rank). Extending the latter line of work, this paper proposes a new structured weight-decomposition scheme for parameter-efficient fine-tuning that is designed to be (i) flexible, covering a much broader matrix family with sparse or low-rank matrices as special cases; and (ii) (nearly) hyperparameter-free, requiring only a global parameter budget as input. The new scheme, dubbed AutoSparse, meets both goals by factorizing each layer's weight update into a product of multiple sparse matrix factors. Notably, the sparsity levels of all those matrices are allocated automatically (without any heuristic or ad-hoc tuning) through a single holistic budget-constrained optimization, which can be solved by projected gradient descent and plugged painlessly into normal fine-tuning. Extensive experiments and in-depth studies on diverse architectures and tasks, including BERT, RoBERTa, and BART, consistently demonstrate the superior parameter efficiency of AutoSparse, which surpasses state-of-the-art methods. For instance, AutoSparse with BERT can operate with only 0.5% trainable parameters while reaching 83.2% accuracy on MNLI-mismatched.
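To make the idea in the abstract concrete, the sketch below illustrates, in PyTorch, fine-tuning a single toy linear layer whose weight update is parameterized as a product of two trainable factors, with a global nonzero budget enforced by a simple hard-thresholding projection after each gradient step. This is only an illustrative sketch under those assumptions, not the authors' implementation: AutoSparse's holistic budget-constrained allocation of per-factor sparsity levels is replaced by a naive global projection, and all shapes, names, and hyperparameters below are invented for the example.

import torch

def project_to_budget(factors, budget):
    # Projection step of projected gradient descent: keep only the `budget`
    # largest-magnitude entries across all factors (one global budget) and
    # zero out everything else.
    flat = torch.cat([f.detach().abs().flatten() for f in factors])
    if budget < flat.numel():
        threshold = torch.topk(flat, budget).values.min()
        with torch.no_grad():
            for f in factors:
                f.mul_((f.abs() >= threshold).to(f.dtype))

d, budget = 64, 200                      # hypothetical layer width and global nonzero budget
W0 = torch.randn(d, d)                   # frozen pre-trained weight (never updated)
x = torch.randn(32, d)                   # toy fine-tuning inputs
y = torch.randn(32, d)                   # toy regression targets

# The weight update is a product of trainable factors: delta_W = A @ B.
A = (0.01 * torch.randn(d, d)).requires_grad_()
B = (0.01 * torch.randn(d, d)).requires_grad_()
optimizer = torch.optim.SGD([A, B], lr=1e-2)

for step in range(200):
    delta_W = A @ B                      # multiplicative factorization of the update
    loss = ((x @ (W0 + delta_W).T - y) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    project_to_budget([A, B], budget)    # project back onto the parameter budget

print(f"final loss: {loss.item():.4f}, "
      f"trainable nonzeros: {int((A != 0).sum() + (B != 0).sum())}")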
Pages: 1059 - 1069
Number of pages: 11
Related Papers
50 records in total
  • [31] CPMI-ChatGLM: parameter-efficient fine-tuning ChatGLM with Chinese patent medicine instructions
    Liu, Can
    Sun, Kaijie
    Zhou, Qingqing
    Duan, Yuchen
    Shu, Jianhua
    Kan, Hongxing
    Gu, Zongyun
    Hu, Jili
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [32] Parameter-efficient fine-tuning of large-scale pre-trained language models
    Ding, Ning
    Qin, Yujia
    Yang, Guang
    Wei, Fuchao
    Yang, Zonghan
    Su, Yusheng
    Hu, Shengding
    Chen, Yulin
    Chan, Chi-Min
    Chen, Weize
    Yi, Jing
    Zhao, Weilin
    Wang, Xiaozhi
    Liu, Zhiyuan
    Zheng, Hai-Tao
    Chen, Jianfei
    Liu, Yang
    Tang, Jie
    Li, Juanzi
    Sun, Maosong
    NATURE MACHINE INTELLIGENCE, 2023, 5 (03) : 220 - 235
  • [33] LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
    Hu, Zhiqiang
    Wang, Lei
    Lan, Yihuai
    Xu, Wanyu
    Lim, Ee-Peng
    Bing, Lidong
    Xu, Xing
    Poria, Soujanya
    Lee, Roy Ka-Wei
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023 : 5254 - 5276
  • [34] DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
    Xie, Enze
    Yao, Lewei
    Shi, Han
    Liu, Zhili
    Zhou, Daquan
    Liu, Zhaoqiang
    Li, Jiawei
    Li, Zhenguo
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023 : 4207 - 4216
  • [35] Sparsity- and Hybridity-Inspired Visual Parameter-Efficient Fine-Tuning for Medical Diagnosis
    Liu, Mingyuan
    Xu, Lu
    Liu, Shengnan
    Zhang, Jicong
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT V, 2024, 15005 : 627 - 637
  • [36] Transferrable DP-Adapter Tuning: A Privacy-Preserving Multimodal Parameter-Efficient Fine-Tuning Framework
    Ji, Lixia
    Xiao, Shijie
    Xu, Bingzhi
    Zhang, Han
    2024 IEEE 24TH INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS, 2024 : 471 - 482
  • [37] An Empirical Study of Parameter-Efficient Fine-Tuning Methods for Pre-trained Code Models
    Liu, Jiaxing
    Sha, Chaofeng
    Peng, Xin
    2023 38TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE, 2023 : 397 - 408
  • [38] Parameter-efficient fine-tuning large language model approach for hospital discharge paper summarization
    Goswami, Joyeeta
    Prajapati, Kaushal Kumar
    Saha, Ashim
    Saha, Apu Kumar
    APPLIED SOFT COMPUTING, 2024, 157
  • [40] PockEngine: Sparse and Efficient Fine-tuning in a Pocket
    Zhu, Ligeng
    Hu, Lanxiang
    Lin, Ji
    Wang, Wei-Chen
    Chen, Wei-Ming
    Gan, Chuang
    Han, Song
    56TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2023, 2023 : 1381 - 1394