One is Not Enough: Parameter-Efficient Fine-Tuning With Multiplicative Sparse Factorization

Cited by: 0
Authors
Chen, Xuxi [1 ]
Chen, Tianlong [2 ]
Cheng, Yu [3 ]
Chen, Weizhu [4 ]
Awadallah, Ahmed Hassan [4 ]
Wang, Zhangyang [1 ]
Affiliations
[1] UT Austin, Dept Elect & Comp Engn, Austin, TX 78705 USA
[2] Univ North Carolina Chapel Hill, Dept Comp Sci, Chapel Hill, NC 27599 USA
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res, Redmond, WA 98052 USA
Keywords
Sparse matrices; Task analysis; Matrix decomposition; Adaptation models; Tuning; Training; Optimization; Parameter-efficient fine-tuning; Factorization; Matrix factorization
DOI
10.1109/JSTSP.2024.3431927
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Codes
0808; 0809
Abstract
Fine-tuning gigantic pre-trained models has become a canonical paradigm in natural language processing. Unfortunately, as pre-trained models grow larger, even conventional fine-tuning becomes prohibitively resource-consuming, motivating the recent surge of parameter-efficient fine-tuning methods that selectively update only a small portion of model parameters. Existing methods either attach customized add-on modules (e.g., adapters, prompts) or rely on weight-update decompositions that impose strong structural assumptions (e.g., sparse or low-rank updates) and ad-hoc, pre-defined structural parameters (e.g., layerwise sparsities or the intrinsic rank). Extending the latter line of work, this paper proposes a new structured weight-decomposition scheme for parameter-efficient fine-tuning that is designed to be (i) flexible, covering a much broader matrix family with sparse and low-rank matrices as special cases, and (ii) (nearly) hyperparameter-free, requiring only a global parameter budget as input. The new scheme, dubbed AutoSparse, meets both goals by factorizing each layer's weight update into a product of multiple sparse matrix factors. Notably, the sparsity levels of all these factors are allocated automatically, without any heuristic or ad-hoc tuning, through one holistic budget-constrained optimization, which is solved by projected gradient descent and plugs seamlessly into normal fine-tuning. Extensive experiments and in-depth studies on diverse architectures and tasks (BERT, RoBERTa, BART) consistently confirm the superior parameter efficiency of AutoSparse over the state of the art. For instance, AutoSparse with BERT trains only 0.5% of the parameters while reaching 83.2% accuracy on MNLI-mismatched.
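The abstract only sketches the method at a high level. As a reading aid, the following is a minimal, hypothetical PyTorch illustration of what a multiplicative sparse factorization with a single global budget and projected gradient descent could look like; the class name MultiplicativeSparseUpdate, the two-factor choice, the initialization, and the projection rule are assumptions made for illustration, not the paper's implementation.

```python
# Hypothetical sketch (not the authors' code): a frozen layer's weight update is
# parameterized as a product of factors, Delta_W = S1 @ S2, and one global
# parameter budget is enforced by projecting all factor entries onto their
# top-k support after each gradient step (projected gradient descent).
import torch
import torch.nn as nn


class MultiplicativeSparseUpdate(nn.Module):
    """Wraps a frozen nn.Linear; only the sparse factors receive gradients."""

    def __init__(self, frozen_linear: nn.Linear, budget: int):
        super().__init__()
        self.frozen = frozen_linear
        for p in self.frozen.parameters():
            p.requires_grad_(False)                       # pre-trained weights stay fixed
        d_out, d_in = frozen_linear.weight.shape
        # Two dense factor tensors; sparsity is imposed purely by projection, so the
        # budget is shared (and thus allocated automatically) across the factors.
        self.s1 = nn.Parameter(0.01 * torch.randn(d_out, d_out))
        self.s2 = nn.Parameter(torch.zeros(d_out, d_in))  # Delta_W = 0 at initialization
        self.budget = budget

    def forward(self, x):
        delta_w = self.s1 @ self.s2                       # multiplicative update
        return self.frozen(x) + x @ delta_w.t()

    @torch.no_grad()
    def project_to_budget(self):
        """Keep only the globally largest-magnitude entries across both factors."""
        flat = torch.cat([self.s1.abs().flatten(), self.s2.abs().flatten()])
        if self.budget >= flat.numel():
            return
        threshold = torch.topk(flat, self.budget).values.min()
        self.s1.mul_((self.s1.abs() >= threshold).to(self.s1.dtype))
        self.s2.mul_((self.s2.abs() >= threshold).to(self.s2.dtype))
```

In a training loop, project_to_budget() would be called after every optimizer step, so the total number of non-zero trainable entries never exceeds the global budget; how that budget splits between the factors then emerges from the optimization rather than being fixed per factor, which is the hyperparameter-free behavior the abstract describes.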
Pages: 1059-1069
Page count: 11