One is Not Enough: Parameter-Efficient Fine-Tuning With Multiplicative Sparse Factorization

Cited by: 0
Authors
Chen, Xuxi [1 ]
Chen, Tianlong [2 ]
Cheng, Yu [3 ]
Chen, Weizhu [4 ]
Awadallah, Ahmed Hassan [4 ]
Wang, Zhangyang [1 ]
Affiliations
[1] UT Austin, Dept Elect & Comp Engn, Austin, TX 78705 USA
[2] Univ North Carolina Chapel Hill, Dept Comp Sci, Chapel Hill, NC 27599 USA
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res, Redmond, WA 98052 USA
Keywords
Sparse matrices; Task analysis; Matrix decomposition; Adaptation models; Tuning; Training; Optimization; Parameter-efficient fine-tuning; factorization; MATRIX FACTORIZATION;
DOI
10.1109/JSTSP.2024.3431927
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809
Abstract
Fine-tuning gigantic pre-trained models has become a canonical paradigm in natural language processing. Unfortunately, as pre-trained models grow larger, even conventional fine-tuning becomes prohibitively resource-consuming. This motivates the recent surge of parameter-efficient fine-tuning methods, which selectively update only a small portion of model parameters. Existing methods either customize add-on modules (e.g., adapters, prompters) or resort to weight-parameter decomposition, which relies on strong structural assumptions (e.g., sparse or low-rank updates) and ad-hoc, pre-defined structure parameters (e.g., layerwise sparsities, or the intrinsic rank). Extending the latter line of work, this paper proposes a new structured weight-decomposition scheme for parameter-efficient fine-tuning that is designed to be (i) flexible, covering a much broader matrix family with sparse or low-rank matrices as special cases; and (ii) (nearly) hyperparameter-free, requiring only a global parameter budget as input. This new scheme, dubbed AutoSparse, meets both goals by factorizing each layer's weight update into a product of multiple sparse matrix factors. Notably, the sparsity levels of all those factors are allocated automatically, without any heuristic or ad-hoc tuning, through one holistic budget-constrained optimization. The optimization can be solved by projected gradient descent, which can be painlessly plugged into normal fine-tuning. Extensive experiments and in-depth studies on diverse architectures and tasks ({BERT, RoBERTa, BART}) consistently endorse the superior parameter efficiency of AutoSparse, surpassing the state of the art. For instance, AutoSparse with BERT can operate with only 0.5% of parameters trainable while reaching an accuracy of 83.2% on MNLI-mismatched.
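To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of a multiplicative sparse factorization for a single linear layer, trained by projected gradient descent. All names here (`SparseFactorizedLinear`, `project_topk_`, `num_factors`, `budget`) are illustrative assumptions rather than the authors' released code, and the even per-factor budget split below stands in for the paper's holistic, budget-constrained allocation of sparsity across factors and layers.

```python
# Illustrative sketch only: a frozen pre-trained weight plus a weight update
# expressed as a product of sparse factors, Delta W = S_1 S_2 ... S_m.
import torch
import torch.nn as nn


def project_topk_(mat: torch.Tensor, k: int) -> None:
    """Keep only the k largest-magnitude entries of `mat`, zeroing the rest (in place)."""
    if k <= 0:
        mat.zero_()
        return
    if k >= mat.numel():
        return
    threshold = torch.topk(mat.abs().flatten(), k).values.min()
    mat.mul_((mat.abs() >= threshold).to(mat.dtype))


class SparseFactorizedLinear(nn.Module):
    """Frozen pre-trained weight W plus a multiplicative sparse update (hypothetical names)."""

    def __init__(self, weight: torch.Tensor, num_factors: int = 2, budget: int = 1024):
        super().__init__()
        out_dim, in_dim = weight.shape
        self.weight = nn.Parameter(weight.clone(), requires_grad=False)  # frozen W
        # Factor shapes: the leftmost maps in_dim -> out_dim, the rest are square.
        shapes = [(out_dim, in_dim)] + [(in_dim, in_dim)] * (num_factors - 1)
        # Small random init for all but the last factor, zero for the last one,
        # so Delta W starts at zero while gradients can still flow (LoRA-style).
        factors = [0.01 * torch.randn(s) for s in shapes[:-1]] + [torch.zeros(shapes[-1])]
        self.factors = nn.ParameterList([nn.Parameter(f) for f in factors])
        # Even split of the global nonzero budget across factors; the paper instead
        # allocates per-factor (and per-layer) sparsity via one constrained optimization.
        self.per_factor_budget = max(budget // num_factors, 1)

    def delta(self) -> torch.Tensor:
        """Multiplicative update: the product of all sparse factors."""
        prod = self.factors[0]
        for f in self.factors[1:]:
            prod = prod @ f
        return prod

    @torch.no_grad()
    def project_(self) -> None:
        """Projection step of projected gradient descent onto the sparsity budget."""
        for f in self.factors:
            project_topk_(f, self.per_factor_budget)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.weight + self.delta()).t()
```

In a training loop, `project_()` would be called after each optimizer step, so every gradient update on the factors is followed by a projection back onto the budget-constrained sparse set; only the factors are trainable while the pre-trained weight stays frozen.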
Pages: 1059-1069
Page count: 11
Related Papers
50 items in total
  • [21] Parameter-Efficient Fine-Tuning of Large Pretrained Models for Instance Segmentation Tasks
    Baker, Nermeen Abou
    Rohrschneider, David
    Handmann, Uwe
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2024, 6 (04): 2783 - 2807
  • [22] SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
    Zhao, Henry Hengyuan
    Wang, Pichao
    Zhao, Yuyang
    Luo, Hao
    Wang, Fan
    Shou, Mike Zheng
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132: 731 - 749
  • [23] Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
    Mahabadi, Rabeeh Karimi
    Ruder, Sebastian
    Dehghani, Mostafa
    Henderson, James
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 565 - 576
  • [24] Data race detection via few-shot parameter-efficient fine-tuning
    Shen, Yuanyuan
    Peng, Manman
    Zhang, Fan
    Wu, Qiang
    JOURNAL OF SYSTEMS AND SOFTWARE, 2025, 222
  • [25] Generalized Kronecker-based Adapters for Parameter-efficient Fine-tuning of Vision Transformers
    Edalati, Ali
    Hameed, Marawan Gamal Abdel
    Mosleh, Ali
    2023 20TH CONFERENCE ON ROBOTS AND VISION, CRV, 2023, : 97 - 104
  • [26] Structure-Aware Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
    Hu, Yahao
    Xie, Yifei
    Wang, Tianfeng
    Chen, Man
    Pan, Zhisong
    MATHEMATICS, 2023, 11 (20)
  • [27] UPetu: A Unified Parameter-Efficient Fine-Tuning Framework for Remote Sensing Foundation Model
    Dong, Zhe
    Gu, Yanfeng
    Liu, Tianzhu
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 13
  • [28] LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning
    Zhang, Mingyang
    Chen, Hao
    Shen, Chunhua
    Yang, Zhen
    Ou, Linlin
    Yu, Xinyi
    Zhuang, Bohan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 3013 - 3026
  • [29] A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Method-Level Code Smell Detection
    School of Computer Science, Wuhan University, Wuhan, China
    arXiv preprint
  • [30] Know Where You're Going: Meta-Learning for Parameter-Efficient Fine-Tuning
    Gheini, Mozhdeh
    Ma, Xuezhe
    May, Jonathan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11602 - 11612