One is Not Enough: Parameter-Efficient Fine-Tuning With Multiplicative Sparse Factorization

Cited by: 0
Authors
Chen, Xuxi [1 ]
Chen, Tianlong [2 ]
Cheng, Yu [3 ]
Chen, Weizhu [4 ]
Awadallah, Ahmed Hassan [4 ]
Wang, Zhangyang [1 ]
Affiliations
[1] UT Austin, Dept Elect & Comp Engn, Austin, TX 78705 USA
[2] Univ North Carolina Chapel Hill, Dept Comp Sci, Chapel Hill, NC 27599 USA
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res, Redmond, WA 98052 USA
Keywords
Sparse matrices; Task analysis; Matrix decomposition; Adaptation models; Tuning; Training; Optimization; Parameter-efficient fine-tuning; Factorization; Matrix factorization
DOI
10.1109/JSTSP.2024.3431927
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Fine-tuning gigantic pre-trained models has become a canonical paradigm in natural language processing. Unfortunately, as pre-trained models grow larger, even conventional fine-tuning becomes prohibitively resource-consuming. This motivates the recent surge of parameter-efficient fine-tuning methods, which selectively update only a small portion of model parameters. Existing methods either customize add-on modules (e.g., adapters, prompts) or resort to weight-update decomposition, which relies on strong structural assumptions (e.g., sparse or low-rank updates) and ad-hoc, pre-defined structure parameters (e.g., layerwise sparsities or the intrinsic rank). Extending the latter line of work, this paper proposes a new structured weight-decomposition scheme for parameter-efficient fine-tuning that is designed to be (i) flexible, covering a much broader matrix family with sparse or low-rank matrices as special cases, and (ii) (nearly) hyperparameter-free, requiring only a global parameter budget as input. The new scheme, dubbed AutoSparse, meets both goals by factorizing each layer's weight update into a product of multiple sparse matrix factors. Notably, the sparsity levels of all these factors are allocated automatically, without any heuristic or ad-hoc tuning, through one holistic budget-constrained optimization, which is solved by projected gradient descent and can be painlessly plugged into normal fine-tuning. Extensive experiments and in-depth studies on diverse architectures and tasks, including BERT, RoBERTa, and BART, consistently demonstrate the superior parameter efficiency of AutoSparse over state-of-the-art methods. For instance, AutoSparse with BERT can operate with only 0.5% trainable parameters while reaching 83.2% accuracy on MNLI-mismatched.
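To make the idea of a multiplicative sparse factorization concrete, the following is a minimal, self-contained sketch, not the authors' implementation. It assumes a two-factor product delta_W = S1 @ S2 added to a frozen pre-trained weight, a dummy regression objective, and a single global top-k magnitude projection standing in for the paper's budget-constrained sparsity allocation; all names, dimensions, and hyperparameters are illustrative.

```python
# Minimal sketch (not the authors' code) of fine-tuning a frozen weight W0 via a
# multiplicative sparse factorization of its update, delta_W = S1 @ S2, with the
# trainable entries kept under a global budget by a top-k magnitude projection
# after each gradient step (a projected-gradient-descent step onto the budget set).
import torch

def project_to_budget(factors, budget):
    """Zero out all but the `budget` largest-magnitude entries across all factors."""
    flat = torch.cat([f.detach().abs().flatten() for f in factors])
    if budget >= flat.numel():
        return
    threshold = torch.topk(flat, budget).values.min()
    with torch.no_grad():
        for f in factors:
            f.mul_((f.abs() >= threshold).float())  # ties may keep slightly more entries

d_out, d_in, budget = 768, 768, 6000               # global trainable-entry budget (illustrative)
W0 = torch.randn(d_out, d_in)                      # frozen pre-trained weight (stand-in)
S1 = torch.nn.Parameter(0.01 * torch.randn(d_out, d_out))  # sparse factor 1
S2 = torch.nn.Parameter(0.01 * torch.randn(d_out, d_in))   # sparse factor 2
opt = torch.optim.SGD([S1, S2], lr=1e-2)

x = torch.randn(32, d_in)                          # dummy input batch
target = torch.randn(32, d_out)                    # dummy regression target
for _ in range(100):
    W = W0 + S1 @ S2                               # effective fine-tuned weight
    loss = torch.nn.functional.mse_loss(x @ W.t(), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    project_to_budget([S1, S2], budget)            # enforce the global sparsity budget
```

In AutoSparse itself, the per-factor sparsity levels come out of the holistic budget-constrained optimization rather than a single hand-coded top-k rule, and more than two factors may be used; the sketch only illustrates the product-of-sparse-factors parameterization and the projected-gradient flavor of the training loop.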
Pages: 1059-1069
Number of pages: 11