One is Not Enough: Parameter-Efficient Fine-Tuning With Multiplicative Sparse Factorization

Cited by: 0
Authors
Chen, Xuxi [1 ]
Chen, Tianlong [2 ]
Cheng, Yu [3 ]
Chen, Weizhu [4 ]
Awadallah, Ahmed Hassan [4 ]
Wang, Zhangyang [1 ]
Affiliations
[1] UT Austin, Dept Elect & Comp Engn, Austin, TX 78705 USA
[2] Univ North Carolina Chapel Hill, Dept Comp Sci, Chapel Hill, NC 27599 USA
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res, Redmond, WA 98052 USA
Keywords
Sparse matrices; Task analysis; Matrix decomposition; Adaptation models; Tuning; Training; Optimization; Parameter-efficient fine-tuning; Factorization; Matrix factorization
DOI
10.1109/JSTSP.2024.3431927
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Codes
0808; 0809
Abstract
Fine-tuning gigantic pre-trained models has become a canonical paradigm in natural language processing. Unfortunately, as pre-trained models grow larger, even conventional fine-tuning becomes prohibitively resource-consuming, motivating the recent surge of parameter-efficient fine-tuning methods that selectively update only a small portion of model parameters. Existing methods either attach customized add-on modules (e.g., adapters, prompts) or rely on weight-update decompositions that impose strong structural assumptions (e.g., sparse or low-rank updates) and ad-hoc, pre-defined structural parameters (e.g., layerwise sparsities or the intrinsic rank). Extending the latter line of work, this paper proposes a new structured weight-decomposition scheme for parameter-efficient fine-tuning that is designed to be (i) flexible, covering a much broader matrix family with sparse and low-rank matrices as special cases, and (ii) (nearly) hyperparameter-free, requiring only a global parameter budget as input. The new scheme, dubbed AutoSparse, meets both goals by factorizing each layer's weight update into a product of multiple sparse matrix factors. Notably, the sparsity levels of all these factors are allocated automatically, without any heuristic or ad-hoc tuning, through one holistic budget-constrained optimization, which is solved by projected gradient descent and plugs seamlessly into normal fine-tuning. Extensive experiments and in-depth studies on diverse architectures and tasks (BERT, RoBERTa, BART) consistently confirm the superior parameter efficiency of AutoSparse over the state of the art. For instance, AutoSparse with BERT trains only 0.5% of the parameters while reaching 83.2% accuracy on MNLI-mismatched.
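The abstract only sketches the method at a high level. As a reading aid, the following is a minimal, hypothetical PyTorch illustration of what a multiplicative sparse factorization with a single global budget and projected gradient descent could look like; the class name MultiplicativeSparseUpdate, the two-factor choice, the initialization, and the projection rule are assumptions made for illustration, not the paper's implementation.

```python
# Hypothetical sketch (not the authors' code): a frozen layer's weight update is
# parameterized as a product of factors, Delta_W = S1 @ S2, and one global
# parameter budget is enforced by projecting all factor entries onto their
# top-k support after each gradient step (projected gradient descent).
import torch
import torch.nn as nn


class MultiplicativeSparseUpdate(nn.Module):
    """Wraps a frozen nn.Linear; only the sparse factors receive gradients."""

    def __init__(self, frozen_linear: nn.Linear, budget: int):
        super().__init__()
        self.frozen = frozen_linear
        for p in self.frozen.parameters():
            p.requires_grad_(False)                       # pre-trained weights stay fixed
        d_out, d_in = frozen_linear.weight.shape
        # Two dense factor tensors; sparsity is imposed purely by projection, so the
        # budget is shared (and thus allocated automatically) across the factors.
        self.s1 = nn.Parameter(0.01 * torch.randn(d_out, d_out))
        self.s2 = nn.Parameter(torch.zeros(d_out, d_in))  # Delta_W = 0 at initialization
        self.budget = budget

    def forward(self, x):
        delta_w = self.s1 @ self.s2                       # multiplicative update
        return self.frozen(x) + x @ delta_w.t()

    @torch.no_grad()
    def project_to_budget(self):
        """Keep only the globally largest-magnitude entries across both factors."""
        flat = torch.cat([self.s1.abs().flatten(), self.s2.abs().flatten()])
        if self.budget >= flat.numel():
            return
        threshold = torch.topk(flat, self.budget).values.min()
        self.s1.mul_((self.s1.abs() >= threshold).to(self.s1.dtype))
        self.s2.mul_((self.s2.abs() >= threshold).to(self.s2.dtype))
```

In a training loop, project_to_budget() would be called after every optimizer step, so the total number of non-zero trainable entries never exceeds the global budget; how that budget splits between the factors then emerges from the optimization rather than being fixed per factor, which is the hyperparameter-free behavior the abstract describes.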
Pages: 1059-1069
Page count: 11