One is Not Enough: Parameter-Efficient Fine-Tuning With Multiplicative Sparse Factorization

Cited by: 0
Authors
Chen, Xuxi [1]
Chen, Tianlong [2]
Cheng, Yu [3]
Chen, Weizhu [4]
Awadallah, Ahmed Hassan [4]
Wang, Zhangyang [1]
Affiliations
[1] UT Austin, Dept Elect & Comp Engn, Austin, TX 78705 USA
[2] Univ North Carolina Chapel Hill, Dept Comp Sci, Chapel Hill, NC 27599 USA
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res, Redmond, WA 98052 USA
Keywords
Sparse matrices; Task analysis; Matrix decomposition; Adaptation models; Tuning; Training; Optimization; Parameter-efficient fine-tuning; factorization; MATRIX FACTORIZATION
DOI
10.1109/JSTSP.2024.3431927
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Fine-tuning gigantic pre-trained models has become a canonical paradigm in natural language processing. Unfortunately, as pre-trained models grow larger, even conventional fine-tuning becomes prohibitively resource-consuming. This has motivated the recent surge of parameter-efficient fine-tuning methods, which selectively update only a small portion of model parameters. Existing methods either customize add-on modules (e.g., adapters, prompters) or resort to weight parameter decomposition, which relies on strong structural assumptions (e.g., sparse or low-rank updates) and ad-hoc, pre-defined structural parameters (e.g., layerwise sparsities or the intrinsic rank). Extending the latter line of work, this paper proposes a new structured weight-decomposition scheme for parameter-efficient fine-tuning that is designed to be (i) flexible, covering a much broader matrix family with sparse or low-rank matrices as special cases; and (ii) (nearly) hyperparameter-free, requiring only a global parameter budget as input. The new scheme, dubbed AutoSparse, meets both goals by factorizing each layer's weight update into a product of multiple sparse matrix factors. Notably, the sparsity levels of all those matrices are allocated automatically (without any heuristic or ad-hoc tuning) through a single holistic budget-constrained optimization, which can be solved by projected gradient descent and plugged painlessly into normal fine-tuning. Extensive experiments and in-depth studies on diverse architectures and tasks, including BERT, RoBERTa, and BART, consistently demonstrate the superior parameter efficiency of AutoSparse, which surpasses state-of-the-art methods. For instance, AutoSparse with BERT can operate with only 0.5% trainable parameters while reaching 83.2% accuracy on MNLI-mismatched.
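To make the idea in the abstract concrete, the sketch below illustrates, in PyTorch, fine-tuning a single toy linear layer whose weight update is parameterized as a product of two trainable factors, with a global nonzero budget enforced by a simple hard-thresholding projection after each gradient step. This is only an illustrative sketch under those assumptions, not the authors' implementation: AutoSparse's holistic budget-constrained allocation of per-factor sparsity levels is replaced by a naive global projection, and all shapes, names, and hyperparameters below are invented for the example.

import torch

def project_to_budget(factors, budget):
    # Projection step of projected gradient descent: keep only the `budget`
    # largest-magnitude entries across all factors (one global budget) and
    # zero out everything else.
    flat = torch.cat([f.detach().abs().flatten() for f in factors])
    if budget < flat.numel():
        threshold = torch.topk(flat, budget).values.min()
        with torch.no_grad():
            for f in factors:
                f.mul_((f.abs() >= threshold).to(f.dtype))

d, budget = 64, 200                      # hypothetical layer width and global nonzero budget
W0 = torch.randn(d, d)                   # frozen pre-trained weight (never updated)
x = torch.randn(32, d)                   # toy fine-tuning inputs
y = torch.randn(32, d)                   # toy regression targets

# The weight update is a product of trainable factors: delta_W = A @ B.
A = (0.01 * torch.randn(d, d)).requires_grad_()
B = (0.01 * torch.randn(d, d)).requires_grad_()
optimizer = torch.optim.SGD([A, B], lr=1e-2)

for step in range(200):
    delta_W = A @ B                      # multiplicative factorization of the update
    loss = ((x @ (W0 + delta_W).T - y) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    project_to_budget([A, B], budget)    # project back onto the parameter budget

print(f"final loss: {loss.item():.4f}, "
      f"trainable nonzeros: {int((A != 0).sum() + (B != 0).sum())}")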
Pages: 1059 - 1069
Number of pages: 11
Related Papers
50 records in total
  • [31] CPMI-ChatGLM: parameter-efficient fine-tuning ChatGLM with Chinese patent medicine instructions
    Liu, Can
    Sun, Kaijie
    Zhou, Qingqing
    Duan, Yuchen
    Shu, Jianhua
    Kan, Hongxing
    Gu, Zongyun
    Hu, Jili
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [32] Parameter-efficient fine-tuning of large-scale pre-trained language models
    Ding, Ning
    Qin, Yujia
    Yang, Guang
    Wei, Fuchao
    Yang, Zonghan
    Su, Yusheng
    Hu, Shengding
    Chen, Yulin
    Chan, Chi-Min
    Chen, Weize
    Yi, Jing
    Zhao, Weilin
    Wang, Xiaozhi
    Liu, Zhiyuan
    Zheng, Hai-Tao
    Chen, Jianfei
    Liu, Yang
    Tang, Jie
    Li, Juanzi
    Sun, Maosong
    NATURE MACHINE INTELLIGENCE, 2023, 5 (03) : 220 - 235
  • [33] LLM-Adapters: An Adapter Family for Parameter-Efficient Fine-Tuning of Large Language Models
    Hu, Zhiqiang
    Wang, Lei
    Lan, Yihuai
    Xu, Wanyu
    Lim, Ee-Peng
    Bing, Lidong
    Xu, Xing
    Poria, Soujanya
    Lee, Roy Ka-Wei
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023 : 5254 - 5276
  • [34] DiffFit: Unlocking Transferability of Large Diffusion Models via Simple Parameter-Efficient Fine-Tuning
    Xie, Enze
    Yao, Lewei
    Shi, Han
    Liu, Zhili
    Zhou, Daquan
    Liu, Zhaoqiang
    Li, Jiawei
    Li, Zhenguo
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023 : 4207 - 4216
  • [35] Sparsity- and Hybridity-Inspired Visual Parameter-Efficient Fine-Tuning for Medical Diagnosis
    Liu, Mingyuan
    Xu, Lu
    Liu, Shengnan
    Zhang, Jicong
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2024, PT V, 2024, 15005 : 627 - 637
  • [36] Transferrable DP-Adapter Tuning: A Privacy-Preserving Multimodal Parameter-Efficient Fine-Tuning Framework
    Ji, Lixia
    Xiao, Shijie
    Xu, Bingzhi
    Zhang, Han
    2024 IEEE 24TH INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS, 2024 : 471 - 482
  • [37] An Empirical Study of Parameter-Efficient Fine-Tuning Methods for Pre-trained Code Models
    Liu, Jiaxing
    Sha, Chaofeng
    Peng, Xin
    2023 38TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING, ASE, 2023 : 397 - 408
  • [38] Parameter-efficient fine-tuning large language model approach for hospital discharge paper summarization
    Goswami, Joyeeta
    Prajapati, Kaushal Kumar
    Saha, Ashim
    Saha, Apu Kumar
    APPLIED SOFT COMPUTING, 2024, 157
  • [40] PockEngine: Sparse and Efficient Fine-tuning in a Pocket
    Zhu, Ligeng
    Hu, Lanxiang
    Lin, Ji
    Wang, Wei-Chen
    Chen, Wei-Ming
    Gan, Chuang
    Han, Song
    56TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2023, 2023 : 1381 - 1394