One is Not Enough: Parameter-Efficient Fine-Tuning With Multiplicative Sparse Factorization

Cited by: 0
Authors
Chen, Xuxi [1 ]
Chen, Tianlong [2 ]
Cheng, Yu [3 ]
Chen, Weizhu [4 ]
Awadallah, Ahmed Hassan [4 ]
Wang, Zhangyang [1 ]
Affiliations
[1] UT Austin, Dept Elect & Comp Engn, Austin, TX 78705 USA
[2] Univ North Carolina Chapel Hill, Dept Comp Sci, Chapel Hill, NC 27599 USA
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res, Redmond, WA 98052 USA
Keywords
Sparse matrices; Task analysis; Matrix decomposition; Adaptation models; Tuning; Training; Optimization; Parameter-efficient fine-tuning; factorization; MATRIX FACTORIZATION;
DOI
10.1109/JSTSP.2024.3431927
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809
Abstract
Fine-tuning gigantic pre-trained models has become a canonical paradigm in natural language processing. Unfortunately, as pre-trained models grow larger, even conventional fine-tuning becomes prohibitively resource-consuming. This motivates the recent surge of parameter-efficient fine-tuning methods, which selectively update only a small portion of model parameters. Existing methods either customize add-on modules (e.g., adapters, prompters) or resort to weight-parameter decomposition, which relies on strong structural assumptions (e.g., sparse or low-rank updates) and ad-hoc, pre-defined structure parameters (e.g., layerwise sparsities, or the intrinsic rank). Extending the latter line of work, this paper proposes a new structured weight-decomposition scheme for parameter-efficient fine-tuning that is designed to be (i) flexible, covering a much broader matrix family with sparse or low-rank matrices as special cases; and (ii) (nearly) hyperparameter-free, requiring only a global parameter budget as input. This new scheme, dubbed AutoSparse, meets both goals by factorizing each layer's weight update into a product of multiple sparse matrix factors. Notably, the sparsity levels of all those factors are allocated automatically, without any heuristic or ad-hoc tuning, through one holistic budget-constrained optimization. The optimization can be solved by projected gradient descent, which can be painlessly plugged into normal fine-tuning. Extensive experiments and in-depth studies on diverse architectures and tasks ({BERT, RoBERTa, BART}) consistently endorse the superior parameter efficiency of AutoSparse, surpassing the state of the art. For instance, AutoSparse with BERT can operate with only 0.5% of parameters trainable while reaching an accuracy of 83.2% on MNLI-mismatched.
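To make the mechanism described in the abstract concrete, the following is a minimal PyTorch sketch of a multiplicative sparse factorization for a single linear layer, trained by projected gradient descent. All names here (`SparseFactorizedLinear`, `project_topk_`, `num_factors`, `budget`) are illustrative assumptions rather than the authors' released code, and the even per-factor budget split below stands in for the paper's holistic, budget-constrained allocation of sparsity across factors and layers.

```python
# Illustrative sketch only: a frozen pre-trained weight plus a weight update
# expressed as a product of sparse factors, Delta W = S_1 S_2 ... S_m.
import torch
import torch.nn as nn


def project_topk_(mat: torch.Tensor, k: int) -> None:
    """Keep only the k largest-magnitude entries of `mat`, zeroing the rest (in place)."""
    if k <= 0:
        mat.zero_()
        return
    if k >= mat.numel():
        return
    threshold = torch.topk(mat.abs().flatten(), k).values.min()
    mat.mul_((mat.abs() >= threshold).to(mat.dtype))


class SparseFactorizedLinear(nn.Module):
    """Frozen pre-trained weight W plus a multiplicative sparse update (hypothetical names)."""

    def __init__(self, weight: torch.Tensor, num_factors: int = 2, budget: int = 1024):
        super().__init__()
        out_dim, in_dim = weight.shape
        self.weight = nn.Parameter(weight.clone(), requires_grad=False)  # frozen W
        # Factor shapes: the leftmost maps in_dim -> out_dim, the rest are square.
        shapes = [(out_dim, in_dim)] + [(in_dim, in_dim)] * (num_factors - 1)
        # Small random init for all but the last factor, zero for the last one,
        # so Delta W starts at zero while gradients can still flow (LoRA-style).
        factors = [0.01 * torch.randn(s) for s in shapes[:-1]] + [torch.zeros(shapes[-1])]
        self.factors = nn.ParameterList([nn.Parameter(f) for f in factors])
        # Even split of the global nonzero budget across factors; the paper instead
        # allocates per-factor (and per-layer) sparsity via one constrained optimization.
        self.per_factor_budget = max(budget // num_factors, 1)

    def delta(self) -> torch.Tensor:
        """Multiplicative update: the product of all sparse factors."""
        prod = self.factors[0]
        for f in self.factors[1:]:
            prod = prod @ f
        return prod

    @torch.no_grad()
    def project_(self) -> None:
        """Projection step of projected gradient descent onto the sparsity budget."""
        for f in self.factors:
            project_topk_(f, self.per_factor_budget)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ (self.weight + self.delta()).t()
```

In a training loop, `project_()` would be called after each optimizer step, so every gradient update on the factors is followed by a projection back onto the budget-constrained sparse set; only the factors are trainable while the pre-trained weight stays frozen.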
Pages: 1059-1069
Page count: 11
Related Papers
50 items in total
  • [21] Parameter-Efficient Fine-Tuning of Large Pretrained Models for Instance Segmentation Tasks
    Baker, Nermeen Abou
    Rohrschneider, David
    Handmann, Uwe
    MACHINE LEARNING AND KNOWLEDGE EXTRACTION, 2024, 6 (04): 2783 - 2807
  • [22] SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels
    Zhao, Henry Hengyuan
    Wang, Pichao
    Zhao, Yuyang
    Luo, Hao
    Wang, Fan
    Shou, Mike Zheng
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132: 731 - 749
  • [23] Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks
    Mahabadi, Rabeeh Karimi
    Ruder, Sebastian
    Dehghani, Mostafa
    Henderson, James
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 565 - 576
  • [24] Data race detection via few-shot parameter-efficient fine-tuning
    Shen, Yuanyuan
    Peng, Manman
    Zhang, Fan
    Wu, Qiang
    JOURNAL OF SYSTEMS AND SOFTWARE, 2025, 222
  • [25] Generalized Kronecker-based Adapters for Parameter-efficient Fine-tuning of Vision Transformers
    Edalati, Ali
    Hameed, Marawan Gamal Abdel
    Mosleh, Ali
    2023 20TH CONFERENCE ON ROBOTS AND VISION, CRV, 2023, : 97 - 104
  • [26] Structure-Aware Low-Rank Adaptation for Parameter-Efficient Fine-Tuning
    Hu, Yahao
    Xie, Yifei
    Wang, Tianfeng
    Chen, Man
    Pan, Zhisong
    MATHEMATICS, 2023, 11 (20)
  • [27] UPetu: A Unified Parameter-Efficient Fine-Tuning Framework for Remote Sensing Foundation Model
    Dong, Zhe
    Gu, Yanfeng
    Liu, Tianzhu
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 13
  • [28] LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning
    Zhang, Mingyang
    Chen, Hao
    Shen, Chunhua
    Yang, Zhen
    Ou, Linlin
    Yu, Xinyi
    Zhuang, Bohan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 3013 - 3026
  • [29] A Comprehensive Evaluation of Parameter-Efficient Fine-Tuning on Method-Level Code Smell Detection
    School of Computer Science, Wuhan University, Wuhan, China
    arXiv preprint
  • [30] Know Where You're Going: Meta-Learning for Parameter-Efficient Fine-Tuning
    Gheini, Mozhdeh
    Ma, Xuezhe
    May, Jonathan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11602 - 11612