One is Not Enough: Parameter-Efficient Fine-Tuning With Multiplicative Sparse Factorization

Cited by: 0
Authors
Chen, Xuxi [1 ]
Chen, Tianlong [2 ]
Cheng, Yu [3 ]
Chen, Weizhu [4 ]
Awadallah, Ahmed Hassan [4 ]
Wang, Zhangyang [1 ]
Affiliations
[1] UT Austin, Dept Elect & Comp Engn, Austin, TX 78705 USA
[2] Univ North Carolina Chapel Hill, Dept Comp Sci, Chapel Hill, NC 27599 USA
[3] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[4] Microsoft Res, Redmond, WA 98052 USA
Keywords
Sparse matrices; Task analysis; Matrix decomposition; Adaptation models; Tuning; Training; Optimization; Parameter-efficient fine-tuning; Factorization; Matrix factorization
DOI
10.1109/JSTSP.2024.3431927
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Fine-tuning gigantic pre-trained models has become a canonical paradigm in natural language processing. Unfortunately, as pre-trained models grow larger, even conventional fine-tuning becomes prohibitively resource-consuming. This motivates the recent surge of parameter-efficient fine-tuning methods, which selectively update only a small portion of model parameters. Existing methods either customize add-on modules (e.g., adapters, prompts) or resort to weight-update decomposition, which relies on strong structural assumptions (e.g., sparse or low-rank updates) and ad-hoc, pre-defined structure parameters (e.g., layerwise sparsities or the intrinsic rank). Extending the latter line of work, this paper proposes a new structured weight-decomposition scheme for parameter-efficient fine-tuning that is designed to be (i) flexible, covering a much broader matrix family with sparse or low-rank matrices as special cases, and (ii) (nearly) hyperparameter-free, requiring only a global parameter budget as input. The new scheme, dubbed AutoSparse, meets both goals by factorizing each layer's weight update into a product of multiple sparse matrix factors. Notably, the sparsity levels of all these factors are allocated automatically, without any heuristic or ad-hoc tuning, through one holistic budget-constrained optimization, which is solved by projected gradient descent and can be painlessly plugged into normal fine-tuning. Extensive experiments and in-depth studies on diverse architectures and tasks, including BERT, RoBERTa, and BART, consistently demonstrate the superior parameter efficiency of AutoSparse over state-of-the-art methods. For instance, AutoSparse with BERT can operate with only 0.5% trainable parameters while reaching 83.2% accuracy on MNLI-mismatched.
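To make the idea of a multiplicative sparse factorization concrete, the following is a minimal, self-contained sketch, not the authors' implementation. It assumes a two-factor product delta_W = S1 @ S2 added to a frozen pre-trained weight, a dummy regression objective, and a single global top-k magnitude projection standing in for the paper's budget-constrained sparsity allocation; all names, dimensions, and hyperparameters are illustrative.

```python
# Minimal sketch (not the authors' code) of fine-tuning a frozen weight W0 via a
# multiplicative sparse factorization of its update, delta_W = S1 @ S2, with the
# trainable entries kept under a global budget by a top-k magnitude projection
# after each gradient step (a projected-gradient-descent step onto the budget set).
import torch

def project_to_budget(factors, budget):
    """Zero out all but the `budget` largest-magnitude entries across all factors."""
    flat = torch.cat([f.detach().abs().flatten() for f in factors])
    if budget >= flat.numel():
        return
    threshold = torch.topk(flat, budget).values.min()
    with torch.no_grad():
        for f in factors:
            f.mul_((f.abs() >= threshold).float())  # ties may keep slightly more entries

d_out, d_in, budget = 768, 768, 6000               # global trainable-entry budget (illustrative)
W0 = torch.randn(d_out, d_in)                      # frozen pre-trained weight (stand-in)
S1 = torch.nn.Parameter(0.01 * torch.randn(d_out, d_out))  # sparse factor 1
S2 = torch.nn.Parameter(0.01 * torch.randn(d_out, d_in))   # sparse factor 2
opt = torch.optim.SGD([S1, S2], lr=1e-2)

x = torch.randn(32, d_in)                          # dummy input batch
target = torch.randn(32, d_out)                    # dummy regression target
for _ in range(100):
    W = W0 + S1 @ S2                               # effective fine-tuned weight
    loss = torch.nn.functional.mse_loss(x @ W.t(), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    project_to_budget([S1, S2], budget)            # enforce the global sparsity budget
```

In AutoSparse itself, the per-factor sparsity levels come out of the holistic budget-constrained optimization rather than a single hand-coded top-k rule, and more than two factors may be used; the sketch only illustrates the product-of-sparse-factors parameterization and the projected-gradient flavor of the training loop.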
Pages: 1059-1069
Number of pages: 11