Gradient Sparsification For Masked Fine-Tuning of Transformers

Cited by: 0
Authors:
O'Neill, James [1 ]
Dutta, Sourav [1 ]
Affiliations:
[1] Huawei Ireland Res Ctr, Dublin, Ireland
Keywords:
neural nets; sparse regularization; fine-tuning;
DOI:
10.1109/IJCNN54540.2023.10191206
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory];
Discipline classification codes:
081104 ; 0812 ; 0835 ; 1405 ;
Abstract:
Fine-tuning pretrained self-supervised language models is widely adopted for transfer learning to downstream tasks. Fine-tuning can be performed either by freezing the pretrained network and updating only a newly added classification layer, or by updating all parameters. Gradual unfreezing trades off between these two extremes by progressively unfreezing whole layers during training, and has been an effective way to balance storage and training speed against generalization performance. However, it is not clear whether gradually unfreezing entire layers throughout training is optimal compared to sparse variants of gradual unfreezing, which may improve fine-tuning performance. In this paper, we propose stochastically masking gradients to regularize pretrained language models and improve overall fine-tuned performance. We introduce GradDrop and variants thereof, a class of gradient sparsification methods that mask gradients during the backward pass, acting as a form of gradient noise. Unlike gradual unfreezing, GradDrop is sparse and stochastic. Extensive experiments on the multilingual XGLUE benchmark with XLMR-Large show that GradDrop is competitive with methods that use additional translated data for intermediate pretraining, and that it outperforms standard fine-tuning and gradual unfreezing. A post-analysis shows how GradDrop improves performance on languages it was not trained on, such as under-resourced languages.
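The abstract describes GradDrop as masking gradients during the backward pass. Below is a minimal PyTorch sketch of that idea, assuming per-element Bernoulli masking applied through tensor backward hooks; the function name apply_graddrop, the drop_prob parameter, and the masking granularity are illustrative assumptions, not the paper's exact formulation or its variants.

```python
import torch
import torch.nn as nn

def apply_graddrop(model: nn.Module, drop_prob: float = 0.5) -> None:
    """Attach backward hooks that stochastically zero parameter gradients.

    A sketch under assumptions: the paper's exact masking granularity,
    schedule, and variants are not given in the abstract, so independent
    per-element Bernoulli masking is assumed here for illustration.
    """
    keep_prob = 1.0 - drop_prob
    for param in model.parameters():
        if param.requires_grad:
            # Tensor.register_hook fires during the backward pass with the
            # parameter's gradient; returning a new tensor replaces it.
            param.register_hook(
                lambda grad: grad * torch.bernoulli(
                    torch.full_like(grad, keep_prob)
                )
            )

# Hypothetical usage: a small stand-in network instead of XLMR-Large.
model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 3))
apply_graddrop(model, drop_prob=0.5)

x = torch.randn(4, 768)
y = torch.tensor([0, 1, 2, 0])
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()  # hooks sparsify gradients here, before the optimizer step
```

One design choice the abstract does not settle is whether surviving gradients should be rescaled by 1/(1 - drop_prob), analogous to inverted dropout; the sketch above leaves them unscaled.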
Pages: 8