Efficient Fine-Tuning of BERT Models on the Edge

Cited by: 6
Authors
Vucetic, Danilo [1 ]
Tayaranian, Mohammadreza [1 ]
Ziaeefard, Maryam [1 ]
Clark, James J. [1 ]
Meyer, Brett H. [1 ]
Gross, Warren J. [1 ]
Affiliations
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
Keywords
Transformers; BERT; DistilBERT; NLP; Language Models; Efficient Transfer Learning; Efficient Fine-Tuning; Memory Efficiency; Time Efficiency; Edge Machine Learning;
DOI
10.1109/ISCAS48785.2022.9937567
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809
Abstract
Resource-constrained devices are increasingly the deployment targets of machine learning applications. Static models, however, do not always suffice for dynamic environments. On-device training of models allows for quick adaptability to new scenarios. With the increasing size of deep neural networks, as seen with BERT and other natural language processing models, come increased resource requirements, namely memory, computation, energy, and time. Furthermore, training is far more resource intensive than inference. Resource-constrained on-device learning is thus doubly difficult, especially with large BERT-like models. By reducing the memory usage of fine-tuning, pre-trained BERT models can become efficient enough to fine-tune on resource-constrained devices. We propose Freeze And Reconfigure (FAR), a memory-efficient training regime for BERT-like models that reduces the memory usage of activation maps during fine-tuning by avoiding unnecessary parameter updates. FAR reduces fine-tuning time on the DistilBERT model and CoLA dataset by 30%, and time spent on memory operations by 47%. More broadly, reductions in metric performance on the GLUE and SQuAD datasets are around 1% on average.
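The abstract describes cutting fine-tuning memory by skipping unnecessary parameter updates, so fewer gradients and optimizer states need to be stored. The snippet below is a minimal sketch of that general idea in PyTorch with Hugging Face Transformers, not the paper's FAR algorithm (which selects and reconfigures frozen feed-forward subsets during training); the model checkpoint, the choice of which layers to freeze, and the learning rate are illustrative assumptions.

```python
# Illustrative sketch only: generic parameter freezing during DistilBERT
# fine-tuning. Frozen parameters keep no gradients or optimizer state,
# which is the broad memory-saving mechanism the abstract refers to.
import torch
from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Freeze an assumed subset of parameters: the feed-forward sublayers of the
# first four transformer blocks (parameter names contain "transformer.layer.i.ffn").
for name, param in model.named_parameters():
    if any(f"transformer.layer.{i}.ffn" in name for i in range(4)):
        param.requires_grad = False

# Hand only the still-trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable} / {total}")
```

In this sketch the frozen set is fixed up front; FAR, by contrast, reconfigures which parameter groups are frozen as fine-tuning proceeds.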
Pages: 1838-1842
Number of pages: 5
Related Papers (50 records)
  • [1] Fine-Tuning BERT Models for Multiclass Amharic News Document Categorization
    Endalie, Demeke
    COMPLEXITY, 2025, 2025 (01)
  • [2] Transfer fine-tuning of BERT with phrasal paraphrases
    Arase, Yuki
    Tsujii, Junichi
    COMPUTER SPEECH AND LANGUAGE, 2021, 66
  • [3] Energy and Carbon Considerations of Fine-Tuning BERT
    Wang, Xiaorong
    Na, Clara
    Strubell, Emma
    Friedler, Sorelle A.
    Luccioni, Sasha
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 9058 - 9069
  • [4] SPEECH RECOGNITION BY SIMPLY FINE-TUNING BERT
    Huang, Wen-Chin
    Wu, Chia-Hua
    Luo, Shang-Bao
    Chen, Kuan-Yu
    Wang, Hsin-Min
    Toda, Tomoki
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7343 - 7347
  • [5] Transfer Fine-Tuning: A BERT Case Study
    Arase, Yuki
    Tsujii, Junichi
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 5393 - 5404
  • [6] Investigating Learning Dynamics of BERT Fine-Tuning
    Hao, Yaru
    Dong, Li
    Wei, Furu
    Xu, Ke
    1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020, : 87 - 92
  • [7] How fine can fine-tuning be? Learning efficient language models
    Radiya-Dixit, Evani
    Wang, Xin
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 2435 - 2442
  • [8] Dataset Distillation with Attention Labels for Fine-tuning BERT
    Maekawa, Aru
    Kobayashi, Naoki
    Funakoshi, Kotaro
    Okumura, Manabu
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 119 - 127
  • [9] Fine-Tuning BERT for Generative Dialogue Domain Adaptation
    Labruna, Tiziano
    Magnini, Bernardo
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 513 - 524
  • [10] Patent classification by fine-tuning BERT language model
    Lee, Jieh-Sheng
    Hsiang, Jieh
    WORLD PATENT INFORMATION, 2020, 61