Efficient Fine-Tuning of BERT Models on the Edge

Cited by: 6
Authors
Vucetic, Danilo [1 ]
Tayaranian, Mohammadreza [1 ]
Ziaeefard, Maryam [1 ]
Clark, James J. [1 ]
Meyer, Brett H. [1 ]
Gross, Warren J. [1 ]
Affiliations
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
Keywords
Transformers; BERT; DistilBERT; NLP; Language Models; Efficient Transfer Learning; Efficient Fine-Tuning; Memory Efficiency; Time Efficiency; Edge Machine Learning;
DOI
10.1109/ISCAS48785.2022.9937567
CLC Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Codes
0808; 0809
Abstract
Resource-constrained devices are increasingly the deployment targets of machine learning applications. Static models, however, do not always suffice for dynamic environments. On-device training of models allows for quick adaptability to new scenarios. With the increasing size of deep neural networks, as seen with BERT and other natural language processing models, come increased resource requirements, namely memory, computation, energy, and time. Furthermore, training is far more resource intensive than inference. Resource-constrained on-device learning is thus doubly difficult, especially with large BERT-like models. By reducing the memory usage of fine-tuning, pre-trained BERT models can become efficient enough to fine-tune on resource-constrained devices. We propose Freeze And Reconfigure (FAR), a memory-efficient training regime for BERT-like models that reduces the memory usage of activation maps during fine-tuning by avoiding unnecessary parameter updates. FAR reduces fine-tuning time on the DistilBERT model and CoLA dataset by 30%, and time spent on memory operations by 47%. More broadly, reductions in metric performance on the GLUE and SQuAD datasets are around 1% on average.
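The abstract describes cutting fine-tuning memory by skipping unnecessary parameter updates, so fewer gradients and optimizer states need to be stored. The snippet below is a minimal sketch of that general idea in PyTorch with Hugging Face Transformers, not the paper's FAR algorithm (which selects and reconfigures frozen feed-forward subsets during training); the model checkpoint, the choice of which layers to freeze, and the learning rate are illustrative assumptions.

```python
# Illustrative sketch only: generic parameter freezing during DistilBERT
# fine-tuning. Frozen parameters keep no gradients or optimizer state,
# which is the broad memory-saving mechanism the abstract refers to.
import torch
from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Freeze an assumed subset of parameters: the feed-forward sublayers of the
# first four transformer blocks (parameter names contain "transformer.layer.i.ffn").
for name, param in model.named_parameters():
    if any(f"transformer.layer.{i}.ffn" in name for i in range(4)):
        param.requires_grad = False

# Hand only the still-trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable} / {total}")
```

In this sketch the frozen set is fixed up front; FAR, by contrast, reconfigures which parameter groups are frozen as fine-tuning proceeds.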
Pages: 1838-1842
Number of pages: 5
Related Papers (50 records)
  • [1] Fine-Tuning BERT Models for Multiclass Amharic News Document Categorization
    Endalie, Demeke
    COMPLEXITY, 2025, 2025 (01)
  • [2] Transfer fine-tuning of BERT with phrasal paraphrases
    Arase, Yuki
    Tsujii, Junichi
    COMPUTER SPEECH AND LANGUAGE, 2021, 66
  • [3] Energy and Carbon Considerations of Fine-Tuning BERT
    Wang, Xiaorong
    Na, Clara
    Strubell, Emma
    Friedler, Sorelle A.
    Luccioni, Sasha
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 9058 - 9069
  • [4] SPEECH RECOGNITION BY SIMPLY FINE-TUNING BERT
    Huang, Wen-Chin
    Wu, Chia-Hua
    Luo, Shang-Bao
    Chen, Kuan-Yu
    Wang, Hsin-Min
    Toda, Tomoki
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7343 - 7347
  • [5] Transfer Fine-Tuning: A BERT Case Study
    Arase, Yuki
    Tsujii, Junichi
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 5393 - 5404
  • [6] Investigating Learning Dynamics of BERT Fine-Tuning
    Hao, Yaru
    Dong, Li
    Wei, Furu
    Xu, Ke
    1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020, : 87 - 92
  • [7] How fine can fine-tuning be? Learning efficient language models
    Radiya-Dixit, Evani
    Wang, Xin
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 2435 - 2442
  • [8] Dataset Distillation with Attention Labels for Fine-tuning BERT
    Maekawa, Aru
    Kobayashi, Naoki
    Funakoshi, Kotaro
    Okumura, Manabu
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 119 - 127
  • [9] Fine-Tuning BERT for Generative Dialogue Domain Adaptation
    Labruna, Tiziano
    Magnini, Bernardo
    TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 513 - 524
  • [10] Patent classification by fine-tuning BERT language model
    Lee, Jieh-Sheng
    Hsiang, Jieh
    WORLD PATENT INFORMATION, 2020, 61