Efficient Fine-Tuning of BERT Models on the Edge

Cited by: 6
Authors
Vucetic, Danilo [1 ]
Tayaranian, Mohammadreza [1 ]
Ziaeefard, Maryam [1 ]
Clark, James J. [1 ]
Meyer, Brett H. [1 ]
Gross, Warren J. [1 ]
Affiliation
[1] McGill Univ, Dept Elect & Comp Engn, Montreal, PQ, Canada
Keywords
Transformers; BERT; DistilBERT; NLP; Language Models; Efficient Transfer Learning; Efficient Fine-Tuning; Memory Efficiency; Time Efficiency; Edge Machine Learning;
DOI
10.1109/ISCAS48785.2022.9937567
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic and Communication Technology];
Discipline Classification Code
0808 ; 0809 ;
Abstract
Resource-constrained devices are increasingly the deployment targets of machine learning applications. Static models, however, do not always suffice for dynamic environments. On-device training of models allows for quick adaptability to new scenarios. The increasing size of deep neural networks, exemplified by BERT and other natural language processing models, brings increased resource requirements, namely memory, computation, energy, and time. Furthermore, training is far more resource-intensive than inference. Resource-constrained on-device learning is thus doubly difficult, especially with large BERT-like models. By reducing the memory usage of fine-tuning, pre-trained BERT models can become efficient enough to fine-tune on resource-constrained devices. We propose Freeze And Reconfigure (FAR), a memory-efficient training regime for BERT-like models that reduces the memory usage of activation maps during fine-tuning by avoiding unnecessary parameter updates. FAR reduces fine-tuning time on the DistilBERT model and CoLA dataset by 30%, and time spent on memory operations by 47%. More broadly, reductions in metric performance on the GLUE and SQuAD datasets are around 1% on average.
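As a rough illustration of the parameter-freezing idea behind FAR, the sketch below fine-tunes DistilBERT with a fixed subset of parameters frozen, so that no gradients (and fewer saved activations) are kept for the frozen parts. The choice of the feed-forward sublayers as the frozen subset and the use of the Hugging Face transformers API are assumptions made for illustration only; the paper's FAR method additionally reconfigures which parameters remain frozen during training, which is not reproduced here.

```python
# Minimal sketch (assumed setup, not the authors' FAR implementation):
# freeze a subset of DistilBERT parameters before fine-tuning so their
# gradients are never computed and fewer activations are retained.
import torch
from transformers import DistilBertForSequenceClassification

model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

# Hypothetical choice of frozen subset: the feed-forward sublayers.
# Attention and classifier parameters remain trainable.
for name, param in model.named_parameters():
    if ".ffn." in name:
        param.requires_grad = False

# The optimizer only receives the parameters that still require gradients.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)
```

Because the weight gradient of a frozen linear layer is never needed, its input activations need not be stored for the backward pass, which is the source of the activation-memory savings described in the abstract.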
Pages: 1838 - 1842
Page count: 5