Layer-Wise Learning Rate Optimization for Task-Dependent Fine-Tuning of Pre-Trained Models: An Evolutionary Approach

Cited by: 0
Authors
Bu, Chenyang [1]
Liu, Yuxin [1]
Huang, Manzong [1]
Shao, Jianxuan [1]
Ji, Shengwei [2]
Luo, Wenjian [3]
Wu, Xindong [1]
Affiliations
[1] Key Laboratory of Knowledge Engineering with Big Data (Ministry of Education), School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
[2] School of Artificial Intelligence and Big Data, Hefei University, Hefei, China
[3] Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
Funding
National Natural Science Foundation of China
Keywords
Benchmarking; Contrastive learning; Transformers; Transfer learning
DOI
10.1145/3689827
Abstract
The superior performance of large-scale pre-trained models, such as Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT), has received increasing attention in both academic and industrial research and has become one of the current research hotspots. A pre-trained model is a model trained on large-scale unlabeled data to learn general language representations or features for fine-tuning or transfer learning on subsequent tasks. After pre-training is complete, a small amount of labeled data can be used to fine-tune the model for a specific task or domain. This two-stage "pre-training + fine-tuning" paradigm has achieved state-of-the-art results on natural language processing (NLP) tasks. Despite its widespread adoption, a fixed fine-tuning scheme that adapts well to one NLP task may perform inconsistently on other NLP tasks, because different tasks have different latent semantic structures. In this article, we explore the effectiveness of automatically searching for layer-wise learning-rate fine-tuning patterns from an evolutionary optimization perspective. Our goal is to use evolutionary algorithms to find task-dependent fine-tuning patterns for specific NLP tasks that outperform typical fixed fine-tuning patterns. Experimental results on two real-world language benchmarks and three advanced pre-trained language models show the effectiveness and generality of the proposed framework. © 2024 Copyright held by the owner/author(s).
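Note on the approach described above: the following is a minimal, self-contained sketch (not the authors' implementation) of how an evolutionary search over layer-wise learning rates can be wired up. A small MLP and synthetic data stand in for a pre-trained language model and a labeled NLP task; the helper names (make_model, build_optimizer, evaluate_candidate, mutate), the (1+4) evolution strategy, and all hyperparameter values are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch: evolutionary search over layer-wise learning rates.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pre-trained encoder: two "layers" plus a task head.
def make_model():
    return nn.Sequential(
        nn.Linear(16, 32), nn.ReLU(),   # index 0: lower layer
        nn.Linear(32, 32), nn.ReLU(),   # index 2: upper layer
        nn.Linear(32, 2),               # index 4: task head
    )

LAYER_IDS = [0, 2, 4]  # indices of the parameterized modules above

# Synthetic data standing in for a labeled fine-tuning task.
X_train = torch.randn(256, 16)
y_train = (X_train.sum(dim=1) > 0).long()
X_val = torch.randn(128, 16)
y_val = (X_val.sum(dim=1) > 0).long()

def build_optimizer(model, lrs):
    # One parameter group per layer, each with its own learning rate.
    groups = [{"params": model[i].parameters(), "lr": lr}
              for i, lr in zip(LAYER_IDS, lrs)]
    return torch.optim.AdamW(groups)

def evaluate_candidate(lrs, steps=30):
    # Fitness: validation accuracy after a short fine-tuning run with these rates.
    model = make_model()
    opt = build_optimizer(model, lrs)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X_train), y_train).backward()
        opt.step()
    with torch.no_grad():
        return (model(X_val).argmax(dim=1) == y_val).float().mean().item()

def mutate(lrs, scale=0.5):
    # Log-normal perturbation keeps every learning rate positive.
    return [lr * float(torch.exp(torch.randn(1) * scale)) for lr in lrs]

# (1+4) evolution strategy over the vector of layer-wise learning rates.
best = [2e-3] * len(LAYER_IDS)        # start from a uniform fine-tuning pattern
best_fitness = evaluate_candidate(best)
for gen in range(5):
    offspring = [mutate(best) for _ in range(4)]
    for cand in offspring:
        fitness = evaluate_candidate(cand)
        if fitness > best_fitness:
            best, best_fitness = cand, fitness
    print(f"gen {gen}: val acc {best_fitness:.3f}, "
          f"layer lrs {[format(lr, '.1e') for lr in best]}")
```

In a realistic setup, each parameter group would correspond to an encoder block of a model such as BERT or GPT, and the fitness of a candidate learning-rate vector would be measured on a held-out development set of the target task after a short fine-tuning run.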