Layer-Wise Learning Rate Optimization for Task-Dependent Fine-Tuning of Pre-Trained Models: An Evolutionary Approach

Cited by: 0
Authors
Bu, Chenyang [1]
Liu, Yuxin [1]
Huang, Manzong [1]
Shao, Jianxuan [1]
Ji, Shengwei [2]
Luo, Wenjian [3]
Wu, Xindong [1]
Affiliations
[1] Key Laboratory of Knowledge Engineering with Big Data (Ministry of Education), School of Computer Science and Information Engineering, Hefei University of Technology, Hefei, China
[2] School of Artificial Intelligence and Big Data, Hefei University, Hefei, China
[3] Guangdong Provincial Key Laboratory of Novel Security Intelligence Technologies, School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
Funding
National Natural Science Foundation of China
Keywords
Benchmarking; Contrastive learning; Transformers; Transfer learning
DOI
10.1145/3689827
Abstract
The superior performance of large-scale pre-trained models, such as Bidirectional Encoder Representations from Transformers (BERT) and the Generative Pre-trained Transformer (GPT), has received increasing attention in both academic and industrial research and has become one of the current research hotspots. A pre-trained model is a model trained on large-scale unlabeled data to learn general language representations or features for fine-tuning or transfer learning on subsequent tasks. After pre-training is complete, a small amount of labeled data can be used to fine-tune the model for a specific task or domain. This two-stage "pre-training + fine-tuning" paradigm has achieved state-of-the-art results on natural language processing (NLP) tasks. Despite its widespread adoption, a fixed fine-tuning scheme that adapts well to one NLP task may perform inconsistently on other NLP tasks, because different tasks have different latent semantic structures. In this article, we explore the effectiveness of automatically searching for layer-wise learning-rate fine-tuning patterns from an evolutionary optimization perspective. Our goal is to use evolutionary algorithms to find task-dependent fine-tuning patterns for specific NLP tasks that outperform typical fixed fine-tuning patterns. Experimental results on two real-world language benchmarks and three advanced pre-trained language models show the effectiveness and generality of the proposed framework. © 2024 Copyright held by the owner/author(s).
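Note on the approach described above: the following is a minimal, self-contained sketch (not the authors' implementation) of how an evolutionary search over layer-wise learning rates can be wired up. A small MLP and synthetic data stand in for a pre-trained language model and a labeled NLP task; the helper names (make_model, build_optimizer, evaluate_candidate, mutate), the (1+4) evolution strategy, and all hyperparameter values are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch: evolutionary search over layer-wise learning rates.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for a pre-trained encoder: two "layers" plus a task head.
def make_model():
    return nn.Sequential(
        nn.Linear(16, 32), nn.ReLU(),   # index 0: lower layer
        nn.Linear(32, 32), nn.ReLU(),   # index 2: upper layer
        nn.Linear(32, 2),               # index 4: task head
    )

LAYER_IDS = [0, 2, 4]  # indices of the parameterized modules above

# Synthetic data standing in for a labeled fine-tuning task.
X_train = torch.randn(256, 16)
y_train = (X_train.sum(dim=1) > 0).long()
X_val = torch.randn(128, 16)
y_val = (X_val.sum(dim=1) > 0).long()

def build_optimizer(model, lrs):
    # One parameter group per layer, each with its own learning rate.
    groups = [{"params": model[i].parameters(), "lr": lr}
              for i, lr in zip(LAYER_IDS, lrs)]
    return torch.optim.AdamW(groups)

def evaluate_candidate(lrs, steps=30):
    # Fitness: validation accuracy after a short fine-tuning run with these rates.
    model = make_model()
    opt = build_optimizer(model, lrs)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X_train), y_train).backward()
        opt.step()
    with torch.no_grad():
        return (model(X_val).argmax(dim=1) == y_val).float().mean().item()

def mutate(lrs, scale=0.5):
    # Log-normal perturbation keeps every learning rate positive.
    return [lr * float(torch.exp(torch.randn(1) * scale)) for lr in lrs]

# (1+4) evolution strategy over the vector of layer-wise learning rates.
best = [2e-3] * len(LAYER_IDS)        # start from a uniform fine-tuning pattern
best_fitness = evaluate_candidate(best)
for gen in range(5):
    offspring = [mutate(best) for _ in range(4)]
    for cand in offspring:
        fitness = evaluate_candidate(cand)
        if fitness > best_fitness:
            best, best_fitness = cand, fitness
    print(f"gen {gen}: val acc {best_fitness:.3f}, "
          f"layer lrs {[format(lr, '.1e') for lr in best]}")
```

In a realistic setup, each parameter group would correspond to an encoder block of a model such as BERT or GPT, and the fitness of a candidate learning-rate vector would be measured on a held-out development set of the target task after a short fine-tuning run.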