Learning to Modulate pre-trained Models in RL

Cited by: 0
Authors
Schmied, Thomas [1 ,2 ]
Hofmarcher, Markus [3 ]
Paischer, Fabian [1 ,2 ]
Pascanu, Razvan [4 ,5 ]
Hochreiter, Sepp [1 ,2 ]
Affiliations
[1] ELLIS Unit Linz, Linz, Austria
[2] Inst Machine Learning, LIT AI Lab, Linz, Austria
[3] Johannes Kepler Univ Linz, JKU LIT SAL eSPML Lab, Inst Machine Learning, Linz, Austria
[4] Google DeepMind, London, England
[5] UCL, London, England
Funding
EU Horizon 2020;
Keywords
NEURAL-NETWORKS; REINFORCEMENT;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement Learning (RL) has been successful in various domains like robotics, game playing, and simulation. While RL agents have shown impressive capabilities in their specific tasks, they insufficiently adapt to new tasks. In supervised learning, this adaptation problem is addressed by large-scale pre-training followed by fine-tuning to new downstream tasks. Recently, pre-training on multiple tasks has been gaining traction in RL. However, fine-tuning a pre-trained model often suffers from catastrophic forgetting. That is, the performance on the pre-training tasks deteriorates when fine-tuning on new tasks. To investigate the catastrophic forgetting phenomenon, we first jointly pre-train a model on datasets from two benchmark suites, namely Meta-World and DMControl. Then, we evaluate and compare a variety of fine-tuning methods prevalent in natural language processing, both in terms of performance on new tasks, and how well performance on pre-training tasks is retained. Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly. Therefore, we propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model via a learnable modulation pool. Our method achieves state-of-the-art performance on the Continual-World benchmark, while retaining performance on the pre-training tasks. Finally, to aid future research in this area, we release a dataset encompassing 50 Meta-World and 16 DMControl tasks.
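To make the abstract's core idea concrete, the following is a minimal PyTorch sketch of the kind of mechanism it describes: a frozen pre-trained layer whose output is rescaled by a learnable modulation vector selected from a small pool via key similarity. The class and parameter names (ModulationPool, ModulatedBlock, pool_size) and the FiLM-style scale modulation are illustrative assumptions for this sketch, not the paper's actual L2M implementation, which is specified in the full text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModulationPool(nn.Module):
    """Pool of learnable modulation vectors; a query embedding (e.g. a task
    or state embedding) softly selects an entry by key similarity.
    Hypothetical sketch -- the paper's parameterization may differ."""

    def __init__(self, pool_size: int, dim: int):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, dim))
        # One multiplicative (scale) vector per pool entry, initialized to 1
        # so modulation starts as the identity.
        self.scales = nn.Parameter(torch.ones(pool_size, dim))

    def forward(self, query: torch.Tensor) -> torch.Tensor:
        # Cosine similarity between the query and every key: (batch, pool).
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)
        weights = sim.softmax(dim=-1)      # soft selection over the pool
        return weights @ self.scales       # (batch, dim) modulation vector


class ModulatedBlock(nn.Module):
    """Wraps a frozen pre-trained layer and rescales its output with a
    modulation vector drawn from the pool; only the pool is trained, so the
    pre-trained weights (and the skills they encode) stay untouched."""

    def __init__(self, frozen_layer: nn.Module, pool: ModulationPool):
        super().__init__()
        self.layer = frozen_layer
        for p in self.layer.parameters():
            p.requires_grad = False        # freeze the pre-trained weights
        self.pool = pool

    def forward(self, x: torch.Tensor, query: torch.Tensor) -> torch.Tensor:
        return self.layer(x) * self.pool(query)


# Usage: modulate one hidden layer of a (hypothetical) pre-trained policy.
dim = 64
block = ModulatedBlock(nn.Linear(dim, dim), ModulationPool(pool_size=8, dim=dim))
x = torch.randn(4, dim)
task_embedding = torch.randn(4, dim)
out = block(x, task_embedding)             # shape (4, 64)
```

Because gradients only flow into the pool's keys and scales, fine-tuning on a new task in this sketch cannot overwrite the frozen backbone, which is the intuition behind retaining pre-training performance.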
Pages: 35