Learning to Modulate pre-trained Models in RL

Cited by: 0
Authors
Schmied, Thomas [1 ,2 ]
Hofmarcher, Markus [3 ]
Paischer, Fabian [1 ,2 ]
Pascanu, Razvan [4 ,5 ]
Hochreiter, Sepp [1 ,2 ]
Affiliations
[1] ELLIS Unit Linz, Linz, Austria
[2] Inst Machine Learning, LIT AI Lab, Linz, Austria
[3] Johannes Kepler Univ Linz, JKU LIT SAL eSPML Lab, Inst Machine Learning, Linz, Austria
[4] Google DeepMind, London, England
[5] UCL, London, England
Funding
EU Horizon 2020;
Keywords
NEURAL-NETWORKS; REINFORCEMENT;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement Learning (RL) has been successful in various domains like robotics, game playing, and simulation. While RL agents have shown impressive capabilities in their specific tasks, they adapt poorly to new tasks. In supervised learning, this adaptation problem is addressed by large-scale pre-training followed by fine-tuning on new downstream tasks. Recently, pre-training on multiple tasks has been gaining traction in RL. However, fine-tuning a pre-trained model often suffers from catastrophic forgetting: performance on the pre-training tasks deteriorates when fine-tuning on new tasks. To investigate this phenomenon, we first jointly pre-train a model on datasets from two benchmark suites, namely Meta-World and DMControl. Then, we evaluate and compare a variety of fine-tuning methods prevalent in natural language processing, both in terms of performance on new tasks and in terms of how well performance on the pre-training tasks is retained. Our study shows that with most fine-tuning approaches, performance on the pre-training tasks deteriorates significantly. Therefore, we propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model via a learnable modulation pool. Our method achieves state-of-the-art performance on the Continual-World benchmark, while retaining performance on the pre-training tasks. Finally, to aid future research in this area, we release a dataset encompassing 50 Meta-World and 16 DMControl tasks.
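The abstract describes L2M as keeping the pre-trained model frozen and modulating its information flow through a learnable modulation pool. Below is a minimal, hypothetical PyTorch-style sketch of that general idea only: a frozen backbone, a pool of learnable entries selected by key similarity, and a FiLM-style scale-and-shift applied to the backbone's hidden activations. The class names, shapes, selection rule, and the scale-and-shift form are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch (not the authors' code): a frozen pre-trained backbone whose
# hidden activations are modulated by a small pool of learnable parameters, selected
# per sample via key similarity. Only the pool is trained, so the frozen weights,
# and hence the pre-trained skills, are never overwritten.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModulationPool(nn.Module):
    """Pool of learnable (key, scale, shift) entries; a FiLM-style stand-in for the paper's modulation."""

    def __init__(self, pool_size: int, hidden_dim: int):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(pool_size, hidden_dim))
        self.scales = nn.Parameter(torch.ones(pool_size, hidden_dim))
        self.shifts = nn.Parameter(torch.zeros(pool_size, hidden_dim))

    def forward(self, query: torch.Tensor, hidden: torch.Tensor) -> torch.Tensor:
        # query:  (batch, hidden_dim) embedding used to pick a pool entry
        # hidden: (batch, hidden_dim) frozen-backbone activation to modulate
        sim = F.cosine_similarity(query.unsqueeze(1), self.keys.unsqueeze(0), dim=-1)
        # Hard, non-differentiable selection; training the keys would need a soft
        # match or an auxiliary loss, which is omitted here for brevity.
        idx = sim.argmax(dim=-1)
        return hidden * self.scales[idx] + self.shifts[idx]


# Usage: freeze the pre-trained model and train only the pool on a new task.
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
for p in backbone.parameters():
    p.requires_grad_(False)                      # pre-trained weights stay fixed

pool = ModulationPool(pool_size=10, hidden_dim=64)
obs = torch.randn(8, 32)                         # dummy batch of observations
hidden = backbone(obs)
modulated = pool(hidden.detach(), hidden)        # gradients flow only into the pool
optimizer = torch.optim.Adam(pool.parameters(), lr=1e-3)
```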
Pages: 35