Fine-tuning Deep RL with Gradient-Free Optimization
Cited by: 2
Authors:
de Bruin, Tim [1]; Kober, Jens [1]; Tuyls, Karl [2]; Babuska, Robert [1]
Affiliations:
[1] Delft Univ Technol, Cognit Robot Dept, Delft, Netherlands
[2] DeepMind, Paris, France
Source: IFAC-PapersOnLine
Keywords: Reinforcement Learning; Deep Learning; Optimization; Neural Networks; Control
DOI:
10.1016/j.ifacol.2020.12.2240
Chinese Library Classification: TP [Automation Technology; Computer Technology]
Discipline Code: 0812
Abstract:
Deep reinforcement learning makes it possible to train control policies that map high-dimensional observations to actions. These methods typically use gradient-based optimization to achieve relatively sample-efficient learning, but they are notoriously sensitive to hyperparameter choices and lack good convergence properties. Gradient-free optimization methods, such as evolution strategies, offer a more stable alternative but tend to be far less sample efficient. In this work we propose a combination that exploits the relative strengths of both. We start with a gradient-based initial training phase, which quickly learns both a state representation and an initial policy. This phase is followed by a gradient-free optimization of only the final action-selection parameters. The policy can then improve in a stable manner to a performance level not reached by gradient-based optimization alone, using far fewer trials than methods relying on gradient-free optimization only. We demonstrate the effectiveness of the method on two Atari games, a continuous control benchmark, and the CarRacing-v0 benchmark. On the latter we surpass the best previously reported score while using significantly fewer episodes. Copyright (C) 2020 The Authors.
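A minimal sketch of the two-phase idea described in the abstract, not the authors' implementation: phase 1 trains a small policy network end-to-end with a REINFORCE-style gradient update, then phase 2 freezes the feature layer and fine-tunes only the final action-selection weights with a simple (mu, lambda)-style evolution strategy. The toy bandit-like task, network sizes, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, HIDDEN, N_ACTIONS = 8, 16, 3

# Hidden "task": each action's reward depends linearly on the observation.
W_TRUE = rng.normal(size=(OBS_DIM, N_ACTIONS))

def episode_return(policy, n_steps=64):
    """Average reward of a (deterministic) policy over a short rollout."""
    total = 0.0
    for _ in range(n_steps):
        obs = rng.normal(size=OBS_DIM)
        total += obs @ W_TRUE[:, policy(obs)]
    return total / n_steps

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# --- Phase 1: gradient-based training of the full network (REINFORCE-style) ---
W1 = rng.normal(scale=0.1, size=(OBS_DIM, HIDDEN))    # feature layer
W2 = rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS))  # action-selection layer
lr = 0.01
for _ in range(2000):
    obs = rng.normal(size=OBS_DIM)
    h = np.tanh(obs @ W1)
    p = softmax(h @ W2)
    a = rng.choice(N_ACTIONS, p=p)
    r = obs @ W_TRUE[:, a]
    dlogits = -p                 # grad of log pi(a|s) w.r.t. the logits
    dlogits[a] += 1.0
    W2 += lr * r * np.outer(h, dlogits)
    dh = (1 - h**2) * (W2 @ dlogits)
    W1 += lr * r * np.outer(obs, dh)

features = lambda obs: np.tanh(obs @ W1)  # frozen state representation

# --- Phase 2: gradient-free fine-tuning of W2 only (simple evolution strategy) ---
theta = W2.flatten()
sigma, pop, elite = 0.1, 20, 5
for _ in range(50):
    cands = theta + sigma * rng.normal(size=(pop, theta.size))
    scores = [episode_return(
        lambda o, w=c.reshape(HIDDEN, N_ACTIONS): int(np.argmax(features(o) @ w)))
        for c in cands]
    best = np.argsort(scores)[-elite:]
    theta = cands[best].mean(axis=0)      # recombine the elite candidates

final_policy = lambda o: int(np.argmax(features(o) @ theta.reshape(HIDDEN, N_ACTIONS)))
print("return after gradient-free fine-tuning:", episode_return(final_policy))
```

Because phase 2 perturbs only the final layer, each candidate is cheap to evaluate and the search space stays small, which is what lets the gradient-free stage remain sample efficient relative to evolving the whole network.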
Pages: 8049-8056 (8 pages)