Fine-tuning Deep RL with Gradient-Free Optimization

Cited by: 2
Authors
de Bruin, Tim [1 ]
Kober, Jens [1 ]
Tuyls, Karl [2 ]
Babuska, Robert [1 ]
Affiliations
[1] Delft Univ Technol, Cognit Robot Dept, Delft, Netherlands
[2] DeepMind, Paris, France
Source
IFAC PAPERSONLINE | 2020, Vol. 53, Issue 2
Keywords
Reinforcement Learning; Deep Learning; Optimization; Neural Networks; Control;
DOI
10.1016/j.ifacol.2020.12.2240
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Deep reinforcement learning makes it possible to train control policies that map high-dimensional observations to actions. These methods typically use gradient-based optimization techniques to enable relatively efficient learning, but are notoriously sensitive to hyperparameter choices and do not have good convergence properties. Gradient-free optimization methods, such as evolutionary strategies, can offer a more stable alternative but tend to be much less sample efficient. In this work we propose a combination that exploits the relative strengths of both. We start with a gradient-based initial training phase, which quickly learns both a state representation and an initial policy. This phase is followed by a gradient-free optimization of only the final action-selection parameters. This enables the policy to improve in a stable manner to a performance level not reached by gradient-based optimization alone, while using far fewer trials than purely gradient-free methods. We demonstrate the effectiveness of the method on two Atari games, a continuous control benchmark, and the CarRacing-v0 benchmark. On the latter we surpass the best previously reported score while using significantly fewer episodes. Copyright (C) 2020 The Authors.
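The record contains no code, so the following is a minimal Python sketch of the two-phase idea the abstract describes, not the authors' implementation. It assumes CMA-ES (via the third-party `cma` package) as the gradient-free optimizer; the frozen random encoder and the toy `rollout_return` function are illustrative stand-ins for the gradient-trained network and for real environment rollouts, and all names (`FEATURE_DIM`, `encode`, `policy`) are hypothetical.

```python
# Sketch of the two-phase scheme: phase 1 (gradient-based pre-training) is
# represented by a frozen encoder, as if already trained with deep RL;
# phase 2 fine-tunes only the final action-selection layer with CMA-ES.
import numpy as np
import cma  # pip install cma

FEATURE_DIM, N_ACTIONS, OBS_DIM = 32, 4, 64

# --- Phase 1 stand-in: a frozen state representation (random here; in the
# paper this would come from the gradient-based training phase).
rng = np.random.default_rng(0)
W_encoder = rng.standard_normal((FEATURE_DIM, OBS_DIM)) * 0.1

def encode(obs):
    """Frozen features learned during the gradient-based phase."""
    return np.tanh(W_encoder @ obs)

def policy(obs, theta):
    """Final action selection; theta are the only parameters being tuned."""
    W = theta.reshape(N_ACTIONS, FEATURE_DIM)
    return int(np.argmax(W @ encode(obs)))

def rollout_return(theta, episodes=8):
    """Average 'return' of the policy. A real version would roll out full
    episodes in the environment; this toy scores whether the policy picks
    a context-dependent target action."""
    total = 0.0
    for _ in range(episodes):
        obs = rng.standard_normal(OBS_DIM)
        target = 0 if encode(obs).sum() > 0 else 1
        total += float(policy(obs, theta) == target)
    return total / episodes

# --- Phase 2: gradient-free fine-tuning of the last layer only.
theta0 = np.zeros(N_ACTIONS * FEATURE_DIM)  # e.g. the pre-trained last layer
es = cma.CMAEvolutionStrategy(theta0, 0.5)
for _ in range(20):
    candidates = es.ask()
    # CMA-ES minimizes, so negate the episodic return.
    es.tell(candidates, [-rollout_return(np.asarray(c)) for c in candidates])
theta_best = np.asarray(es.result.xbest)
print("best estimated return:", rollout_return(theta_best))
```

Note that only the 128 final-layer parameters enter the evolutionary search while the representation stays frozen, which is what keeps the gradient-free phase far cheaper in trials than evolving the full network.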
Pages: 8049-8056
Page count: 8