Fine-tuning Deep RL with Gradient-Free Optimization

Cited by: 2
Authors
de Bruin, Tim [1 ]
Kober, Jens [1 ]
Tuyls, Karl [2 ]
Babuska, Robert [1 ]
Affiliations
[1] Delft Univ Technol, Cognit Robot Dept, Delft, Netherlands
[2] DeepMind, Paris, France
Source
IFAC PAPERSONLINE | 2020, Vol. 53, Issue 2
Keywords
Reinforcement Learning; Deep Learning; Optimization; Neural Networks; Control;
DOI
10.1016/j.ifacol.2020.12.2240
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Deep reinforcement learning makes it possible to train control policies that map high-dimensional observations to actions. These methods typically use gradient-based optimization techniques to enable relatively efficient learning, but are notoriously sensitive to hyperparameter choices and do not have good convergence properties. Gradient-free optimization methods, such as evolutionary strategies, can offer a more stable alternative but tend to be much less sample efficient. In this work we propose a combination that exploits the relative strengths of both. We start with a gradient-based initial training phase, which quickly learns both a state representation and an initial policy. This phase is followed by a gradient-free optimization of only the final action-selection parameters. This enables the policy to improve in a stable manner to a performance level not reached by gradient-based optimization alone, while using far fewer trials than purely gradient-free methods. We demonstrate the effectiveness of the method on two Atari games, a continuous control benchmark, and the CarRacing-v0 benchmark. On the latter we surpass the best previously reported score while using significantly fewer episodes. Copyright (C) 2020 The Authors.
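The record contains no code, so the following is a minimal Python sketch of the two-phase idea the abstract describes, not the authors' implementation. It assumes CMA-ES (via the third-party `cma` package) as the gradient-free optimizer; the frozen random encoder and the toy `rollout_return` function are illustrative stand-ins for the gradient-trained network and for real environment rollouts, and all names (`FEATURE_DIM`, `encode`, `policy`) are hypothetical.

```python
# Sketch of the two-phase scheme: phase 1 (gradient-based pre-training) is
# represented by a frozen encoder, as if already trained with deep RL;
# phase 2 fine-tunes only the final action-selection layer with CMA-ES.
import numpy as np
import cma  # pip install cma

FEATURE_DIM, N_ACTIONS, OBS_DIM = 32, 4, 64

# --- Phase 1 stand-in: a frozen state representation (random here; in the
# paper this would come from the gradient-based training phase).
rng = np.random.default_rng(0)
W_encoder = rng.standard_normal((FEATURE_DIM, OBS_DIM)) * 0.1

def encode(obs):
    """Frozen features learned during the gradient-based phase."""
    return np.tanh(W_encoder @ obs)

def policy(obs, theta):
    """Final action selection; theta are the only parameters being tuned."""
    W = theta.reshape(N_ACTIONS, FEATURE_DIM)
    return int(np.argmax(W @ encode(obs)))

def rollout_return(theta, episodes=8):
    """Average 'return' of the policy. A real version would roll out full
    episodes in the environment; this toy scores whether the policy picks
    a context-dependent target action."""
    total = 0.0
    for _ in range(episodes):
        obs = rng.standard_normal(OBS_DIM)
        target = 0 if encode(obs).sum() > 0 else 1
        total += float(policy(obs, theta) == target)
    return total / episodes

# --- Phase 2: gradient-free fine-tuning of the last layer only.
theta0 = np.zeros(N_ACTIONS * FEATURE_DIM)  # e.g. the pre-trained last layer
es = cma.CMAEvolutionStrategy(theta0, 0.5)
for _ in range(20):
    candidates = es.ask()
    # CMA-ES minimizes, so negate the episodic return.
    es.tell(candidates, [-rollout_return(np.asarray(c)) for c in candidates])
theta_best = np.asarray(es.result.xbest)
print("best estimated return:", rollout_return(theta_best))
```

Note that only the 128 final-layer parameters enter the evolutionary search while the representation stays frozen, which is what keeps the gradient-free phase far cheaper in trials than evolving the full network.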
Pages: 8049-8056
Page count: 8