Fine-tuning Deep RL with Gradient-Free Optimization

Authors
de Bruin, Tim [1]
Kober, Jens [1]
Tuyls, Karl [2]
Babuska, Robert [1]
Affiliations
[1] Delft Univ Technol, Cognit Robot Dept, Delft, Netherlands
[2] DeepMind, Paris, France
Source
IFAC PAPERSONLINE | 2020 / Vol. 53 / No. 2
Keywords
Reinforcement Learning; Deep Learning; Optimization; Neural Networks; Control
DOI
10.1016/j.ifacol.2020.12.2240
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Deep reinforcement learning makes it possible to train control policies that map high-dimensional observations to actions. These methods typically use gradient-based optimization techniques to enable relatively efficient learning, but are notoriously sensitive to hyperparameter choices and do not have good convergence properties. Gradient-free optimization methods, such as evolutionary strategies, can offer a more stable alternative but tend to be much less sample efficient. In this work we propose a combination, using the relative strengths of both. We start with a gradient-based initial training phase, which is used to quickly learn both a state representation and an initial policy. This phase is followed by a gradient-free optimization of only the final action selection parameters. This enables the policy to improve in a stable manner to a performance level not obtained by gradient-based optimization alone, using many fewer trials than methods using only gradient-free optimization. We demonstrate the effectiveness of the method on two Atari games, a continuous control benchmark and the CarRacing-v0 benchmark. On the latter we surpass the best previously reported score while using significantly fewer episodes. Copyright (C) 2020 The Authors.
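The two-phase scheme described in the abstract (gradient-based pre-training, then gradient-free tuning of only the final action-selection parameters) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The frozen `features` function stands in for the representation learned in the gradient-based phase. The toy squared-error fitness `toy_return` replaces a real episode return. The optimizer is a simple elitist (1+λ) evolution strategy, which may differ from the gradient-free method used in the paper. All function names and parameter values here are hypothetical.

```python
import random
from typing import Callable, List

Vector = List[float]


def features(state: Vector) -> Vector:
    # Hypothetical stand-in for the state representation learned in the
    # gradient-based phase; it stays frozen during gradient-free fine-tuning.
    return [state[0], state[1], state[0] * state[1], 1.0]


def policy(weights: Vector, state: Vector) -> float:
    # Final action-selection layer: a linear map from frozen features to an
    # action. Only these weights change in the gradient-free phase.
    return sum(w * f for w, f in zip(weights, features(state)))


def toy_return(weights: Vector, states: List[Vector],
               target: Callable[[Vector], float]) -> float:
    # Assumed toy fitness (higher is better): negative squared error to a
    # hypothetical ideal action, standing in for an episode return.
    return -sum((policy(weights, s) - target(s)) ** 2 for s in states)


def es_finetune(init: Vector, fitness: Callable[[Vector], float],
                sigma: float = 0.05, offspring: int = 10,
                generations: int = 200, seed: int = 0) -> Vector:
    # Elitist (1 + lambda) evolution strategy over the final-layer weights only.
    rng = random.Random(seed)
    parent, parent_fit = list(init), fitness(init)
    for _ in range(generations):
        # Sample offspring by adding isotropic Gaussian noise to the parent.
        candidates = [[w + rng.gauss(0.0, sigma) for w in parent]
                      for _ in range(offspring)]
        best_fit, best = max(((fitness(c), c) for c in candidates),
                             key=lambda pair: pair[0])
        # Keep the parent unless an offspring is strictly better, so the
        # fitness of the returned weights never decreases.
        if best_fit > parent_fit:
            parent, parent_fit = best, best_fit
    return parent


if __name__ == "__main__":
    data_rng = random.Random(1)
    states = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)]
              for _ in range(50)]
    ideal = lambda s: 0.5 * s[0] - 0.3 * s[1] + 0.8 * s[0] * s[1]
    fit = lambda w: toy_return(w, states, ideal)
    pretrained = [0.4, -0.2, 0.5, 0.1]  # pretend output of the gradient-based phase
    tuned = es_finetune(pretrained, fit)
    print(f"before: {fit(pretrained):.4f}  after: {fit(tuned):.4f}")
```

Restricting the search to the last layer keeps the search space small, which is the intuition the abstract gives for needing far fewer trials than gradient-free optimization of the whole network. The elitist selection in this sketch is one simple way to obtain monotone, stable improvement.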
Pages: 8049-8056
Number of pages: 8