Fine-tuning Deep RL with Gradient-Free Optimization

Cited by: 2
Authors
de Bruin, Tim [1]
Kober, Jens [1]
Tuyls, Karl [2]
Babuska, Robert [1]
Affiliations
[1] Delft Univ Technol, Cognit Robot Dept, Delft, Netherlands
[2] DeepMind, Paris, France
Source
IFAC PAPERSONLINE | 2020 / Vol. 53 / Issue 02
Keywords
Reinforcement Learning; Deep Learning; Optimization; Neural Networks; Control
DOI
10.1016/j.ifacol.2020.12.2240
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Deep reinforcement learning makes it possible to train control policies that map high-dimensional observations to actions. These methods typically use gradient-based optimization techniques to enable relatively efficient learning, but are notoriously sensitive to hyperparameter choices and do not have good convergence properties. Gradient-free optimization methods, such as evolutionary strategies, can offer a more stable alternative but tend to be much less sample efficient. In this work we propose a combination, using the relative strengths of both. We start with a gradient-based initial training phase, which is used to quickly learn both a state representation and an initial policy. This phase is followed by a gradient-free optimization of only the final action selection parameters. This enables the policy to improve in a stable manner to a performance level not obtained by gradient-based optimization alone, using many fewer trials than methods using only gradient-free optimization. We demonstrate the effectiveness of the method on two Atari games, a continuous control benchmark and the CarRacing-v0 benchmark. On the latter we surpass the best previously reported score while using significantly fewer episodes. Copyright (C) 2020 The Authors.
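The two-phase idea in the abstract can be illustrated with a minimal sketch. It assumes the gradient-based phase has already produced a frozen feature extractor and an initial linear action-selection head; only the head is then tuned without gradients. Everything named here (`features`, `act`, `episode_return`, `fine_tune_head`, the toy objective, and the starting weights) is hypothetical, and the simple elite-averaging evolution strategy is a stand-in chosen for brevity, not necessarily the optimizer the authors used.

```python
import random


def features(obs):
    """Frozen features, standing in for layers learned in the gradient-based phase (hypothetical)."""
    x, y = obs
    return [x, y, x * y, 1.0]


def act(head, obs):
    """Linear action-selection head: the only parameters the gradient-free phase changes."""
    return sum(w * f for w, f in zip(head, features(obs)))


def episode_return(head, env_seed=0, n_steps=50):
    """Toy deterministic objective (assumption): negative squared error to a hidden target action."""
    rng = random.Random(env_seed)
    total = 0.0
    for _ in range(n_steps):
        obs = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        target = 0.5 * obs[0] - 0.3 * obs[1] + 0.2 * obs[0] * obs[1]
        total -= (act(head, obs) - target) ** 2
    return total


def fine_tune_head(head, generations=40, population=20, elite=5, sigma=0.1, seed=0):
    """Gradient-free search over the head weights; returns the best head seen and its score."""
    rng = random.Random(seed)
    best, best_score = list(head), episode_return(head)
    mean = list(head)
    for _ in range(generations):
        # Sample candidate heads around the current search mean.
        candidates = [[m + sigma * rng.gauss(0, 1) for m in mean]
                      for _ in range(population)]
        ranked = sorted(candidates, key=episode_return, reverse=True)
        # Move the search mean to the average of the elite candidates.
        mean = [sum(ws) / elite for ws in zip(*ranked[:elite])]
        top_score = episode_return(ranked[0])
        if top_score > best_score:
            best, best_score = ranked[0], top_score
    return best, best_score


# Pretend these weights came out of the gradient-based phase (assumption).
initial_head = [0.4, -0.2, 0.0, 0.1]
tuned_head, tuned_score = fine_tune_head(initial_head)
print(episode_return(initial_head), tuned_score)
```

Because the search only touches the small head vector, each generation needs few evaluations compared with searching over all network weights, which is the sample-efficiency argument the abstract makes.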
Pages: 8049 - 8056
Page count: 8
Related papers
50 in total
  • [1] Amazon SageMaker Automatic Model Tuning: Scalable Gradient-Free Optimization
    Perrone, Valerio
    Shen, Huibin
    Zolic, Aida
    Shcherbatyi, Iaroslav
    Ahmed, Amr
    Bansal, Tanya
    Donini, Michele
    Winkelmolen, Fela
    Jenatton, Rodolphe
    Faddoul, Jean Baptiste
    Pogorzelska, Barbara
    Miladinovic, Miroslav
    Kenthapadi, Krishnaram
    Seeger, Matthias
    Archambeau, Cedric
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 3463 - 3471
  • [2] Reliable Gradient-free and Likelihood-free Prompt Tuning
    Shen, Maohao
    Ghosh, Soumya
    Sattigeri, Prasanna
    Das, Subhro
    Bu, Yuheng
    Wornell, Gregory
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 2416 - 2429
  • [3] Gradient Sparsification For Masked Fine-Tuning of Transformers
    O'Neill, James
    Dutta, Sourav
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [4] Automatic Tuning of Tensorflow's CPU Backend Using Gradient-Free Optimization Algorithms
    Mebratu, Derssie
    Hasabnis, Niranjan
    Mercati, Pietro
    Sharma, Gaurit
    Najnin, Shamima
    HIGH PERFORMANCE COMPUTING - ISC HIGH PERFORMANCE DIGITAL 2021 INTERNATIONAL WORKSHOPS, 2021, 12761 : 249 - 266
  • [5] Distributed Online Optimization With Gradient-free Design
    Wang, Lingfei
    Wang, Yinghui
    Hong, Yiguang
    PROCEEDINGS OF THE 38TH CHINESE CONTROL CONFERENCE (CCC), 2019, : 5677 - 5682
  • [6] Gradient-free method for nonsmooth distributed optimization
    Li, Jueyou
    Wu, Changzhi
    Wu, Zhiyou
    Long, Qiang
    JOURNAL OF GLOBAL OPTIMIZATION, 2015, 61 (02) : 325 - 340
  • [7] Gradient-free distributed optimization with exact convergence
    Pang, Yipeng
    Hu, Guoqiang
    AUTOMATICA, 2022, 144
  • [8] Gradient-free method for nonsmooth distributed optimization
    Jueyou Li
    Changzhi Wu
    Zhiyou Wu
    Qiang Long
    Journal of Global Optimization, 2015, 61 : 325 - 340
  • [9] Effect of barren plateaus on gradient-free optimization
    Arrasmith, Andrew
    Cerezo, M.
    Czarnik, Piotr
    Cincio, Lukasz
    Coles, Patrick J.
    QUANTUM, 2021, 5
  • [10] Gradient-Free and Gradient-Based Optimization of a Radial Turbine
    Lachenmaier, Nicolas
    Baumgaertner, Daniel
    Schiffer, Heinz-Peter
    Kech, Johannes
    INTERNATIONAL JOURNAL OF TURBOMACHINERY PROPULSION AND POWER, 2020, 5 (03)