Fine-tuning Deep RL with Gradient-Free Optimization

Cited by: 2

Authors:

de Bruin, Tim [1]

Kober, Jens [1]

Tuyls, Karl [2]

Babuska, Robert [1]

Affiliations:

[1] Delft University of Technology, Department of Cognitive Robotics, Delft, Netherlands

[2] DeepMind, Paris, France

Source:

IFAC-PapersOnLine | 2020 / Vol. 53 / No. 2

Keywords:

Reinforcement Learning; Deep Learning; Optimization; Neural Networks; Control

DOI:

10.1016/j.ifacol.2020.12.2240

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Deep reinforcement learning makes it possible to train control policies that map high-dimensional observations to actions. These methods typically rely on gradient-based optimization to learn relatively efficiently, but they are notoriously sensitive to hyperparameter choices and lack good convergence properties. Gradient-free optimization methods, such as evolution strategies, can offer a more stable alternative but tend to be much less sample efficient. In this work we propose a combination that exploits the relative strengths of both. We start with a gradient-based initial training phase, which is used to quickly learn both a state representation and an initial policy. This phase is followed by a gradient-free optimization of only the final action selection parameters. This enables the policy to improve in a stable manner to a performance level not obtained by gradient-based optimization alone, using far fewer trials than methods that rely only on gradient-free optimization. We demonstrate the effectiveness of the method on two Atari games, a continuous control benchmark and the CarRacing-v0 benchmark. On the latter we surpass the best previously reported score while using significantly fewer episodes. Copyright (C) 2020 The Authors.
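The two-phase scheme described above (keep the learned representation fixed, then search over only the final action-selection parameters without gradients) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not name the gradient-free optimizer, so a basic evolution strategy with antithetic (mirrored) sampling, in the style of Salimans et al. (2017), is swapped in. The frozen feature extractor, the linear action head, the toy quadratic "return", and all function names (`make_frozen_features`, `linear_head`, `make_fitness`, `es_finetune`) are hypothetical. The gradient-based first phase is only simulated by starting the head near a hidden target.

```python
import math
import random


def make_frozen_features(obs_dim, feat_dim, rng):
    """Hypothetical stand-in for the representation learned in the gradient-based
    phase: a fixed random projection followed by tanh. It is never updated below."""
    proj = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)] for _ in range(feat_dim)]

    def features(obs):
        return [math.tanh(sum(p * o for p, o in zip(row, obs))) for row in proj]

    return features


def linear_head(params, feats, act_dim):
    """Final action-selection layer: params is a flat, row-major act_dim x feat_dim matrix."""
    n = len(feats)
    return [sum(params[i * n + j] * feats[j] for j in range(n)) for i in range(act_dim)]


def make_fitness(feature_batch, target_params, act_dim):
    """Toy episodic return (assumption): negative mean squared distance between the
    policy's continuous action and a hidden target action, over a fixed batch."""
    targets = [linear_head(target_params, f, act_dim) for f in feature_batch]

    def fitness(params):
        total = 0.0
        for f, a_star in zip(feature_batch, targets):
            a = linear_head(params, f, act_dim)
            total += sum((x - y) ** 2 for x, y in zip(a, a_star))
        return -total / len(feature_batch)

    return fitness


def es_finetune(params, fitness, iterations=150, pop_size=20, sigma=0.05, lr=0.05, seed=0):
    """Basic antithetic evolution strategy applied only to the head parameters."""
    rng = random.Random(seed)
    theta = list(params)
    for _ in range(iterations):
        grad = [0.0] * len(theta)
        for _ in range(pop_size // 2):
            eps = [rng.gauss(0.0, 1.0) for _ in theta]
            f_plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
            f_minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
            # Mirrored pair: F(+) * eps + F(-) * (-eps) = (F(+) - F(-)) * eps.
            for k, e in enumerate(eps):
                grad[k] += (f_plus - f_minus) * e
        # Monte Carlo gradient of the Gaussian-smoothed fitness; take an ascent step.
        step = lr / (pop_size * sigma)
        theta = [t + step * g for t, g in zip(theta, grad)]
    return theta


if __name__ == "__main__":
    rng = random.Random(42)
    obs_dim, feat_dim, act_dim = 4, 8, 2
    features = make_frozen_features(obs_dim, feat_dim, rng)
    observations = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)] for _ in range(32)]
    feature_batch = [features(o) for o in observations]  # frozen, so computed once
    target = [rng.gauss(0.0, 1.0) for _ in range(act_dim * feat_dim)]
    # Pretend the gradient-based phase left the head close to, but not at, the target.
    initial_head = [w + rng.gauss(0.0, 0.5) for w in target]
    fitness = make_fitness(feature_batch, target, act_dim)
    tuned_head = es_finetune(initial_head, fitness)
    print(f"before: {fitness(initial_head):.4f}  after: {fitness(tuned_head):.4f}")
```

Only the head is perturbed, so each iteration searches `act_dim * feat_dim` = 16 parameters here rather than every weight of the network. Caching the features is possible only because this toy uses a fixed batch of observations; in an actual RL loop the frozen extractor would be applied to observations from fresh rollouts.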

Pages: 8049-8056

Number of pages: 8
