Fine-tuning Deep RL with Gradient-Free Optimization

Cited by: 2

Authors:

de Bruin, Tim [1]

Kober, Jens [1]

Tuyls, Karl [2]

Babuska, Robert [1]

Affiliations:

[1] Delft Univ Technol, Cognit Robot Dept, Delft, Netherlands

[2] DeepMind, Paris, France

Source:

IFAC PAPERSONLINE | 2020 / Vol. 53 / Issue 02

Keywords:

Reinforcement Learning; Deep Learning; Optimization; Neural Networks; Control

DOI:

10.1016/j.ifacol.2020.12.2240

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Deep reinforcement learning makes it possible to train control policies that map high-dimensional observations to actions. These methods typically use gradient-based optimization techniques to enable relatively efficient learning, but are notoriously sensitive to hyperparameter choices and do not have good convergence properties. Gradient-free optimization methods, such as evolutionary strategies, can offer a more stable alternative but tend to be much less sample efficient. In this work we propose a combination, using the relative strengths of both. We start with a gradient-based initial training phase, which is used to quickly learn both a state representation and an initial policy. This phase is followed by a gradient-free optimization of only the final action selection parameters. This enables the policy to improve in a stable manner to a performance level not obtained by gradient-based optimization alone, using many fewer trials than methods using only gradient-free optimization. We demonstrate the effectiveness of the method on two Atari games, a continuous control benchmark and the CarRacing-v0 benchmark. On the latter we surpass the best previously reported score while using significantly fewer episodes. Copyright (C) 2020 The Authors.
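The two-phase idea described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy "return" (a negative mean squared action error), the two-layer network, the hypothetical function names, and the simple elitist (1+lambda) evolution strategy used in place of whatever gradient-free optimizer the authors chose. The sketch first trains all parameters with gradients, then freezes the learned features and perturbs only the final action-selection weights.

```python
import numpy as np

def features(W1, states):
    # State representation; learned in phase 1, frozen in phase 2.
    return np.tanh(states @ W1)

def episode_return(W1, w2, states, targets):
    # Toy stand-in for an episode return (assumption): negative MSE of actions.
    actions = features(W1, states) @ w2
    return -np.mean((actions - targets) ** 2)

def gradient_phase(W1, w2, states, targets, steps=200, lr=0.02):
    # Phase 1: gradient descent on the MSE over ALL parameters.
    n = len(targets)
    for _ in range(steps):
        h = features(W1, states)
        err = h @ w2 - targets
        grad_w2 = 2.0 * h.T @ err / n
        grad_h = 2.0 * np.outer(err, w2) / n
        grad_W1 = states.T @ (grad_h * (1.0 - h ** 2))  # tanh derivative
        W1 = W1 - lr * grad_W1
        w2 = w2 - lr * grad_w2
    return W1, w2

def gradient_free_phase(W1, w2, states, targets, rng,
                        generations=50, popsize=16, sigma=0.05):
    # Phase 2: elitist (1+lambda) evolution strategy on the final layer only.
    best = w2.copy()
    best_ret = episode_return(W1, best, states, targets)
    for _ in range(generations):
        candidates = best + sigma * rng.standard_normal((popsize, best.size))
        returns = [episode_return(W1, c, states, targets) for c in candidates]
        i = int(np.argmax(returns))
        if returns[i] > best_ret:  # keep the parent unless a child is better
            best, best_ret = candidates[i].copy(), returns[i]
    return best, best_ret

def run_demo(seed=0):
    rng = np.random.default_rng(seed)
    states = rng.standard_normal((64, 4))
    targets = np.sin(states.sum(axis=1))
    W1 = 0.5 * rng.standard_normal((4, 8))
    w2 = np.zeros(8)
    W1, w2 = gradient_phase(W1, w2, states, targets)
    ret_grad = episode_return(W1, w2, states, targets)
    _, ret_es = gradient_free_phase(W1, w2, states, targets, rng)
    return ret_grad, ret_es

if __name__ == "__main__":
    ret_grad, ret_es = run_demo()
    print("return after gradient phase:", ret_grad)
    print("return after gradient-free phase:", ret_es)
```

Because the strategy in this sketch is elitist, the return after the second phase can never fall below the return after the first; that property belongs to this toy setup and is not a claim about the paper's results. Restricting the search to the final layer keeps the gradient-free search space small, which is the motivation the abstract gives for needing fewer trials.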

Pages: 8049 - 8056

Page count: 8

50 records in total

[21] An adaptive Bayesian approach to gradient-free global optimization
Yu, Jianneng
Morozov, Alexandre, V
NEW JOURNAL OF PHYSICS, 2024, 26 (02)
[22] Metamaterials design using gradient-free numerical optimization
Diest, Kenneth
Sweatlock, Luke A.
Marthaler, Daniel E.
JOURNAL OF APPLIED PHYSICS, 2010, 108 (08)
[23] Gradient-free optimization of chaotic acoustics with reservoir computing
Huhn, Francisco
Magri, Luca
PHYSICAL REVIEW FLUIDS, 2022, 7 (01)
[24] Gradient-free strategies to robust well control optimization
Jefferson Wellano Oliveira Pinto
Juan Alberto Rojas Tueros
Bernardo Horowitz
Silvana Maria Bastos Afonso da Silva
Ramiro Brito Willmersdorf
Diego Felipe Barbosa de Oliveira
Computational Geosciences, 2020, 24 : 1959 - 1978
[25] Gradient-free algorithms for distributed online convex optimization
Liu, Yuhang
Zhao, Wenxiao
Dong, Daoyi
ASIAN JOURNAL OF CONTROL, 2023, 25 (04) : 2451 - 2468
[26] Free Fine-tuning: A Plug-and-Play Watermarking Scheme for Deep Neural Networks
Wang, Run
Ren, Jixing
Li, Boheng
She, Tianyi
Zhang, Wenhui
Fang, Liming
Chen, Jing
Wang, Lina
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 8463 - 8474
[27] INCREMENTAL GRADIENT-FREE METHOD FOR NONSMOOTH DISTRIBUTED OPTIMIZATION
Li, Jueyou
Li, Guoquan
Wu, Zhiyou
Wu, Changzhi
Wang, Xiangyu
Lee, Jae-Myung
Jung, Kwang-Hyo
JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION, 2017, 13 (04) : 1841 - 1857
[28] Gradient-Free Method for Heavily Constrained Nonconvex Optimization
Shi, Wanli
Gao, Hongchang
Gu, Bin
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
[29] FINE-TUNING FINE CHEMICALS
ROYSE, S
EUROPEAN CHEMICAL NEWS, 1995, 64 (1693) : 28 - &
[30] ABCAttack: A Gradient-Free Optimization Black-Box Attack for Fooling Deep Image Classifiers
Cao, Han
Si, Chengxiang
Sun, Qindong
Liu, Yanxiao
Li, Shancang
Gope, Prosanta
ENTROPY, 2022, 24 (03)
