Fine-tuning Deep RL with Gradient-Free Optimization

Cited by: 2
Authors
de Bruin, Tim [1 ]
Kober, Jens [1 ]
Tuyls, Karl [2 ]
Babuska, Robert [1 ]
Affiliations
[1] Delft Univ Technol, Cognit Robot Dept, Delft, Netherlands
[2] Deepmind, Paris, France
Source
IFAC PAPERSONLINE | 2020, Vol. 53, Issue 02
Keywords
Reinforcement Learning; Deep Learning; Optimization; Neural Networks; Control;
DOI
10.1016/j.ifacol.2020.12.2240
Chinese Library Classification (CLC)
TP [Automation and computer technology];
Discipline code
0812;
Abstract
Deep reinforcement learning makes it possible to train control policies that map high-dimensional observations to actions. These methods typically rely on gradient-based optimization, which enables relatively efficient learning but is notoriously sensitive to hyperparameter choices and lacks good convergence properties. Gradient-free optimization methods, such as evolutionary strategies, can offer a more stable alternative but tend to be much less sample efficient. In this work we propose a combination that uses the relative strengths of both. We start with a gradient-based initial training phase, which quickly learns both a state representation and an initial policy. This phase is followed by a gradient-free optimization of only the final action-selection parameters. This enables the policy to improve in a stable manner to a performance level not obtained by gradient-based optimization alone, using far fewer trials than methods relying solely on gradient-free optimization. We demonstrate the effectiveness of the method on two Atari games, a continuous control benchmark, and the CarRacing-v0 benchmark. On the latter we surpass the best previously reported score while using significantly fewer episodes. Copyright (C) 2020 The Authors.
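The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the gradient-based phase 1 (learning a state representation and an initial policy) is stood in for by a fixed initial weight vector `w_init`, real episode rollouts are replaced by a hypothetical quadratic surrogate `episode_return`, and the gradient-free optimizer is a simple (1+1) evolution strategy rather than whatever strategy the paper uses. Only the final action-selection weights are perturbed, as the abstract describes.

```python
import random

def episode_return(w):
    # Toy stand-in for the expected return of the policy whose final-layer
    # weights are w; higher is better. The optimum below is hypothetical.
    optimum = (1.0, -0.5, 0.2)
    return -sum((wi - oi) ** 2 for wi, oi in zip(w, optimum))

def gradient_free_finetune(w0, sigma=0.1, iters=500, seed=0):
    """(1+1)-ES: perturb the final-layer weights with Gaussian noise and
    keep a candidate only if its evaluated return beats the incumbent."""
    rng = random.Random(seed)
    w, best = list(w0), episode_return(w0)
    for _ in range(iters):
        cand = [wi + rng.gauss(0.0, sigma) for wi in w]
        r = episode_return(cand)
        if r > best:
            w, best = cand, r
    return w, best

# Pretend gradient-based training (phase 1) ended with these weights;
# phase 2 then improves them without any gradients.
w_init = [0.5, 0.0, 0.0]
w_final, ret = gradient_free_finetune(w_init)
```

Because each perturbation is accepted only on improvement, the fine-tuning is monotone in evaluated return, which is the stability property the abstract attributes to the gradient-free phase.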
Pages: 8049-8056
Page count: 8
Related papers (50 in total)
  • [41] Incremental accelerated gradient descent and adaptive fine-tuning heuristic performance optimization for robotic motion planning
    Li, Shengjie
    Wang, Jin
    Zhang, Haiyun
    Feng, Yichang
    Lu, Guodong
    Zhai, Anbang
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 243
  • [42] FINE-TUNING THE FOIA
    KENNEDY, P
    COLUMBIA JOURNALISM REVIEW, 1984, 23 (03) : 8 - 9
  • [43] FINE-TUNING THE MULTIVERSE
    Metcalf, Thomas
    FAITH AND PHILOSOPHY, 2018, 35 (01) : 3 - 32
  • [44] Fine-tuning tools
    Arianne Heinrichs
    Nature Reviews Molecular Cell Biology, 2006, 7 : 466 - 466
  • [45] Natural fine-tuning
    Saleem, Anjum
    Medina, Luisa
    Kunststoffe International, 2019, 109 (04): : 45 - 48
  • [46] Fine-tuning subtitle
    Waste Age, 2002, 33 (11): : 32 - 46
  • [47] Fine-tuning metabolism
    Sarah Seton-Rogers
    Nature Reviews Cancer, 2014, 14 : 705 - 705
  • [48] Fine-tuning feedback
    Alicia Newton
    Nature Climate Change, 2008, 1 (802) : 17 - 17
  • [49] What fine-tuning?
    Weart, Spencer
    NEW SCIENTIST, 2009, 201 (2693) : 24 - 24
  • [50] Fine-tuning adhesives
    Grant Miura
    Nature Chemical Biology, 2020, 16 : 1153 - 1153