Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

Cited by: 0
Authors
Nagabandi, Anusha [1 ]
Kahn, Gregory [1 ]
Fearing, Ronald S. [1 ]
Levine, Sergey [1 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Funding
U.S. National Science Foundation
DOI
Not available
CLC classification
TP [Automation & Computer Technology]
Discipline code
0812
Abstract
Model-free deep reinforcement learning algorithms have been shown to be capable of learning a wide range of robotic skills, but typically require a very large number of samples to achieve good performance. Model-based algorithms, in principle, can provide for much more efficient learning, but have proven difficult to extend to expressive, high-capacity models such as deep neural networks. In this work, we demonstrate that neural network dynamics models can in fact be combined with model predictive control (MPC) to achieve excellent sample complexity in a model-based reinforcement learning algorithm, producing stable and plausible gaits that accomplish various complex locomotion tasks. We further propose using deep neural network dynamics models to initialize a model-free learner, in order to combine the sample efficiency of model-based approaches with the high task-specific performance of model-free methods. We empirically demonstrate on MuJoCo locomotion tasks that our pure model-based approach trained on just random action data can follow arbitrary trajectories with excellent sample efficiency, and that our hybrid algorithm can accelerate model-free learning on high-speed benchmark tasks, achieving sample efficiency gains of 3-5x on swimmer, cheetah, hopper, and ant agents. Videos can be found at https://sites.google.com/view/mbmf
Pages: 7579-7586
Page count: 8
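The model-based component the abstract describes couples a learned neural-network dynamics model with sampling-based model predictive control: sample many random action sequences, roll each out through the learned model, score them with the task reward, and execute only the first action of the best sequence before replanning. A minimal sketch of that random-shooting MPC loop is below; `toy_dynamics` and `toy_reward` are illustrative stand-ins, not the paper's learned model or reward.

```python
import numpy as np

def mpc_random_shooting(dynamics, reward, state, action_dim,
                        horizon=10, n_candidates=1000, seed=0):
    """Return the first action of the highest-return random action sequence.

    dynamics(states, actions) -> next states, batched one-step prediction
    (stands in for the learned neural-network model); reward(states) ->
    per-candidate scalar reward used for planning.
    """
    rng = np.random.default_rng(seed)
    # Sample candidate action sequences uniformly in [-1, 1].
    actions = rng.uniform(-1.0, 1.0,
                          size=(n_candidates, horizon, action_dim))
    states = np.repeat(state[None, :], n_candidates, axis=0)
    returns = np.zeros(n_candidates)
    for t in range(horizon):
        states = dynamics(states, actions[:, t, :])  # propagate all rollouts
        returns += reward(states)                    # accumulate planning reward
    best = int(np.argmax(returns))
    return actions[best, 0, :]  # execute only the first action, then replan

# Toy stand-ins: linear dynamics, reward for driving the state to the origin.
toy_dynamics = lambda s, a: s + 0.1 * a
toy_reward = lambda s: -np.sum(s ** 2, axis=-1)

state = np.array([1.0, -1.0])
act = mpc_random_shooting(toy_dynamics, toy_reward, state, action_dim=2)
```

In the paper's hybrid scheme, rollouts collected by this MPC controller would then be used to initialize a model-free learner via imitation before standard model-free fine-tuning.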