Dyna-Style Model-Based Reinforcement Learning with Model-Free Policy Optimization

Cited: 0
Authors
Dong, Kun [1 ,2 ]
Luo, Yongle [1 ,2 ]
Wang, Yuxin [1 ,2 ]
Liu, Yu [1 ,2 ]
Qu, Chengeng [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Cheng, Erkang [1 ,2 ]
Sun, Zhiyong [1 ,2 ]
Song, Bo [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Intelligent Machines, HFIPS, Hefei, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] Jianghuai Frontier Technol Coordinat & Innovat Ctr, Hefei, Peoples R China
Keywords
Reinforcement learning; Robotics; Data efficiency; Algorithms
DOI
10.1016/j.knosys.2024.111428
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Dyna-style model-based reinforcement learning (MBRL) methods have demonstrated superior sample efficiency compared to their model-free counterparts, largely attributable to their use of learned models. Despite these advances, applying the learned models effectively remains challenging, chiefly because of the intricate interdependence between model learning and policy optimization, which leaves a significant theoretical gap in this field. This paper bridges that gap by providing, for the first time, a comprehensive theoretical analysis of Dyna-style MBRL and establishing a return bound in deterministic environments. Building on this analysis, we propose a novel schema called Model-Based Reinforcement Learning with Model-Free Policy Optimization (MBMFPO). Compared with existing MBRL methods, the proposed schema integrates model-free policy optimization into the MBRL framework, along with several additional techniques. Experimental results on various continuous control tasks demonstrate that MBMFPO significantly enhances sample efficiency and final performance compared to baseline methods. Furthermore, extensive ablation studies provide robust evidence for the effectiveness of each individual component of the MBMFPO schema. This work advances both the theoretical analysis and practical application of Dyna-style MBRL, paving the way for more efficient reinforcement learning methods.
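As background for the abstract above, the generic Dyna-style loop it builds on — learn a model from real transitions, then reuse that model for extra "imagined" value updates — can be sketched with tabular Dyna-Q on a toy deterministic chain. This is an illustrative sketch of the general Dyna schema only, not the paper's MBMFPO method; the `ChainEnv` environment, the `dyna_q` helper, and all hyperparameters are invented for the example.

```python
import random

class ChainEnv:
    """Toy deterministic chain: states 0..4, actions 0 = left, 1 = right.
    Reward 1.0 for reaching state 4, which ends the episode."""
    N = 5

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else min(self.N - 1, self.s + 1)
        done = self.s == self.N - 1
        return self.s, (1.0 if done else 0.0), done

def dyna_q(env, episodes=50, planning_steps=10, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(env.N) for a in (0, 1)}
    model = {}  # (s, a) -> (reward, next_state, done), learned from real experience

    def greedy(s):
        # Break ties randomly so the untrained agent does not get stuck.
        best = max(Q[(s, 0)], Q[(s, 1)])
        return rng.choice([a for a in (0, 1) if Q[(s, a)] == best])

    def update(s, a, r, s2, done):
        target = r if done else r + gamma * max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = rng.choice((0, 1)) if rng.random() < eps else greedy(s)
            s2, r, done = env.step(a)
            update(s, a, r, s2, done)        # direct RL on the real transition
            model[(s, a)] = (r, s2, done)    # model learning (exact in a deterministic env)
            for _ in range(planning_steps):  # planning: replay imagined transitions
                (ps, pa), (pr, ps2, pdone) = rng.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return Q

Q = dyna_q(ChainEnv())
# The greedy policy should prefer moving right in every non-terminal state.
print(all(Q[(s, 1)] > Q[(s, 0)] for s in range(4)))
```

The planning loop is what makes this "Dyna-style": each real environment step funds several cheap model-generated updates, which is the source of the sample-efficiency gains the abstract refers to; MBMFPO's contribution lies in how the model-free policy optimization is combined with such a loop.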
Pages: 10