Dyna-Style Model-Based Reinforcement Learning with Model-Free Policy Optimization

Cited: 0
Authors
Dong, Kun [1 ,2 ]
Luo, Yongle [1 ,2 ]
Wang, Yuxin [1 ,2 ]
Liu, Yu [1 ,2 ]
Qu, Chengeng [1 ,2 ]
Zhang, Qiang [1 ,2 ]
Cheng, Erkang [1 ,2 ]
Sun, Zhiyong [1 ,2 ]
Song, Bo [1 ,2 ,3 ]
Affiliations
[1] Chinese Acad Sci, Inst Intelligent Machines, HFIPS, Hefei, Peoples R China
[2] Univ Sci & Technol China, Hefei, Peoples R China
[3] Jianghuai Frontier Technol Coordinat & Innovat Ctr, Hefei, Peoples R China
Keywords
Reinforcement learning; Robotics; Data efficiency; Algorithms
DOI
10.1016/j.knosys.2024.111428
CLC number
TP18 [Theory of artificial intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Dyna-style model-based reinforcement learning (MBRL) methods have demonstrated superior sample efficiency compared to their model-free counterparts, largely attributable to their use of learned models. Despite these advancements, applying the learned models effectively remains challenging, chiefly because of the intricate interdependence between model learning and policy optimization, which presents a significant theoretical gap in this field. This paper bridges that gap by providing, for the first time, a comprehensive theoretical analysis of Dyna-style MBRL and establishing a return bound in deterministic environments. Building upon this analysis, we propose a novel schema called Model-Based reinforcement learning with Model-Free Policy Optimization (MBMFPO). Compared to existing MBRL methods, the proposed schema integrates model-free policy optimization into the MBRL framework, along with several additional techniques. Experimental results on various continuous control tasks demonstrate that MBMFPO significantly enhances sample efficiency and final performance compared to baseline methods. Furthermore, extensive ablation studies provide robust evidence for the effectiveness of each individual component within the MBMFPO schema. This work advances both the theoretical analysis and practical application of Dyna-style MBRL, paving the way for more efficient reinforcement learning methods.
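To make the Dyna-style schema underlying the abstract concrete, the sketch below shows generic tabular Dyna-Q on a toy deterministic chain MDP: each real transition drives a model-free update and is stored in a learned model, which then supplies extra simulated "planning" updates. This is only an illustrative sketch of the Dyna framework the paper builds on, not the MBMFPO algorithm itself; the environment, hyperparameters, and all names are assumptions for illustration.

```python
import random

# Toy deterministic chain MDP: states 0..4, goal (reward 1) at state 4.
N_STATES, ACTIONS = 5, (-1, +1)

def step(s, a):
    s2 = max(0, min(N_STATES - 1, s + a))
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

def dyna_q(episodes=200, n_planning=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned deterministic model: (s, a) -> (s', r, done)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = rng.choice(ACTIONS) if rng.random() < eps else max(ACTIONS, key=lambda b: Q[(s, b)])
            s2, r, done = step(s, a)
            # model-free Q-learning update on the real transition
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) * (not done) - Q[(s, a)])
            model[(s, a)] = (s2, r, done)
            # Dyna planning: replay simulated transitions drawn from the model
            for _ in range(n_planning):
                (ps, pa), (ps2, pr, pdone) = rng.choice(list(model.items()))
                Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, b)] for b in ACTIONS) * (not pdone) - Q[(ps, pa)])
            s = s2
    return Q

Q = dyna_q()
greedy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(greedy)  # the greedy policy should move right, toward the goal
```

The planning loop is what distinguishes Dyna from pure model-free learning: the same update rule is applied to model-generated transitions, which is the source of the sample-efficiency gains the abstract refers to.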
Pages: 10