Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout

Cited by: 0
Authors
Wang, Haoran [1 ]
Tang, Zeshen [1 ]
Sun, Yaoru [1 ]
Wang, Fang [2 ]
Zhang, Siyu [1 ]
Chen, Yeming [1 ]
Affiliations
[1] Tongji Univ, Coll Elect & Informat Engn, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
[2] Brunel Univ London, Dept Comp Sci, Uxbridge UB8 3PH, England
Keywords
Planning; Task analysis; Reinforcement learning; Robustness; Sun; Learning systems; Vehicle dynamics; Deep reinforcement learning (DRL); goal conditioning; hierarchical reinforcement learning (HRL); interlevel cooperation; model-based rollout
DOI
10.1109/TNNLS.2024.3425809
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened interlevel communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet, most existing goal-conditioned HRL algorithms have primarily focused on subgoal discovery, neglecting interlevel cooperation. Here, we propose a novel goal-conditioned HRL framework named Guided Cooperation via Model-Based Rollout (GCMR; code is available at https://github.com/HaoranWang-TJ/GCMR_ACLG_official), which aims to bridge interlevel information synchronization and cooperation by exploiting forward dynamics. First, GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by unseen subgoals and states, lower-level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Third, we propose a one-step rollout-based planning scheme that uses the higher-level critic to guide the lower-level policy: the value of the lower-level policy's future states is estimated with the higher-level critic function, thereby transmitting global task information downward to avoid local pitfalls. Together, these three components are expected to significantly facilitate interlevel cooperation. Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of hierarchical reinforcement learning guided by landmarks (HIGL), namely adjacency constraint and landmark-guided planning (ACLG), yields more stable and robust policy improvement than various baselines and significantly outperforms previous state-of-the-art (SOTA) algorithms.
Pages: 15
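
The abstract describes three cooperative mechanisms: model-based off-policy correction, a gradient penalty with a model-inferred upper bound, and one-step rollout guidance from the higher-level critic. The sketch below is a minimal, PyTorch-style illustration of how such components could be expressed; the names and signatures (`dynamics`, `low_actor`, `low_critic`, `high_critic`) are hypothetical stand-ins and this is not the authors' released implementation (see the GitHub repository linked in the abstract for the official code).

```python
"""Illustrative sketch of the three GCMR components named in the abstract.
Assumptions (hypothetical, not from the paper's code):
  dynamics(state, action)          -> predicted next state (learned forward model)
  low_actor(state, goal)           -> lower-level action
  low_critic(state, goal, action)  -> lower-level Q-value
  high_critic(state, goal)         -> higher-level value estimate
All inputs are batched torch.Tensors."""
import torch


def rollout_corrected_goal(dynamics, low_actor, state, stored_actions, candidate_goals):
    """(1) Off-policy correction via model-based rollout: choose the candidate
    subgoal whose imagined rollout best reproduces the stored low-level actions,
    using the learned model instead of stale environment transitions."""
    best_goal, best_err = None, float("inf")
    for g in candidate_goals:
        s, err = state, 0.0
        for a_stored in stored_actions:
            a = low_actor(s, g)
            err += torch.mean((a - a_stored) ** 2).item()
            s = dynamics(s, a)  # imagine the next state with the forward model
        if err < best_err:
            best_goal, best_err = g, err
    return best_goal


def gradient_penalty(low_critic, state, goal, action, upper_bound):
    """(2) Penalize lower-level Q-function gradients that exceed an upper bound
    (in the paper this bound is inferred from the model), stabilizing the
    behavioral policy on unseen subgoals and states."""
    state = state.clone().requires_grad_(True)
    q = low_critic(state, goal, action).sum()
    (grad,) = torch.autograd.grad(q, state, create_graph=True)
    return torch.clamp(grad.norm(dim=-1) - upper_bound, min=0.0).pow(2).mean()


def one_step_guidance(dynamics, high_critic, low_actor, state, goal):
    """(3) One-step rollout-based planning: score the lower-level action by the
    higher-level critic's value of the imagined next state, passing global task
    information down to the lower level."""
    action = low_actor(state, goal)
    next_state = dynamics(state, action)
    return high_critic(next_state, goal)  # maximized as an auxiliary objective
```

In a full training loop, the penalty and guidance terms would presumably enter the lower-level actor-critic update as weighted auxiliary losses; the weights and the exact form of the model-inferred bound are specified in the paper, not in this sketch.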