Guided Cooperation in Hierarchical Reinforcement Learning via Model-Based Rollout

Cited by: 0

Authors:

Wang, Haoran [1]

Tang, Zeshen [1]

Sun, Yaoru [1]

Wang, Fang [2]

Zhang, Siyu [1]

Chen, Yeming [1]

Affiliations:

[1] Tongji Univ, Coll Elect & Informat Engn, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China

[2] Brunel Univ London, Dept Comp Sci, Uxbridge UB8 3PH, England

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024

Keywords:

Planning; Task analysis; Reinforcement learning; Robustness; Sun; Learning systems; Vehicle dynamics; Deep reinforcement learning (DRL); goal conditioning; hierarchical reinforcement learning (HRL); interlevel cooperation; model-based rollout

DOI:

10.1109/TNNLS.2024.3425809

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Goal-conditioned hierarchical reinforcement learning (HRL) presents a promising approach for enabling effective exploration in complex, long-horizon reinforcement learning (RL) tasks through temporal abstraction. Empirically, heightened interlevel communication and coordination can induce more stable and robust policy improvement in hierarchical systems. Yet most existing goal-conditioned HRL algorithms have focused primarily on subgoal discovery, neglecting interlevel cooperation. Here, we propose a novel goal-conditioned HRL framework named Guided Cooperation via Model-Based Rollout (GCMR; code is available at https://github.com/HaoranWang-TJ/GCMR_ACLG_official), aiming to bridge interlayer information synchronization and cooperation by exploiting forward dynamics. First, GCMR mitigates the state-transition error within off-policy correction via model-based rollout, thereby enhancing sample efficiency. Second, to prevent disruption by unseen subgoals and states, lower level Q-function gradients are constrained using a gradient penalty with a model-inferred upper bound, leading to a more stable behavioral policy conducive to effective exploration. Third, we propose one-step rollout-based planning, using higher level critics to guide the lower level policy. Specifically, we estimate the value of future states of the lower level policy using the higher level critic function, thereby transmitting global task information downward to avoid local pitfalls. These three critical components in GCMR are expected to facilitate interlevel cooperation significantly. Experimental results demonstrate that incorporating the proposed GCMR framework with a disentangled variant of hierarchical reinforcement learning guided by landmarks (HIGL), namely, adjacency constraint and landmark-guided planning (ACLG), yields more stable and robust policy improvement compared with various baselines and significantly outperforms previous state-of-the-art (SOTA) algorithms.
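
The third component described in the abstract (one-step rollout-based planning guided by a higher level critic) can be illustrated with a small toy sketch. This is not the authors' implementation, which lives in the repository linked above. Everything here is an assumption made for illustration: the additive `toy_dynamics` model, the distance-based `toy_high_level_critic`, the 2-D state, and the choice to rank a few candidate actions rather than train a policy. The sketch only shows the general idea of predicting the next state with a forward model and scoring it with a higher level value estimate.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]
Action = Tuple[float, float]


def toy_dynamics(state: State, action: Action) -> State:
    """Hypothetical forward model: the next state is the state shifted by the action."""
    return (state[0] + action[0], state[1] + action[1])


def toy_high_level_critic(state: State, final_goal: State) -> float:
    """Hypothetical higher level value: negative Euclidean distance to the final goal."""
    return -math.dist(state, final_goal)


def one_step_rollout_scores(
    state: State,
    candidate_actions: Sequence[Action],
    final_goal: State,
    dynamics: Callable[[State, Action], State],
    critic: Callable[[State, State], float],
) -> List[float]:
    """Score each candidate action by the critic value of its predicted next state."""
    scores = []
    for action in candidate_actions:
        predicted_next = dynamics(state, action)  # one-step model-based rollout
        scores.append(critic(predicted_next, final_goal))  # higher level guidance
    return scores


def pick_guided_action(
    state: State,
    candidate_actions: Sequence[Action],
    final_goal: State,
    dynamics: Callable[[State, Action], State] = toy_dynamics,
    critic: Callable[[State, State], float] = toy_high_level_critic,
) -> Action:
    """Return the candidate whose predicted next state the critic values most."""
    scores = one_step_rollout_scores(state, candidate_actions, final_goal, dynamics, critic)
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return candidate_actions[best_index]


if __name__ == "__main__":
    start = (0.0, 0.0)
    goal = (3.0, 0.0)
    candidates = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
    print(pick_guided_action(start, candidates, goal))
```

In a learned setting, both the dynamics model and the critic would be trained networks, and the score would more plausibly enter the lower level policy objective as an extra term than serve as a discrete selection rule; the discrete ranking above is a simplification chosen to keep the example short.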


Pages: 15
