Latent Space Policies for Hierarchical Reinforcement Learning

Cited by: 0
Authors
Haarnoja, Tuomas [1 ]
Hartikainen, Kristian
Abbeel, Pieter [1 ]
Levine, Sergey [1 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
Keywords
(none listed)
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We address the problem of learning hierarchical deep neural network policies for reinforcement learning. In contrast to methods that explicitly restrict or cripple lower layers of a hierarchy to force them to use higher-level modulating signals, each layer in our framework is trained to directly solve the task, but acquires a range of diverse strategies via a maximum entropy reinforcement learning objective. Each layer is also augmented with latent random variables, which are sampled from a prior distribution during the training of that layer. The maximum entropy objective causes these latent variables to be incorporated into the layer's policy, and the higher level layer can directly control the behavior of the lower layer through this latent space. Furthermore, by constraining the mapping from latent variables to actions to be invertible, higher layers retain full expressivity: neither the higher layers nor the lower layers are constrained in their behavior. Our experimental evaluation demonstrates that we can improve on the performance of single-layer policies on standard benchmark tasks simply by adding additional layers, and that our method can solve more complex sparse-reward tasks by learning higher-level policies on top of high-entropy skills optimized for simple low-level objectives.
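The abstract's central mechanism can be sketched in a few lines: a lower-level policy maps a latent variable to an action through an invertible, observation-conditioned transformation, and a higher-level policy then acts by choosing points in that latent space. The following is a minimal illustrative sketch, not the paper's implementation; the affine coupling-style bijection and all names (`lower_layer`, `lower_layer_inverse`, the stand-in for the higher-level policy) are assumptions for exposition, and the actual method trains each layer with a maximum entropy reinforcement learning objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def lower_layer(z, obs, W, b):
    """Hypothetical invertible map from latent z to action a,
    conditioned on the observation.

    A simple affine bijection a = exp(s(obs)) * z + t(obs):
    invertible in z for any obs, so the higher layer retains
    full control over the lower layer's behavior."""
    s = np.tanh(W @ obs)          # log-scale term, conditioned on obs
    t = b * np.tanh(obs.sum())    # shift term, conditioned on obs
    return np.exp(s) * z + t

def lower_layer_inverse(a, obs, W, b):
    """Exact inverse: recover the latent that produced action a."""
    s = np.tanh(W @ obs)
    t = b * np.tanh(obs.sum())
    return (a - t) * np.exp(-s)

obs_dim, act_dim = 4, 2
W = rng.normal(size=(act_dim, obs_dim))
b = rng.normal(size=act_dim)
obs = rng.normal(size=obs_dim)

# While the lower layer is trained, z is sampled from a fixed prior,
# so the entropy bonus pushes diverse behaviors into the latent space.
z_prior = rng.normal(size=act_dim)          # z ~ N(0, I) during pre-training
a = lower_layer(z_prior, obs, W, b)

# Afterwards, a higher-level policy outputs z instead of the prior,
# steering the lower layer through its latent space (stand-in here).
z_high = rng.normal(size=act_dim) * 0.1
a_controlled = lower_layer(z_high, obs, W, b)

# Invertibility check: no expressivity is lost in either layer,
# since every action remains reachable from some latent.
z_rec = lower_layer_inverse(a_controlled, obs, W, b)
assert np.allclose(z_rec, z_high)
print("action:", a_controlled, "recovered latent:", z_rec)
```

Because the map is a bijection for every observation, the higher layer can induce any action the lower layer is capable of, which is the sense in which neither layer is constrained in its behavior.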
Pages: 10