Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies

Authors
Esteban, Domingo [1 ,2 ]
Rozo, Leonel [3 ]
Caldwell, Darwin G. [1 ]
Affiliations
[1] Ist Italiano Tecnol, Dept Adv Robot, Via Morego 30, I-16163 Genoa, Italy
[2] Univ Genoa, DIBRIS, Via Opera Pia 13, I-16145 Genoa, Italy
[3] Bosch Ctr Artificial Intelligence, Robert Bosch Campus 1, D-71272 Renningen, Germany
DOI
10.1109/iros40897.2019.8968149
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A common strategy for dealing with the expensive reinforcement learning (RL) of complex tasks is to decompose them into a collection of subtasks that are usually simpler to learn and reusable for new problems. However, when a robot learns the policies for these subtasks, common approaches treat every policy learning process separately. Therefore, all these individual (composable) policies need to be learned before the complex task itself can be learned through policy composition. Moreover, such composition of individual policies is usually performed sequentially, which is not suitable for tasks that require the subtasks to be performed concurrently. In this paper, we propose to combine a set of composable Gaussian policies corresponding to these subtasks using a set of activation vectors, resulting in a complex Gaussian policy that is a function of the means and covariance matrices of the composable policies. Moreover, we propose an algorithm for learning both compound and composable policies within the same learning process by exploiting the off-policy data generated from the compound policy. The algorithm is built on a maximum entropy RL approach to favor exploration during the learning process. The experimental results show that the experience collected with the compound policy makes it possible not only to solve the complex task but also to obtain useful composable policies that perform successfully in their corresponding subtasks.
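The composition step described in the abstract can be illustrated with a small numerical sketch. The abstract does not give the exact formula, so the code below is an assumption: it treats each composable policy as a Gaussian with diagonal covariance and combines them as an activation-weighted product of Gaussians, applied per action dimension. The function name `compose_gaussian_policies` and all array shapes are hypothetical and are not taken from the paper.

```python
import numpy as np


def compose_gaussian_policies(means, variances, activations):
    """Combine K diagonal Gaussian policies into one Gaussian.

    Assumption (not confirmed by the abstract): the compound policy is an
    activation-weighted product of Gaussians, computed per action dimension.

    means, variances, activations: array-likes of shape (K, D), where K is
    the number of composable policies and D the action dimension.
    Returns the compound mean and variance, each of shape (D,).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    activations = np.asarray(activations, dtype=float)

    # Each policy contributes its precision (1 / variance), scaled by its
    # activation weight for that action dimension.
    weighted_precisions = activations / variances

    # Precisions add under a product of Gaussians.
    compound_var = 1.0 / weighted_precisions.sum(axis=0)

    # The compound mean is the precision-weighted average of the means.
    compound_mean = compound_var * (weighted_precisions * means).sum(axis=0)
    return compound_mean, compound_var


# Two hypothetical subtask policies acting on a 2-D action space.
means = [[1.0, 0.0], [3.0, 2.0]]
variances = [[1.0, 1.0], [1.0, 4.0]]
# Dimension 0 is shared equally; dimension 1 is driven by policy 0 only.
activations = [[0.5, 1.0], [0.5, 0.0]]

mu, var = compose_gaussian_policies(means, variances, activations)
print(mu, var)
```

In this sketch an activation of zero removes a policy's influence on that action dimension, which is one way two subtasks could act concurrently on different parts of the action space. At least one policy must have a nonzero activation in every dimension, otherwise the summed precision is zero.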
Pages: 1818-1825
Number of pages: 8