Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies

Cited by: 0
Authors
Esteban, Domingo [1 ,2 ]
Rozo, Leonel [3 ]
Caldwell, Darwin G. [1 ]
Affiliations
[1] Ist Italiano Tecnol, Dept Adv Robot, Via Morego 30, I-16163 Genoa, Italy
[2] Univ Genoa, DIBRIS, Via Opera Pia 13, I-16145 Genoa, Italy
[3] Bosch Ctr Artificial Intelligence, Robert Bosch Campus 1, D-71272 Renningen, Germany
DOI
10.1109/iros40897.2019.8968149
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
A common strategy for dealing with the expensive reinforcement learning (RL) of complex tasks is to decompose them into a collection of subtasks that are usually simpler to learn and reusable for new problems. However, when a robot learns the policies for these subtasks, common approaches treat each policy learning process separately. Therefore, all of these individual (composable) policies need to be learned before tackling the learning of the complex task through policy composition. Moreover, such composition of individual policies is usually performed sequentially, which is not suitable for tasks that require performing the subtasks concurrently. In this paper, we propose to combine a set of composable Gaussian policies corresponding to these subtasks using a set of activation vectors, resulting in a complex Gaussian policy that is a function of the means and covariance matrices of the composable policies. Moreover, we propose an algorithm for learning both compound and composable policies within the same learning process by exploiting the off-policy data generated by the compound policy. The algorithm is built on a maximum entropy RL approach to favor exploration during learning. The experimental results show that the experience collected with the compound policy allows not only solving the complex task but also obtaining useful composable policies that perform successfully in their corresponding subtasks.
Pages: 1818-1825
Number of pages: 8
Related papers
50 in total (items [31]-[40] shown below)
  • [31] Hierarchical Imitation and Reinforcement Learning
    Le, Hoang M.
    Jiang, Nan
    Agarwal, Alekh
    Dudik, Miroslav
    Yue, Yisong
    Daume, Hal, III
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [32] On Efficiency in Hierarchical Reinforcement Learning
    Wen, Zheng
    Precup, Doina
    Ibrahimi, Morteza
    Barreto, Andre
    Van Roy, Benjamin
    Singh, Satinder
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [33] Budgeted Hierarchical Reinforcement Learning
    Leon, Aurelia
    Denoyer, Ludovic
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [34] Causal Discovery with Reinforcement Learning
    Huawei Noah's Ark Lab
    INTERNATIONAL CONFERENCE ON LEARNING REPRESENTATIONS, ICLR
  • [35] Coordinated Exploration in Concurrent Reinforcement Learning
    Dimakopoulou, Maria
    Van Roy, Benjamin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [36] On-policy concurrent reinforcement learning
    Banerjee, B
    Sen, S
    Peng, J
    JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2004, 16 (04) : 245 - 260
  • [37] Cascade Attribute Network: Decomposing Reinforcement Learning Control Policies using Hierarchical Neural Networks
    Chang, Haonan
    Xu, Zhuo
    Tomizuka, Masayoshi
    IFAC PAPERSONLINE, 2020, 53 (02): : 8181 - 8186
  • [38] Combinations of Micro-Macro States and Subgoals Discovery in Hierarchical Reinforcement Learning for Path Finding
    Setyawan, Gembong Edhi
    Sawada, Hideyuki
    Hartono, Pitoyo
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2022, 18 (02): : 447 - 462
  • [39] Parametrized Quantum Policies for Reinforcement Learning
    Jerbi, Sofiene
    Gyurik, Casper
    Marshall, Simon C.
    Briegel, Hans J.
    Dunjko, Vedran
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [40] Search for Robust Policies in Reinforcement Learning
    Li, Qi
    ICAART: PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 2, 2020, : 421 - 428