Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies

Cited by: 0

Authors:

Esteban, Domingo [1,2]

Rozo, Leonel [3]

Caldwell, Darwin G. [1]

Affiliations:

[1] Ist Italiano Tecnol, Dept Adv Robot, Via Morego 30, I-16163 Genoa, Italy

[2] Univ Genoa, DIBRIS, Via Opera Pia 13, I-16145 Genoa, Italy

[3] Bosch Ctr Artificial Intelligence, Robert Bosch Campus 1, D-71272 Renningen, Germany

Source:

2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2019

DOI:

10.1109/iros40897.2019.8968149

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A common strategy to deal with the expensive reinforcement learning (RL) of complex tasks is to decompose them into a collection of subtasks that are usually simpler to learn as well as reusable for new problems. However, when a robot learns the policies for these subtasks, common approaches treat every policy learning process separately. Therefore, all these individual (composable) policies need to be learned before tackling the learning process of the complex task through policy composition. Moreover, such composition of individual policies is usually performed sequentially, which is not suitable for tasks that require the subtasks to be performed concurrently. In this paper, we propose to combine a set of composable Gaussian policies corresponding to these subtasks using a set of activation vectors, resulting in a compound Gaussian policy that is a function of the means and covariance matrices of the composable policies. Moreover, we propose an algorithm for learning both compound and composable policies within the same learning process by exploiting the off-policy data generated from the compound policy. The algorithm is built on a maximum entropy RL approach to favor exploration during the learning process. The results of the experiments show that the experience collected with the compound policy makes it possible not only to solve the complex task but also to obtain useful composable policies that perform successfully in their corresponding subtasks.
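
The abstract does not give the composition formula, so the following is only a minimal sketch of one plausible reading: an activation-weighted product of diagonal Gaussians, in which each policy's precision is scaled by its activation for every action dimension. The function name `compose_gaussian_policies`, the diagonal-covariance restriction, and the example numbers are assumptions for illustration, not taken from the paper.

```python
import numpy as np


def compose_gaussian_policies(means, variances, activations):
    """Combine K diagonal Gaussian policies into one diagonal Gaussian.

    All inputs have shape (K, action_dim). Assumption: every action
    dimension has at least one strictly positive activation, otherwise
    the compound precision would be zero.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    activations = np.asarray(activations, dtype=float)

    # Precision (inverse variance) of each policy, scaled by its activation.
    weighted_precisions = activations / variances
    # Compound precision is the sum over the K policies.
    compound_variance = 1.0 / weighted_precisions.sum(axis=0)
    # Compound mean is the precision-weighted average of the policy means.
    compound_mean = compound_variance * (weighted_precisions * means).sum(axis=0)
    return compound_mean, compound_variance


# Hypothetical example: two policies, two action dimensions.
means = [[1.0, 0.0], [3.0, 2.0]]
variances = [[1.0, 1.0], [1.0, 4.0]]
activations = [[0.5, 1.0], [0.5, 0.0]]  # second policy is switched off in dim 1
mean, var = compose_gaussian_policies(means, variances, activations)
print(mean, var)
```

In this sketch, an activation of zero removes a policy's influence on that action dimension, while equal activations with equal variances average the means, which is one way concurrent (rather than sequential) use of the subtask policies could be expressed.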

Pages: 1818-1825

Page count: 8

50 items in total

[1] Concurrent Hierarchical Reinforcement Learning
Marthi, Bhaskara
Russell, Stuart
Latham, David
Guestrin, Carlos
[J]. 19TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-05), 2005, : 779 - 785
[2] Composable Energy Policies for Reactive Motion Generation and Reinforcement Learning
IAS, TU Darmstadt
Unknown
[J]. Robot. Sci. Syst., 1600,
[3] Composable energy policies for reactive motion generation and reinforcement learning
Urain, Julen
Li, Anqi
Liu, Puze
D'Eramo, Carlo
Peters, Jan
[J]. INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2023, 42 (10): : 827 - 858
[4] Composable Energy Policies for Reactive Motion Generation and Reinforcement Learning
Urain, Julen
Li, Anqi
Liu, Puze
D'Eramo, Carlo
Peters, Jan
[J]. ROBOTICS: SCIENCE AND SYSTEM XVII, 2021,
[5] MCP: Learning Composable Hierarchical Control with Multiplicative Compositional Policies
Peng, Xue Bin
Chang, Michael
Zhang, Grace
Abbeel, Pieter
Levine, Sergey
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[6] Concurrent Hierarchical Reinforcement Learning for RoboCup Keepaway
Bai, Aijun
Russell, Stuart
Chen, Xiaoping
[J]. ROBOCUP 2017: ROBOT WORLD CUP XXI, 2018, 11175 : 190 - 203
[7] Latent Space Policies for Hierarchical Reinforcement Learning
Haarnoja, Tuomas
Hartikainen, Kristian
Abbeel, Pieter
Levine, Sergey
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
[8] Composable Modular Reinforcement Learning
Simpkins, Christopher
Isbell, Charles
[J]. THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 4975 - 4982
[9] Autonomic discovery of subgoals in hierarchical reinforcement learning
XIAO Ding
LI Yi-tong
SHI Chuan
[J]. The Journal of China Universities of Posts and Telecommunications, 2014, (05) : 94 - 104
[10] Autonomic discovery of subgoals in hierarchical reinforcement learning
XIAO Ding
LI Yi-tong
SHI Chuan
[J]. The Journal of China Universities of Posts and Telecommunications, 2014, 21 (05) : 94 - 104