Hierarchical Reinforcement Learning for Concurrent Discovery of Compound and Composable Policies

Cited by: 0

Authors:

Esteban, Domingo [1,2]

Rozo, Leonel [3]

Caldwell, Darwin G. [1]

Affiliations:

[1] Ist Italiano Tecnol, Dept Adv Robot, Via Morego 30, I-16163 Genoa, Italy

[2] Univ Genoa, DIBRIS, Via Opera Pia 13, I-16145 Genoa, Italy

[3] Bosch Ctr Artificial Intelligence, Robert Bosch Campus 1, D-71272 Renningen, Germany

Source:

2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS) | 2019

DOI:

10.1109/iros40897.2019.8968149

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

A common strategy for dealing with the expense of reinforcement learning (RL) on complex tasks is to decompose them into a collection of subtasks that are usually simpler to learn and reusable for new problems. However, when a robot learns the policies for these subtasks, common approaches treat each policy learning process separately. Therefore, all of these individual (composable) policies must be learned before the complex task can be learned through policy composition. Moreover, such composition of individual policies is usually performed sequentially, which is not suitable for tasks that require the subtasks to be performed concurrently. In this paper, we propose to combine a set of composable Gaussian policies corresponding to these subtasks using a set of activation vectors, resulting in a complex Gaussian policy that is a function of the means and covariance matrices of the composable policies. Moreover, we propose an algorithm for learning both compound and composable policies within the same learning process by exploiting the off-policy data generated from the compound policy. The algorithm is built on a maximum entropy RL approach to favor exploration during the learning process. The experimental results show that the experience collected with the compound policy makes it possible not only to solve the complex task but also to obtain useful composable policies that perform successfully in their corresponding subtasks.
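
The composition step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code below assumes one plausible form: diagonal covariances and an activation-weighted product of Gaussians, applied per action dimension. The function name `compose_gaussians` and all numeric values are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def compose_gaussians(
    means: Sequence[Sequence[float]],
    variances: Sequence[Sequence[float]],
    activations: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Fuse K diagonal Gaussians per action dimension.

    Assumed rule (not confirmed by the abstract): each policy's precision
    (inverse variance) is scaled by its activation, and the fused mean is
    the precision-weighted average of the individual means.
    """
    n_dims = len(means[0])
    fused_mean: List[float] = []
    fused_var: List[float] = []
    for d in range(n_dims):
        # Activation-scaled precision of every policy in dimension d.
        weighted_precisions = [w[d] / v[d] for w, v in zip(activations, variances)]
        total_precision = sum(weighted_precisions)
        if total_precision <= 0.0:
            raise ValueError("at least one policy must be active in every dimension")
        fused_var.append(1.0 / total_precision)
        fused_mean.append(
            sum(p * m[d] for p, m in zip(weighted_precisions, means)) / total_precision
        )
    return fused_mean, fused_var


# Two hypothetical subtask policies over a 2-D action space.
mean, var = compose_gaussians(
    means=[[1.0, 0.0], [3.0, 2.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
    activations=[[0.5, 1.0], [0.5, 0.0]],
)
print(mean, var)
```

In this example both policies share the first action dimension equally, while the second dimension is driven by the first policy alone, which is one way to read "performing subtasks concurrently" under the stated assumptions.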


Pages: 1818-1825

Page count: 8

