Reinforcement Learning in Latent Action Sequence Space

Cited by: 3
Authors
Kim, Heecheol [1]
Yamada, Masanori [2 ]
Miyoshi, Kosuke [3 ]
Iwata, Tomoharu [4 ]
Yamakawa, Hiroshi [5 ]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Lab Intelligent Syst & Informat, Tokyo, Japan
[2] Nippon Telegraph & Tel Corp, Secure Platform Labs, Tokyo, Japan
[3] Narrat Nights Inc, Yokohama, Kanagawa, Japan
[4] Nippon Telegraph & Tel Corp, Commun Sci Labs, Tokyo, Japan
[5] Dwango Artificial Intelligence Lab, Tokyo, Japan
Keywords
Reinforcement Learning; Transfer Learning; Learning from Demonstration;
DOI
10.1109/IROS45743.2020.9341629
Chinese Library Classification
TP [automation and computer technology];
Discipline Code
0812 ;
Abstract
One problem in real-world applications of reinforcement learning is the high dimensionality of the action search space, which arises from the combination of actions over time. To reduce the dimensionality of action sequence search spaces, macro actions, i.e., sequences of primitive actions for solving tasks, have been studied. However, previous studies relied on humans to define macro actions or assumed macro actions to be repetitions of the same primitive action. We propose encoded action sequence reinforcement learning (EASRL), a reinforcement learning method that learns flexible sequences of actions in a latent space for a high-dimensional action sequence search space. With EASRL, encoder and decoder networks are trained on demonstration data using variational autoencoders to map macro actions into the latent space. We then learn a policy network in the latent space, which represents a distribution over encoded macro actions given a state. By learning in the latent space, we can reduce the dimensionality of the action sequence search space and handle various patterns of action sequences. We experimentally demonstrate that the proposed method outperforms other reinforcement learning methods on tasks that require an extensive amount of search.
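The execution loop the abstract describes can be sketched in a few lines: a policy maps the state to a latent vector, a decoder expands that vector into a macro action (a sequence of primitive actions), and the sequence is executed step by step. This is a minimal illustrative sketch, not the paper's architecture: the toy dimensions, the fixed random linear maps standing in for the trained VAE decoder and the learned policy network, and the linear environment dynamics are all assumptions for demonstration.

```python
import numpy as np

# Toy dimensions (assumptions, not from the paper).
STATE_DIM, LATENT_DIM, ACTION_DIM, MACRO_LEN = 3, 2, 3, 5

rng = np.random.default_rng(0)

# Stand-in for the trained VAE decoder: latent vector -> flattened macro action.
W_dec = rng.normal(size=(LATENT_DIM, MACRO_LEN * ACTION_DIM))

def decode_macro(z):
    """Decode a latent vector into a (MACRO_LEN, ACTION_DIM) action sequence."""
    flat = np.tanh(z @ W_dec)          # tanh keeps primitive actions bounded
    return flat.reshape(MACRO_LEN, ACTION_DIM)

# Stand-in for the latent-space policy network: state -> latent action.
W_pol = rng.normal(size=(STATE_DIM, LATENT_DIM))

def policy(state):
    return np.tanh(state @ W_pol)

def rollout(state, n_macro_steps=4):
    """Pick one latent per macro step; execute the decoded primitive actions."""
    state = np.asarray(state, dtype=float)
    for _ in range(n_macro_steps):
        z = policy(state)              # one decision per MACRO_LEN env steps
        for a in decode_macro(z):
            state = 0.9 * state + 0.1 * a  # trivial linear dynamics
    return state

final_state = rollout(np.ones(STATE_DIM))
```

The point of the structure is the search-space reduction the abstract claims: over `n_macro_steps * MACRO_LEN` environment steps, the policy makes only `n_macro_steps` decisions, each in a `LATENT_DIM`-dimensional space rather than the full `MACRO_LEN * ACTION_DIM`-dimensional sequence space.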
Pages: 5497-5503
Page count: 7
Related Papers
50 records in total
  • [1] LASER: Learning a Latent Action Space for Efficient Reinforcement Learning
    Allshire, Arthur
    Martin-Martin, Roberto
    Lin, Charles
    Manuel, Shawn
    Savarese, Silvio
    Garg, Animesh
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 6650 - 6656
  • [2] Latent Space Policies for Hierarchical Reinforcement Learning
    Haarnoja, Tuomas
    Hartikainen, Kristian
    Abbeel, Pieter
    Levine, Sergey
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [3] Switching reinforcement learning for continuous action space
    Nagayoshi, Masato
    Murao, Hajime
    Tamaki, Hisashi
    [J]. ELECTRONICS AND COMMUNICATIONS IN JAPAN, 2012, 95 (03) : 37 - 44
  • [4] Action Space Shaping in Deep Reinforcement Learning
    Kanervisto, Anssi
    Scheller, Christian
    Hautamaki, Ville
    [J]. 2020 IEEE CONFERENCE ON GAMES (IEEE COG 2020), 2020, : 479 - 486
  • [5] Couple Particles in Action Space for Reinforcement Learning
    Notsu, Akira
    Honda, Katsuhiro
    Ichihashi, Hidetomo
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2010, 10 (12): : 200 - 203
  • [6] Linear Reinforcement Learning with Ball Structure Action Space
    Jia, Zeyu
    Jia, Randy
    Madeka, Dhruv
    Foster, Dean P.
    [J]. INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 201, 2023, 201 : 755 - 775
  • [7] Hierarchical Advantage for Reinforcement Learning in Parameterized Action Space
    Hu, Zhejie
    Kaneko, Tomoyuki
    [J]. 2021 IEEE CONFERENCE ON GAMES (COG), 2021, : 816 - 823
  • [8] A reinforcement learning with switching controllers for a continuous action space
    Nagayoshi, Masato
    Murao, Hajime
    Tamaki, Hisashi
    [J]. ARTIFICIAL LIFE AND ROBOTICS, 2010, 15 (01) : 97 - 100
  • [9] Deep Reinforcement Learning with a Natural Language Action Space
    He, Ji
    Chen, Jianshu
    He, Xiaodong
    Gao, Jianfeng
    Li, Lihong
    Deng, Li
    Ostendorf, Mari
    [J]. PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1621 - 1630
  • [10] Reinforcement learning algorithm with CTRNN in continuous action space
    Arie, Hiroaki
    Namikawa, Jun
    Ogata, Tetsuya
    Tani, Jun
    Sugano, Shigeki
    [J]. NEURAL INFORMATION PROCESSING, PT 1, PROCEEDINGS, 2006, 4232 : 387 - 396