Reinforcement Learning in Latent Action Sequence Space

Cited by: 3
Authors
Kim, Heecheol [1]
Yamada, Masanori [2 ]
Miyoshi, Kosuke [3 ]
Iwata, Tomoharu [4 ]
Yamakawa, Hiroshi [5 ]
Affiliations
[1] Univ Tokyo, Grad Sch Informat Sci & Technol, Lab Intelligent Syst & Informat, Tokyo, Japan
[2] Nippon Telegraph & Tel Corp, Secure Platform Labs, Tokyo, Japan
[3] Narrat Nights Inc, Yokohama, Kanagawa, Japan
[4] Nippon Telegraph & Tel Corp, Commun Sci Labs, Tokyo, Japan
[5] Dwango Artificial Intelligence Lab, Tokyo, Japan
Keywords
Reinforcement Learning; Transfer Learning; Learning from Demonstration;
DOI
10.1109/IROS45743.2020.9341629
Chinese Library Classification
TP [automation and computer technology];
Discipline Code
0812 ;
Abstract
One problem in real-world applications of reinforcement learning is the high dimensionality of the action search space, which arises from the combination of actions over time. To reduce the dimensionality of action sequence search spaces, macro actions, i.e., sequences of primitive actions for solving tasks, have been studied. However, previous studies relied on humans to define macro actions or assumed macro actions to be repetitions of the same primitive action. We propose encoded action sequence reinforcement learning (EASRL), a reinforcement learning method that learns flexible sequences of actions in a latent space for a high-dimensional action sequence search space. With EASRL, encoder and decoder networks are trained on demonstration data using variational autoencoders to map macro actions into the latent space. We then learn a policy network in the latent space, which represents a distribution over encoded macro actions given a state. By learning in the latent space, we can reduce the dimensionality of the action sequence search space and handle various patterns of action sequences. We experimentally demonstrate that the proposed method outperforms other reinforcement learning methods on tasks that require an extensive amount of search.
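The execution loop the abstract describes can be sketched in a few lines: a policy maps the state to a latent vector, a decoder expands that vector into a macro action (a sequence of primitive actions), and the sequence is executed step by step. This is a minimal illustrative sketch, not the paper's architecture: the toy dimensions, the fixed random linear maps standing in for the trained VAE decoder and the learned policy network, and the linear environment dynamics are all assumptions for demonstration.

```python
import numpy as np

# Toy dimensions (assumptions, not from the paper).
STATE_DIM, LATENT_DIM, ACTION_DIM, MACRO_LEN = 3, 2, 3, 5

rng = np.random.default_rng(0)

# Stand-in for the trained VAE decoder: latent vector -> flattened macro action.
W_dec = rng.normal(size=(LATENT_DIM, MACRO_LEN * ACTION_DIM))

def decode_macro(z):
    """Decode a latent vector into a (MACRO_LEN, ACTION_DIM) action sequence."""
    flat = np.tanh(z @ W_dec)          # tanh keeps primitive actions bounded
    return flat.reshape(MACRO_LEN, ACTION_DIM)

# Stand-in for the latent-space policy network: state -> latent action.
W_pol = rng.normal(size=(STATE_DIM, LATENT_DIM))

def policy(state):
    return np.tanh(state @ W_pol)

def rollout(state, n_macro_steps=4):
    """Pick one latent per macro step; execute the decoded primitive actions."""
    state = np.asarray(state, dtype=float)
    for _ in range(n_macro_steps):
        z = policy(state)              # one decision per MACRO_LEN env steps
        for a in decode_macro(z):
            state = 0.9 * state + 0.1 * a  # trivial linear dynamics
    return state

final_state = rollout(np.ones(STATE_DIM))
```

The point of the structure is the search-space reduction the abstract claims: over `n_macro_steps * MACRO_LEN` environment steps, the policy makes only `n_macro_steps` decisions, each in a `LATENT_DIM`-dimensional space rather than the full `MACRO_LEN * ACTION_DIM`-dimensional sequence space.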
Pages: 5497-5503
Page count: 7
Related Papers
50 records in total
  • [1] LASER: Learning a Latent Action Space for Efficient Reinforcement Learning
    Allshire, Arthur
    Martin-Martin, Roberto
    Lin, Charles
    Manuel, Shawn
    Savarese, Silvio
    Garg, Animesh
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 6650 - 6656
  • [2] Latent Space Policies for Hierarchical Reinforcement Learning
    Haarnoja, Tuomas
    Hartikainen, Kristian
    Abbeel, Pieter
    Levine, Sergey
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 80, 2018, 80
  • [3] Switching reinforcement learning for continuous action space
    Nagayoshi, Masato
    Murao, Hajime
    Tamaki, Hisashi
    [J]. ELECTRONICS AND COMMUNICATIONS IN JAPAN, 2012, 95 (03) : 37 - 44
  • [4] Action Space Shaping in Deep Reinforcement Learning
    Kanervisto, Anssi
    Scheller, Christian
    Hautamaki, Ville
    [J]. 2020 IEEE CONFERENCE ON GAMES (IEEE COG 2020), 2020, : 479 - 486
  • [5] Couple Particles in Action Space for Reinforcement Learning
    Notsu, Akira
    Honda, Katsuhiro
    Ichihashi, Hidetomo
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2010, 10 (12): : 200 - 203
  • [6] Linear Reinforcement Learning with Ball Structure Action Space
    Jia, Zeyu
    Jia, Randy
    Madeka, Dhruv
    Foster, Dean P.
    [J]. INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 201, 2023, 201 : 755 - 775
  • [7] Hierarchical Advantage for Reinforcement Learning in Parameterized Action Space
    Hu, Zhejie
    Kaneko, Tomoyuki
    [J]. 2021 IEEE CONFERENCE ON GAMES (COG), 2021, : 816 - 823
  • [8] A reinforcement learning with switching controllers for a continuous action space
    Nagayoshi, Masato
    Murao, Hajime
    Tamaki, Hisashi
    [J]. ARTIFICIAL LIFE AND ROBOTICS, 2010, 15 (01) : 97 - 100
  • [9] Deep Reinforcement Learning with a Natural Language Action Space
    He, Ji
    Chen, Jianshu
    He, Xiaodong
    Gao, Jianfeng
    Li, Lihong
    Deng, Li
    Ostendorf, Mari
    [J]. PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1621 - 1630
  • [10] Reinforcement learning algorithm with CTRNN in continuous action space
    Arie, Hiroaki
    Namikawa, Jun
    Ogata, Tetsuya
    Tani, Jun
    Sugano, Shigeki
    [J]. NEURAL INFORMATION PROCESSING, PT 1, PROCEEDINGS, 2006, 4232 : 387 - 396