Linear Reinforcement Learning with Ball Structure Action Space

Cited by: 0
|
Authors
Jia, Zeyu [1 ,2 ]
Jia, Randy [2 ]
Madeka, Dhruv [2 ]
Foster, Dean P. [2 ]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Amazon, Seattle, WA USA
Keywords
Markov Decision Process; Reinforcement Learning;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
We study the problem of Reinforcement Learning (RL) with linear function approximation, i.e., assuming the optimal action-value function is linear in a known d-dimensional feature mapping. Unfortunately, under this assumption alone, the worst-case sample complexity has been shown to be exponential, even under a generative model. Instead of making further assumptions on the MDP or value functions, we assume that our action space is such that there always exist playable actions to explore any direction of the feature space. We formalize this assumption as a "ball structure" action space, and show that being able to freely explore the feature space allows for efficient RL. In particular, we propose a sample-efficient RL algorithm (BallRL) that learns an ε-optimal policy using only Õ(H^5 d^3 / ε^3) trajectories.
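The intuition behind the "ball structure" assumption can be illustrated with a toy sketch: if the playable actions cover the unit ball in feature space, any coordinate direction can be played directly, so a linear reward parameter is easy to estimate. This is not the paper's BallRL algorithm; the setup below (a noisy one-step linear reward, coordinate-direction probing) is a simplifying assumption made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5
theta_true = rng.normal(size=d)  # unknown linear reward parameter

def reward(action):
    """Noisy linear reward; `action` is any vector in the unit ball."""
    return action @ theta_true + 0.01 * rng.normal()

# Ball structure: every unit vector e_i is itself a playable action,
# so theta can be estimated component-wise by direct probing.
n_samples = 200
theta_hat = np.zeros(d)
for i in range(d):
    e_i = np.zeros(d)
    e_i[i] = 1.0  # probe the i-th feature direction
    theta_hat[i] = np.mean([reward(e_i) for _ in range(n_samples)])

# Greedy action under the estimate: the ball's best response is the
# unit vector aligned with theta_hat.
best_action = theta_hat / np.linalg.norm(theta_hat)
print(np.linalg.norm(theta_hat - theta_true))  # small estimation error
```

In the full RL problem the same freedom to explore every feature direction is what lets BallRL avoid the exponential worst case, at the cost of the H and d factors in the trajectory bound above.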
Pages: 755-775
Page count: 21
Related Papers
50 records in total
  • [1] Switching reinforcement learning for continuous action space
    Nagayoshi, Masato
    Murao, Hajime
    Tamaki, Hisashi
    [J]. ELECTRONICS AND COMMUNICATIONS IN JAPAN, 2012, 95 (03) : 37 - 44
  • [2] Action Space Shaping in Deep Reinforcement Learning
    Kanervisto, Anssi
    Scheller, Christian
    Hautamaki, Ville
    [J]. 2020 IEEE CONFERENCE ON GAMES (IEEE COG 2020), 2020, : 479 - 486
  • [3] Reinforcement Learning in Latent Action Sequence Space
    Kim, Heecheol
    Yamada, Masanori
    Miyoshi, Kosuke
    Iwata, Tomoharu
    Yamakawa, Hiroshi
    [J]. 2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020, : 5497 - 5503
  • [4] Couple Particles in Action Space for Reinforcement Learning
    Notsu, Akira
    Honda, Katsuhiro
    Ichihashi, Hidetomo
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2010, 10 (12): : 200 - 203
  • [5] LASER: Learning a Latent Action Space for Efficient Reinforcement Learning
    Allshire, Arthur
    Martin-Martin, Roberto
    Lin, Charles
    Manuel, Shawn
    Savarese, Silvio
    Garg, Animesh
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 6650 - 6656
  • [6] Hierarchical Advantage for Reinforcement Learning in Parameterized Action Space
    Hu, Zhejie
    Kaneko, Tomoyuki
    [J]. 2021 IEEE CONFERENCE ON GAMES (COG), 2021, : 816 - 823
  • [7] Deep Reinforcement Learning in Linear Discrete Action Spaces
    van Heeswijk, Wouter
    La Poutre, Han
    [J]. 2020 WINTER SIMULATION CONFERENCE (WSC), 2020, : 1063 - 1074
  • [8] A reinforcement learning with switching controllers for a continuous action space
    Nagayoshi, Masato
    Murao, Hajime
    Tamaki, Hisashi
    [J]. ARTIFICIAL LIFE AND ROBOTICS, 2010, 15 (01) : 97 - 100
  • [9] Reinforcement learning algorithm with CTRNN in continuous action space
    Arie, Hiroaki
    Namikawa, Jun
    Ogata, Tetsuya
    Tani, Jun
    Sugano, Shigeki
    [J]. NEURAL INFORMATION PROCESSING, PT 1, PROCEEDINGS, 2006, 4232 : 387 - 396
  • [10] Deep Reinforcement Learning with a Natural Language Action Space
    He, Ji
    Chen, Jianshu
    He, Xiaodong
    Gao, Jianfeng
    Li, Lihong
    Deng, Li
    Ostendorf, Mari
    [J]. PROCEEDINGS OF THE 54TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1, 2016, : 1621 - 1630