Online Sparse Reinforcement Learning

Cited by: 0
Authors
Hao, Botao [1 ]
Lattimore, Tor [1 ]
Szepesvari, Csaba [1 ,2 ]
Wang, Mengdi [3 ]
Affiliations
[1] DeepMind, London, England
[2] Univ Alberta, Edmonton, AB, Canada
[3] Princeton Univ, Princeton, NJ 08544 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
We investigate the hardness of online reinforcement learning in fixed-horizon, sparse linear Markov decision processes (MDPs), with a special focus on the high-dimensional regime where the ambient dimension is larger than the number of episodes. Our contribution is two-fold. First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data. The lower-bound construction uses an MDP with a fixed number of states, while the number of actions scales with the ambient dimension. Note that when the horizon is fixed to one, i.e., in the case of linear stochastic bandits, linear regret can be avoided. Second, we show that if the learner has oracle access to a policy that collects well-conditioned data, then a variant of Lasso fitted Q-iteration enjoys a nearly dimension-free regret of $\tilde{O}(s^{2/3}N^{2/3})$, where $N$ is the number of episodes and $s$ is the sparsity level. This shows that in the large-action setting, the difficulty of learning can be attributed to the difficulty of finding a good exploratory policy.
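For intuition, the following is a minimal Python sketch of the Lasso fitted Q-iteration idea the abstract refers to, under the assumption that the learner has already run the exploratory policy to collect a batch of well-conditioned transitions per stage. The names `data`, `phi`, `actions`, and the penalty `alpha` are illustrative assumptions, not the paper's notation, and this is a sketch of the general technique rather than the authors' exact algorithm.

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_fqi(data, phi, actions, horizon, alpha=0.1):
    """Fit stage-wise Q-functions backward in time with l1-penalized regression.

    data[h]  -- list of (s, a, r, s_next) transitions observed at stage h
    phi      -- feature map: phi(s, a) -> numpy array of length d
    actions  -- finite action set to maximize over in the Bellman backup
    alpha    -- Lasso penalty; in theory tuned to the noise and sparsity level
    """
    s0, a0, _, _ = data[0][0]
    d = phi(s0, a0).shape[0]
    w = np.zeros((horizon + 1, d))  # w[horizon] = 0 encodes Q = 0 beyond the horizon
    for h in reversed(range(horizon)):
        X = np.array([phi(s, a) for s, a, _, _ in data[h]])
        # Bellman targets: immediate reward plus greedy next-stage value
        y = np.array([r + max(phi(sn, b) @ w[h + 1] for b in actions)
                      for _, _, r, sn in data[h]])
        reg = Lasso(alpha=alpha, fit_intercept=False)  # l1 penalty selects the s relevant features
        reg.fit(X, y)
        w[h] = reg.coef_
    return w  # act greedily at stage h via argmax over a of phi(s, a) @ w[h]
```

The $N^{2/3}$ dependence in the regret is typical of explore-then-commit schemes: roughly $N^{2/3}$ episodes are spent running the exploratory policy to collect the per-stage data above, and the remaining episodes exploit the fitted Q-functions; the sketch leaves that exploration schedule to the caller.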
Pages: 316 / +
Page count: 10