Online Sparse Reinforcement Learning

Cited by: 0
Authors
Hao, Botao [1]
Lattimore, Tor [1]
Szepesvari, Csaba [1,2]
Wang, Mengdi [3]
Affiliations
[1] DeepMind, London, England
[2] Univ Alberta, Edmonton, AB, Canada
[3] Princeton Univ, Princeton, NJ 08544 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
DOI
None available
CLC classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
We investigate the hardness of online reinforcement learning in fixed-horizon, sparse linear Markov decision processes (MDPs), with a special focus on the high-dimensional regime where the ambient dimension is larger than the number of episodes. Our contribution is two-fold. First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data. The lower bound construction uses an MDP with a fixed number of states, while the number of actions scales with the ambient dimension. Note that when the horizon is fixed to one, as in linear stochastic bandits, linear regret can be avoided. Second, we show that if the learner has oracle access to a policy that collects well-conditioned data, then a variant of Lasso fitted Q-iteration enjoys a nearly dimension-free regret of Õ(s^{2/3} N^{2/3}), where N is the number of episodes and s is the sparsity level. This shows that, in the large-action setting, the difficulty of learning can be attributed to the difficulty of finding a good exploratory policy.
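The workhorse of the algorithm summarized above is a sparse (Lasso) linear regression fit on well-conditioned exploratory data. Below is a minimal NumPy sketch of that regression step, using cyclic coordinate descent on synthetic data; the function names and synthetic setup are illustrative assumptions, not the authors' implementation of Lasso fitted Q-iteration.

```python
import numpy as np

def soft_threshold(a, lam):
    """Elementwise soft-thresholding: sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def lasso_cd(X, y, lam, n_passes=200):
    """Lasso via cyclic coordinate descent:
    minimise (1/(2n)) * ||y - X w||^2 + lam * ||w||_1."""
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n  # per-feature curvature
    for _ in range(n_passes):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # residual with feature j's current contribution added back
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j / n
            w[j] = soft_threshold(rho, lam) / col_sq[j]
    return w

# Synthetic sparse regression: d = 50 features, only s = 2 active,
# with well-conditioned (i.i.d. Gaussian) covariates, mirroring the
# "oracle exploratory policy" data-collection assumption.
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[0], w_true[1] = 3.0, -2.0
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_hat = lasso_cd(X, y, lam=0.1)
print("estimated support:", np.flatnonzero(np.abs(w_hat) > 0.5))
```

In the paper's setting, each stage of fitted Q-iteration would solve such a Lasso problem with regression targets built from rewards plus the next-stage value estimate; the sketch only shows the sparse-recovery core under well-conditioned covariates.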
Pages: 316 / +
Page count: 10
Related papers
50 records
  • [1] Learning Sparse Representations in Reinforcement Learning with Sparse Coding
    Le, Lei
    Kumaraswamy, Raksha
    White, Martha
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2067 - 2073
  • [2] Online Sparse Beamforming in C-RAN: A Deep Reinforcement Learning Approach
    Zhong, Chong-Hao
    Guo, Kun
    Zhao, Mingxiong
    2021 IEEE WIRELESS COMMUNICATIONS AND NETWORKING CONFERENCE (WCNC), 2021,
  • [3] Stabilized Sparse Online Learning for Sparse Data
    Ma, Yuting
    Zheng, Tian
    JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18
  • [4] Online learning with sparse labels
    He, Wenwu
    Zou, Fumin
    Liang, Quan
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (23):
  • [5] Intermittent Reinforcement Learning with Sparse Rewards
    Sahoo, Prachi Pratyusha
    Vamvoudakis, Kyriakos G.
    2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 2709 - 2714
  • [6] Sparse online maximum entropy inverse reinforcement learning via proximal optimization and truncated gradient
    Song L.
    Li D.
    Xu X.
    Knowledge-Based Systems, 2022, 252
  • [7] Online testing with reinforcement learning
    Veanes, Margus
    Roy, Pritam
    Campbell, Colin
    FORMAL APPROACHES TO SOFTWARE TESTING AND RUNTIME VERIFICATION, 2006, 4262 : 240 - +
  • [8] Online shielding for reinforcement learning
    Koenighofer, Bettina
    Rudolf, Julian
    Palmisano, Alexander
    Tappler, Martin
    Bloem, Roderick
    INNOVATIONS IN SYSTEMS AND SOFTWARE ENGINEERING, 2023, 19 (04) : 379 - 394
  • [9] Learning to Drive Using Sparse Imitation Reinforcement Learning
    Han, Yuci
    Yilmaz, Alper
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 3736 - 3742