Online Sparse Reinforcement Learning

Cited: 0
Authors
Hao, Botao [1 ]
Lattimore, Tor [1 ]
Szepesvari, Csaba [1 ,2 ]
Wang, Mengdi [3 ]
Affiliations
[1] Deepmind, London, England
[2] Univ Alberta, Edmonton, AB, Canada
[3] Princeton Univ, Princeton, NJ 08544 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We investigate the hardness of online reinforcement learning in fixed-horizon, sparse linear Markov decision processes (MDPs), with a special focus on the high-dimensional regime where the ambient dimension is larger than the number of episodes. Our contribution is two-fold. First, we provide a lower bound showing that linear regret is generally unavoidable in this case, even if there exists a policy that collects well-conditioned data. The lower bound construction uses an MDP with a fixed number of states while the number of actions scales with the ambient dimension. Note that when the horizon is fixed to one (the case of linear stochastic bandits), linear regret can be avoided. Second, we show that if the learner has oracle access to a policy that collects well-conditioned data, then a variant of Lasso fitted Q-iteration enjoys a nearly dimension-free regret of Õ(s^{2/3} N^{2/3}), where N is the number of episodes and s is the sparsity level. This shows that in the large-action setting, the difficulty of learning can be attributed to the difficulty of finding a good exploratory policy.
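The core routine described in the abstract can be illustrated with a minimal sketch of Lasso fitted Q-iteration: regress Bellman backup targets on state-action features with an L1 penalty, working backward through the horizon. The feature map, problem sizes, regularization strength, and data-generating process below are illustrative assumptions, not the paper's actual construction or parameter choices.

```python
# Minimal sketch of Lasso fitted Q-iteration on a toy sparse linear problem.
# All problem details (feature map, alpha, dimensions) are illustrative
# assumptions; the paper's variant may differ in thresholding and tuning.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Toy setting: d-dimensional features, only the first s coordinates matter.
d, s, H, n, n_actions = 20, 3, 2, 500, 4

def features(state, action):
    """Hypothetical feature map phi(s, a) in R^d (illustrative only)."""
    phi = np.zeros(d)
    phi[:s] = state[:s] * (action + 1)        # signal-carrying coordinates
    phi[s:] = rng.normal(size=d - s) * 0.01   # near-noise tail coordinates
    return phi

# Offline transitions, assumed gathered by an exploratory policy.
states = rng.normal(size=(n, d))
actions = rng.integers(0, n_actions, size=n)
X = np.array([features(states[i], actions[i]) for i in range(n)])
true_w = np.zeros(d)
true_w[:s] = 1.0
rewards = X @ true_w + 0.1 * rng.normal(size=n)
next_states = rng.normal(size=(n, d))

# Backward pass over the horizon: fit Q_h by Lasso on Bellman targets.
weights = [None] * (H + 1)
q_next = np.zeros(n)  # Q_{H+1} is identically zero
for h in range(H, 0, -1):
    targets = rewards + q_next
    model = Lasso(alpha=0.05).fit(X, targets)
    weights[h] = model.coef_
    # Greedy backup: max over actions of the fitted linear Q at next states.
    q_next = np.array([
        max(features(next_states[i], a) @ weights[h] for a in range(n_actions))
        for i in range(n)
    ])

# The L1 penalty should concentrate the fitted weights on few coordinates.
support = np.flatnonzero(np.abs(weights[1]) > 1e-6)
print("support size:", len(support), "of", d)
```

The point of the sketch is the sparsity-exploiting regression step: because the Lasso fit needs far fewer samples than the ambient dimension d when the true weights are s-sparse, the regret can depend on s rather than d, provided the data is well-conditioned.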
Pages: 316 / +
Number of pages: 10
Related Papers
50 results in total
  • [31] Online Learning and Exploiting Relational Models in Reinforcement Learning
    Croonenborghs, Tom
    Ramon, Jan
    Blockeel, Hendrik
    Bruynooghe, Maurice
    20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 726 - 731
  • [32] Optimization of Learning Cycles in Online Reinforcement Learning Systems
    Notsu, Akira
    Yasuda, Koji
    Ubukata, Seiki
    Honda, Katsuhiro
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 3530 - 3534
  • [34] COMPLETE DICTIONARY ONLINE LEARNING FOR SPARSE UNMIXING
    Feng, Ruyi
    Zhong, Yanfei
    Zhang, Liangpei
    2016 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2016, : 6581 - 6584
  • [35] On using reinforcement learning to solve sparse linear systems
    Kuefler, Erik
    Chen, Tzu-Yi
    COMPUTATIONAL SCIENCE - ICCS 2008, PT 1, 2008, 5101 : 955 - 964
  • [36] Sparse Online Learning via Truncated Gradient
    Langford, John
    Li, Lihong
    Zhang, Tong
    JOURNAL OF MACHINE LEARNING RESEARCH, 2009, 10 : 777 - 801
  • [37] Online learning for matrix factorization and sparse coding
    Mairal, Julien
    Bach, Francis
    Ponce, Jean
    Sapiro, Guillermo
    Journal of Machine Learning Research, 2010, 11 : 19 - 60
  • [38] An Online Learning Algorithm with Dual Sparse Mechanisms
    Wei B.
    Wu R.-F.
    Zhang W.-S.
    Lü J.-Q.
    Wang Y.-Y.
    Xia X.-W.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2019, 47 (10): : 2202 - 2210
  • [40] Efficient Sparse Attacks on Videos using Reinforcement Learning
    Yan, Huanqian
    Wei, Xingxing
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 2326 - 2334