Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

Cited by: 0
Authors
Domingues, Omar Darwiche [1]
Menard, Pierre [2]
Kaufmann, Emilie [1, 3, 4]
Valko, Michal [1, 5]
Affiliations
[1] Inria Lille, Lille, France
[2] Otto von Guericke Univ, Magdeburg, Germany
[3] CNRS, Paris, France
[4] ULille, CRIStAL, Lille, France
[5] DeepMind Paris, Paris, France
Source
Keywords
reinforcement learning; episodic; lower bounds
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a lower bound of $\Omega\left((H^3 SA/\varepsilon^2)\log(1/\delta)\right)$ on the sample complexity of an $(\varepsilon, \delta)$-PAC algorithm for best policy identification in a non-stationary MDP, relying on a construction of "hard MDPs" which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^3 SAT})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
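To make the parameter dependence of the two bounds concrete, here is a minimal Python sketch. It assumes the usual notation of this literature (S states, A actions, horizon H, and T taken here to be the total number of interaction steps in the regret bound), and it drops the unspecified positive constant hidden by Omega as well as any conditions on the parameters under which the bounds hold. Its outputs are therefore order-of-magnitude terms, not actual sample counts or regret values. The helper names pac_sample_complexity_rate and regret_rate are hypothetical and do not come from the paper.

```python
import math


def pac_sample_complexity_rate(H: int, S: int, A: int, eps: float, delta: float) -> float:
    """Order-of-magnitude term H^3 * S * A / eps^2 * log(1/delta).

    Hypothetical helper: the Omega(...) bound only holds up to an unspecified
    positive constant, so this value is a scaling guide, not a sample count.
    """
    if not (eps > 0 and 0 < delta < 1):
        raise ValueError("need eps > 0 and 0 < delta < 1")
    return H**3 * S * A / eps**2 * math.log(1 / delta)


def regret_rate(H: int, S: int, A: int, T: int) -> float:
    """Order-of-magnitude term sqrt(H^3 * S * A * T), constants omitted (hypothetical helper)."""
    return math.sqrt(H**3 * S * A * T)


if __name__ == "__main__":
    base = pac_sample_complexity_rate(H=10, S=5, A=3, eps=0.1, delta=0.05)
    # Doubling the horizon H multiplies the PAC term by 2^3 = 8.
    print(pac_sample_complexity_rate(20, 5, 3, 0.1, 0.05) / base)
    # Halving eps multiplies the PAC term by 2^2 = 4.
    print(pac_sample_complexity_rate(10, 5, 3, 0.05, 0.05) / base)
    # Doubling H multiplies the regret term by 2^(3/2), roughly 2.83.
    print(regret_rate(20, 5, 3, 10_000) / regret_rate(10, 5, 3, 10_000))
```

The ratios, rather than the absolute values, are the meaningful output: they isolate the cubic dependence on H in the sample-complexity bound and the H^(3/2) dependence in the regret bound, which is exactly what the unknown constants cannot change.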
Pages: 21
Related papers
50 in total
  • [41] Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
    Jin, Tiancheng
    Luo, Haipeng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [42] Near-optimal Reinforcement Learning in Factored MDPs
    Osband, Ian
    Van Roy, Benjamin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [43] Reinforcement learning for MDPs using temporal difference schemes
    Thomas, A
    Marcus, SI
    PROCEEDINGS OF THE 36TH IEEE CONFERENCE ON DECISION AND CONTROL, VOLS 1-5, 1997 : 577 - 583
  • [44] Exploiting Additive Structure in Factored MDPs for Reinforcement Learning
    Degris, Thomas
    Sigaud, Olivier
    Wuillemin, Pierre-Henri
    RECENT ADVANCES IN REINFORCEMENT LEARNING, 2008, 5323 : 15 - 26
  • [45] Minimax Lower Bounds via f-divergences
    Guntuboyina, Adityanand
    2010 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, 2010 : 1340 - 1344
  • [46] Lower Bounds on the Minimax Risk for the Source Localization Problem
    Venkatesh, Praveen
    Grover, Pulkit
    2017 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2017
  • [47] Lower bounds for the asymptotic minimax risk with spherical data
    Klemelä, J
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2003, 113 (01) : 113 - 136
  • [48] Learning with Safety Constraints: Sample Complexity of Reinforcement Learning for Constrained MDPs
    HasanzadeZonuzy, Aria
    Bura, Archana
    Kalathil, Dileep
    Shakkottai, Srinivas
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 7667 - 7674
  • [49] Memory Lower Bounds of Reductions Revisited
    Wang, Yuyu
    Matsuda, Takahiro
    Hanaoka, Goichiro
    Tanaka, Keisuke
    ADVANCES IN CRYPTOLOGY - EUROCRYPT 2018, PT I, 2018, 10820 : 61 - 90
  • [50] Minimax Lower Bounds for Transfer Learning with Linear and One-hidden Layer Neural Networks
    Kalan, Seyed Mohammadreza Mousavi
    Fabian, Zalan
    Avestimehr, Salman
    Soltanolkotabi, Mahdi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33