Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

Cited by: 0
Authors
Domingues, Omar Darwiche [1]
Menard, Pierre [2]
Kaufmann, Emilie [1, 3, 4]
Valko, Michal [1, 5]
Affiliations
[1] Inria Lille, Lille, France
[2] Otto von Guericke Univ, Magdeburg, Germany
[3] CNRS, Paris, France
[4] ULille, CRIStAL, Lille, France
[5] DeepMind Paris, Paris, France
Source
Keywords
reinforcement learning; episodic; lower bounds;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a lower bound of $\Omega\big((H^3 SA/\varepsilon^2)\log(1/\delta)\big)$ on the sample complexity of an $(\varepsilon, \delta)$-PAC algorithm for best policy identification in a non-stationary MDP, relying on a construction of "hard MDPs" which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^3 SAT})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
Pages: 21