Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

Authors
Domingues, Omar Darwiche [1]
Menard, Pierre [2]
Kaufmann, Emilie [1, 3, 4]
Valko, Michal [1, 5]
Affiliations
[1] Inria Lille, Lille, France
[2] Otto von Guericke University, Magdeburg, Germany
[3] CNRS, Paris, France
[4] ULille, CRIStAL, Lille, France
[5] DeepMind Paris, Paris, France
Source
Keywords
reinforcement learning; episodic; lower bounds
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case, in which the transition kernel is allowed to change at each stage of the episode. Our main contribution is a lower bound of $\Omega\big((H^3SA/\varepsilon^2)\log(1/\delta)\big)$ on the sample complexity of an $(\varepsilon,\delta)$-PAC algorithm for best policy identification in a non-stationary MDP, relying on a construction of "hard MDPs" that differs from those previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^3SAT})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
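Both bounds in the abstract are stated only up to an unspecified universal constant, so their absolute values are not meaningful, but their scaling is. As a hedged illustration (not the paper's hard-MDP construction or proof), the sketch below evaluates the two expressions with the hidden constant set to 1 and compares ratios: doubling the horizon $H$ multiplies the sample-complexity bound by $2^3 = 8$ and the regret bound by $2^{3/2} \approx 2.83$, while halving $\varepsilon$ multiplies the sample-complexity bound by 4. The function names and example values are hypothetical, and since the abstract does not define $T$, it is treated here as an opaque input.

```python
import math


def sample_complexity_lb(H: int, S: int, A: int, eps: float, delta: float) -> float:
    """Scaling of the Omega((H^3 S A / eps^2) log(1/delta)) sample-complexity bound.

    Assumption: the universal constant hidden in Omega(.) is set to 1, so only
    ratios between returned values are meaningful.
    """
    if not (eps > 0.0 and 0.0 < delta < 1.0):
        raise ValueError("need eps > 0 and 0 < delta < 1")
    return H ** 3 * S * A / eps ** 2 * math.log(1.0 / delta)


def regret_lb(H: int, S: int, A: int, T: int) -> float:
    """Scaling of the Omega(sqrt(H^3 S A T)) regret bound (constant set to 1)."""
    return math.sqrt(H ** 3 * S * A * T)


if __name__ == "__main__":
    # Doubling H: PAC bound grows by 2^3, regret bound by 2^(3/2).
    r_pac = sample_complexity_lb(20, 10, 4, 0.1, 0.05) / sample_complexity_lb(10, 10, 4, 0.1, 0.05)
    r_reg = regret_lb(20, 10, 4, 10_000) / regret_lb(10, 10, 4, 10_000)
    # Halving eps: PAC bound grows by 1 / (1/2)^2 = 4.
    r_eps = sample_complexity_lb(10, 10, 4, 0.05, 0.05) / sample_complexity_lb(10, 10, 4, 0.1, 0.05)
    print(round(r_pac, 6))  # 8.0
    print(round(r_reg, 4))  # 2.8284
    print(round(r_eps, 6))  # 4.0
```

The cubic dependence on $H$ is what distinguishes these non-stationary bounds, so comparing ratios rather than raw values is the safe way to read them without knowing the constants.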
Pages: 21