Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

Authors
Domingues, Omar Darwiche [1]
Menard, Pierre [2]
Kaufmann, Emilie [1, 3, 4]
Valko, Michal [1, 5]
Affiliations
[1] Inria Lille, Lille, France
[2] Otto von Guericke Univ, Magdeburg, Germany
[3] CNRS, Paris, France
[4] ULille, CRIStAL, Lille, France
[5] DeepMind Paris, Paris, France
Keywords
reinforcement learning; episodic; lower bounds;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a lower bound of $\Omega((H^3 SA/\varepsilon^2) \log(1/\delta))$ on the sample complexity of an $(\varepsilon, \delta)$-PAC algorithm for best policy identification in a non-stationary MDP, relying on a construction of "hard MDPs" which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^3 SAT})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
Pages: 21
Related papers
50 items in total
  • [31] TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs
    Kozlova, Olga
    Sigaud, Olivier
    Meyer, Christophe
    FROM ANIMALS TO ANIMATS 11, 2010, 6226 : 489 - +
  • [32] Safety-Constrained Reinforcement Learning for MDPs
    Junges, Sebastian
    Jansen, Nils
    Dehnert, Christian
    Topcu, Ufuk
    Katoen, Joost-Pieter
    TOOLS AND ALGORITHMS FOR THE CONSTRUCTION AND ANALYSIS OF SYSTEMS (TACAS 2016), 2016, 9636 : 130 - 146
  • [33] Lower Bounds for Policy Iteration on Multi-action MDPs
    Ashutosh, Kumar
    Consul, Sarthak
    Dedhia, Bhishma
    Khirwadkar, Parthasarathi
    Shah, Sahil
    Kalyanakrishnan, Shivaram
    2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020, : 1744 - 1749
  • [34] Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
    Sutton, RS
    Precup, D
    Singh, S
    ARTIFICIAL INTELLIGENCE, 1999, 112 (1-2) : 181 - 211
  • [35] Minimax Lower Bounds for Linear Independence Testing
    Ramdas, Aaditya
    Isenberg, David
    Singh, Aarti
    Wasserman, Larry
    2016 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY, 2016, : 965 - 969
  • [36] Lower bounds on the minimax risk of sequential estimators
    Mizera, B
    STATISTICS, 1996, 28 (02) : 123 - 129
  • [37] Minimax Lower Bounds for H∞-Norm Estimation
    Tu, Stephen
    Boczar, Ross
    Recht, Benjamin
    2019 AMERICAN CONTROL CONFERENCE (ACC), 2019, : 3538 - 3543
  • [38] Minimax Lower Bounds for Circular Source Localization
    Xu, Aolin
    Coleman, Todd
    2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2020, : 1242 - 1247
  • [39] Minimax lower bounds for function estimation on graphs
    Kirichenko, Alisa
    van Zanten, Harry
    ELECTRONIC JOURNAL OF STATISTICS, 2018, 12 (01) : 651 - 666
  • [40] MINIMAX LOWER BOUNDS FOR NONNEGATIVE MATRIX FACTORIZATION
    Alsan, Mine
    Liu, Zhaoqiang
    Tan, Vincent Y. F.
    2018 IEEE STATISTICAL SIGNAL PROCESSING WORKSHOP (SSP), 2018, : 363 - 367