Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

Times cited: 0

Authors:

Domingues, Omar Darwiche [1]

Menard, Pierre [2]

Kaufmann, Emilie [1, 3, 4]

Valko, Michal [1, 5]

Affiliations:

[1] Inria Lille, Lille, France

[2] Otto von Guericke Univ, Magdeburg, Germany

[3] CNRS, Paris, France

[4] ULille, CRIStAL, Lille, France

[5] DeepMind Paris, Paris, France

Source:

ALGORITHMIC LEARNING THEORY, VOL 132 | 2021

Keywords:

reinforcement learning; episodic; lower bounds

DOI:

Not available

Chinese Library Classification:

TP18 [Theory of artificial intelligence]

Discipline codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a lower bound of $\Omega\left((H^3SA/\varepsilon^2)\log(1/\delta)\right)$ on the sample complexity of an $(\varepsilon, \delta)$-PAC algorithm for best policy identification in a non-stationary MDP, relying on a construction of "hard MDPs" which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^3SAT})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
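Both bounds in the abstract are stated only up to universal constants, so they describe how the required effort scales with each parameter rather than its absolute size. The minimal Python sketch below illustrates that scaling under the assumption that all hidden constants equal 1; the helper names `sample_complexity_order` and `regret_order` are hypothetical and do not come from the paper, and `T` is simply the quantity that appears in the abstract's regret bound.

```python
import math


def sample_complexity_order(H, S, A, eps, delta):
    """Order of the best-policy-identification lower bound,
    H^3 * S * A / eps^2 * log(1/delta).

    Hypothetical helper: the universal constant hidden by Omega(.) is
    assumed to be 1, so only ratios between two evaluations are meaningful.
    """
    return H**3 * S * A / eps**2 * math.log(1 / delta)


def regret_order(H, S, A, T):
    """Order of the regret lower bound, sqrt(H^3 * S * A * T), constant assumed 1."""
    return math.sqrt(H**3 * S * A * T)


base = sample_complexity_order(H=10, S=5, A=3, eps=0.1, delta=0.05)

# Doubling the horizon H multiplies the sample-complexity order by 2^3.
doubled_h = sample_complexity_order(H=20, S=5, A=3, eps=0.1, delta=0.05)
print(round(doubled_h / base, 6))

# Halving the accuracy eps multiplies it by 2^2.
halved_eps = sample_complexity_order(H=10, S=5, A=3, eps=0.05, delta=0.05)
print(round(halved_eps / base, 6))

# Quadrupling T only doubles the regret order (square-root dependence).
print(round(regret_order(10, 5, 3, 4000) / regret_order(10, 5, 3, 1000), 6))
```

Because the constants are dropped, only ratios such as these carry information; absolute values of either function should not be read as actual sample counts or regret.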

Pages: 21
