Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

Cited by: 0

Authors:

Domingues, Omar Darwiche [1]

Menard, Pierre [2]

Kaufmann, Emilie [1, 3, 4]

Valko, Michal [1, 5]

Affiliations:

[1] Inria Lille, Lille, France

[2] Otto von Guericke Univ, Magdeburg, Germany

[3] CNRS, Paris, France

[4] ULille, CRIStAL, Lille, France

[5] DeepMind Paris, Paris, France

Source:

ALGORITHMIC LEARNING THEORY, VOL 132 | 2021 / Vol. 132

Keywords:

reinforcement learning; episodic; lower bounds

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper, we propose new problem-independent lower bounds on the sample complexity and regret in episodic MDPs, with a particular focus on the non-stationary case in which the transition kernel is allowed to change in each stage of the episode. Our main contribution is a lower bound of $\Omega\left(\frac{H^3 S A}{\varepsilon^2}\log\frac{1}{\delta}\right)$ on the sample complexity of an $(\varepsilon, \delta)$-PAC algorithm for best policy identification in a non-stationary MDP, relying on a construction of "hard MDPs" which is different from the ones previously used in the literature. Using this same class of MDPs, we also provide a rigorous proof of the $\Omega(\sqrt{H^3 S A T})$ regret bound for non-stationary MDPs. Finally, we discuss connections to PAC-MDP lower bounds.
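To make the scaling of the two bounds concrete, the following is a minimal sketch that evaluates only the expressions inside the $\Omega(\cdot)$. Because $\Omega$ hides an unspecified constant, the sketch assumes that constant is 1, so only ratios between settings are meaningful, not absolute values. It also assumes the usual reading of $H$, $S$, $A$ as horizon, number of states, and number of actions, and takes $T$ exactly as it appears in the regret bound (this record does not define it). The helper names `sample_complexity_scale` and `regret_scale` are hypothetical and not taken from the paper.

```python
import math


def sample_complexity_scale(H, S, A, eps, delta):
    """Expression inside Omega(.) for the (eps, delta)-PAC sample-complexity bound:
    (H^3 * S * A / eps^2) * log(1/delta). The hidden constant is assumed to be 1."""
    return (H ** 3) * S * A / eps ** 2 * math.log(1.0 / delta)


def regret_scale(H, S, A, T):
    """Expression inside Omega(.) for the regret bound: sqrt(H^3 * S * A * T).
    The hidden constant is assumed to be 1."""
    return math.sqrt(H ** 3 * S * A * T)


if __name__ == "__main__":
    # Arbitrary illustrative values; only the ratios below carry meaning.
    S, A, eps, delta, T = 10, 4, 0.1, 0.05, 1000

    # Doubling H multiplies the sample-complexity expression by 2^3 = 8.
    pac_ratio = sample_complexity_scale(20, S, A, eps, delta) / sample_complexity_scale(10, S, A, eps, delta)

    # Halving eps multiplies it by 2^2 = 4.
    eps_ratio = sample_complexity_scale(10, S, A, eps / 2, delta) / sample_complexity_scale(10, S, A, eps, delta)

    # Doubling H multiplies the regret expression by 2^1.5, about 2.83.
    regret_ratio = regret_scale(20, S, A, T) / regret_scale(10, S, A, T)

    print(f"PAC ratio (H x2): {pac_ratio:.2f}")
    print(f"PAC ratio (eps /2): {eps_ratio:.2f}")
    print(f"regret ratio (H x2): {regret_ratio:.2f}")
```

Under these assumptions the script prints ratios of 8.00, 4.00, and 2.83, showing the cubic dependence on $H$ in the sample-complexity bound versus the $H^{3/2}$ dependence in the regret bound.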


Pages: 21
