Near-optimal PAC bounds for discounted MDPs

Cited by: 28

Authors:

Lattimore, Tor [1]

Hutter, Marcus [2]

Affiliations:

[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2M7, Canada

[2] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT 0200, Australia

Source:

THEORETICAL COMPUTER SCIENCE | 2014 / Vol. 558

Keywords:

Sample-complexity; PAC bounds; Markov decision processes; Reinforcement learning

DOI:

10.1016/j.tcs.2014.09.029

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends linearly on the number of non-zero transition probabilities. The lower bound strengthens previous work by being both more general (it applies to all policies) and tighter. The upper and lower bounds match up to logarithmic factors provided the transition matrix is not too dense. (C) 2014 Elsevier B.V. All rights reserved.
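
The scaling described in the abstract can be illustrated with a rough sketch. The abstract states only the cubic dependence on the horizon 1/(1 - gamma) and the linear dependence on the number of non-zero transition probabilities. The 1/epsilon^2 and log(1/delta) factors below are assumed here as the usual PAC form, and all constants and logarithmic factors are dropped. The function names are hypothetical and are not taken from the paper.

```python
import math


def upper_bound_scale(num_nonzero_transitions, epsilon, gamma, delta):
    """Leading-order scale of the upper bound (constants and log factors dropped).

    Assumed form: T / (epsilon^2 * (1 - gamma)^3) * log(1 / delta),
    where T is the number of non-zero transition probabilities.
    """
    horizon = 1.0 / (1.0 - gamma)  # effective horizon of the discounted MDP
    return num_nonzero_transitions * horizon ** 3 / epsilon ** 2 * math.log(1.0 / delta)


def lower_bound_scale(num_states, num_actions, epsilon, gamma, delta):
    """Leading-order scale of the lower bound under the same assumptions.

    Assumed form: S * A / (epsilon^2 * (1 - gamma)^3) * log(1 / delta).
    """
    horizon = 1.0 / (1.0 - gamma)
    return num_states * num_actions * horizon ** 3 / epsilon ** 2 * math.log(1.0 / delta)


def density_gap(num_states, num_actions, num_nonzero_transitions):
    """Ratio of the two scales: average non-zero successors per state/action pair."""
    return num_nonzero_transitions / (num_states * num_actions)


if __name__ == "__main__":
    S, A = 20, 4
    sparse_T = 2 * S * A   # two possible next states per state/action pair
    dense_T = S * S * A    # every next state reachable from every pair
    print("sparse gap:", density_gap(S, A, sparse_T))
    print("dense gap:", density_gap(S, A, dense_T))
```

Under these assumptions the ratio of the two scales is T / (S * A). It stays constant when each state/action pair has only a few possible successors and grows to S when the transition matrix is fully dense, which is one way to read the "not too dense" condition.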


Pages: 125-143

Page count: 19
