Near-optimal PAC bounds for discounted MDPs

Cited by: 28
Authors
Lattimore, Tor [1]
Hutter, Marcus [2]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB T6G 2M7, Canada
[2] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT 0200, Australia
Keywords
Sample-complexity; PAC bounds; Markov decision processes; Reinforcement learning
DOI
10.1016/j.tcs.2014.09.029
Chinese Library Classification number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends linearly on the number of non-zero transition probabilities. The lower bound strengthens previous work by being both more general (it applies to all policies) and tighter. The upper and lower bounds match up to logarithmic factors provided the transition matrix is not too dense. (C) 2014 Elsevier B.V. All rights reserved.
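As a rough reading of the abstract in symbols (a sketch, not the paper's exact theorem statements): write $\gamma$ for the discount factor, so that the effective horizon is $1/(1-\gamma)$, $T$ for the number of non-zero transition probabilities, and $|S \times A|$ for the number of state-action pairs. The dependence on the accuracy $\epsilon$ and the confidence $\delta$ shown here is an assumption based on the usual form of PAC bounds; the abstract does not state it. Logarithmic factors are suppressed.

```latex
\[
\text{upper bound: } \tilde{O}\!\left(\frac{T}{\epsilon^{2}(1-\gamma)^{3}}\,\log\frac{1}{\delta}\right),
\qquad
\text{lower bound: } \Omega\!\left(\frac{|S \times A|}{\epsilon^{2}(1-\gamma)^{3}}\,\log\frac{1}{\delta}\right)
\]
```

Under this reading, the two bounds agree up to logarithmic factors when $T$ is of the same order as $|S \times A|$, that is, when each state-action pair has only a few possible successor states.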
Pages: 125-143
Page count: 19
Related papers
50 in total
  • [1] Near-Optimal Sample Complexity Bounds for Constrained MDPs
    Vaswani, Sharan
    Yang, Lin F.
    Szepesvari, Csaba
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [2] Near-Optimal Interdiction of Factored MDPs
    Panda, Swetasudha
    Vorobeychik, Yevgeniy
    [J]. CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI2017), 2017,
  • [3] Near-optimal Reinforcement Learning in Factored MDPs
    Osband, Ian
    Van Roy, Benjamin
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [4] NEAR-OPTIMAL BOUNDS FOR PHASE SYNCHRONIZATION
    Zhong, Yiqiao
    Boumal, Nicolas
    [J]. SIAM JOURNAL ON OPTIMIZATION, 2018, 28 (02) : 989 - 1016
  • [5] Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs
    Tirinzoni, Andrea
    Al-Marjani, Aymen
    Kaufmann, Emilie
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [6] Near-optimal Regret Bounds for Reinforcement Learning
    Jaksch, Thomas
    Ortner, Ronald
    Auer, Peter
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2010, 11 : 1563 - 1600
  • [7] BOUNDS FOR THE ADDITIONAL COST OF NEAR-OPTIMAL CONTROLS
    STEINBERG, AM
    FORTE, I
    [J]. JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS, 1980, 31 (03) : 385 - 395
  • [8] Near-optimal quantum tomography: estimators and bounds
    Kueng, Richard
    Ferrie, Christopher
    [J]. NEW JOURNAL OF PHYSICS, 2015, 17
  • [9] Near-optimal regret bounds for reinforcement learning
    Jaksch, Thomas
    Ortner, Ronald
    Auer, Peter
    [J]. Journal of Machine Learning Research, 2010, 11 : 1563 - 1600
  • [10] Near-Optimal Regret Bounds for Thompson Sampling
    Agrawal, Shipra
    Goyal, Navin
    [J]. JOURNAL OF THE ACM, 2017, 64 (05)