A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Cited by: 0
Authors
Amortila, Philip [1]
Precup, Doina [1,2]
Panangaden, Prakash [1]
Bellemare, Marc G. [3]
Affiliations
[1] McGill University, Montreal, QC, Canada
[2] Google DeepMind, London, England
[3] Google Research, Brain Team, Mountain View, CA, USA
Keywords
Stochastic approximation
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(λ) and Q-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.
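The abstract's three claims can be illustrated numerically: under a constant step size the iterates of a value-based method settle into a stationary distribution, the mean of that distribution equals the true value function when the update target is an expected Bellman update, and the distribution tightens as the step size shrinks. The sketch below is illustrative only (it is not code from the paper), using a hypothetical two-state Markov reward process and synchronous TD(0) with independently sampled transitions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state Markov reward process (chosen for illustration).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition probabilities
r = np.array([1.0, 0.0])     # per-state rewards
gamma = 0.9

# True value function: V* = (I - gamma * P)^{-1} r
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)

def run_td(alpha, n_steps=200_000, burn_in=50_000):
    """Synchronous TD(0) with a constant step size.

    Each step updates every state using an independently sampled next
    state, so the value-function iterates V_t form a Markov chain; the
    distributional analysis says this chain has a stationary distribution."""
    V = np.zeros(2)
    samples = []
    for t in range(n_steps):
        # Sample a next state for each state: next state is 1 w.p. P[s, 1].
        s_next = (rng.random(2) < P[:, 1]).astype(int)
        V = V + alpha * (r + gamma * V[s_next] - V)
        if t >= burn_in:
            samples.append(V.copy())
    return np.asarray(samples)

big = run_td(alpha=0.1)
small = run_td(alpha=0.01)
print("true V*            :", V_true)
print("mean, alpha = 0.1  :", big.mean(axis=0))   # close to V*
print("std,  alpha = 0.1  :", big.std(axis=0))
print("std,  alpha = 0.01 :", small.std(axis=0))  # tighter concentration
```

With the step size held constant the iterates never converge pointwise; instead their long-run empirical mean matches V* (the expected-Bellman-target property from the abstract), and their spread around that mean shrinks as alpha decreases, matching the concentration result.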
Pages: 4357-4365 (9 pages)
Related Papers (50 total)
  • [1] Fischer, Johannes; Eyberg, Christoph; Werling, Moritz; Lauer, Martin. Sampling-based Inverse Reinforcement Learning Algorithms with Safety Constraints. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021: 791-798.
  • [2] Hu, Bingshan; Zhang, Tianyue H.; Hegde, Nidhi; Schmidt, Mark. Optimistic Thompson Sampling-based Algorithms for Episodic Reinforcement Learning. Uncertainty in Artificial Intelligence, 2023, 216: 890-899.
  • [3] Slade, Patrick; Sunberg, Zachary N.; Kochenderfer, Mykel J. Estimation and control using sampling-based Bayesian reinforcement learning. IET Cyber-Physical Systems: Theory & Applications, 2020, 5(1): 127-135.
  • [4] Wang, Chenglong; Zhou, Hang; Hu, Yimin; Huo, Yifu; Li, Bei; Liu, Tongran; Xiao, Tong; Zhu, Jingbo. ESRL: Efficient Sampling-Based Reinforcement Learning for Sequence Generation. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 17, 2024: 19107-19115.
  • [5] Ouyang, Yi; Gagrani, Mukul; Jain, Rahul. Posterior Sampling-Based Reinforcement Learning for Control of Unknown Linear Systems. IEEE Transactions on Automatic Control, 2020, 65(8): 3600-3607.
  • [6] Arslan, Oktay; Tsiotras, Panagiotis. Machine Learning Guided Exploration for Sampling-based Motion Planning Algorithms. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015: 2646-2652.
  • [7] Liebenwein, Lucas; Baykal, Cenk; Gilitschenski, Igor; Karaman, Sertac; Rus, Daniela. Sampling-Based Approximation Algorithms for Reachability Analysis with Provable Guarantees. Robotics: Science and Systems XIV, 2018.
  • [8] Karaman, Sertac; Frazzoli, Emilio. Sampling-based algorithms for optimal motion planning. International Journal of Robotics Research, 2011, 30(7): 846-894.
  • [9] Hollinger, Geoffrey A.; Sukhatme, Gaurav S. Sampling-based robotic information gathering algorithms. International Journal of Robotics Research, 2014, 33(9): 1271-1287.
  • [10] Chehreghani, Mostafa Haghir. A Framework for Description and Analysis of Sampling-based Approximate Triangle Counting Algorithms. Proceedings of the 3rd IEEE/ACM International Conference on Data Science and Advanced Analytics (DSAA 2016), 2016: 80-89.