A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms

Cited by: 0
Authors
Amortila, Philip [1]
Precup, Doina [1,2]
Panangaden, Prakash [1]
Bellemare, Marc G. [3]
Affiliations
[1] McGill University, Montreal, QC, Canada
[2] Google DeepMind, London, England
[3] Google Research, Brain Team, Mountain View, CA, USA
Keywords
Stochastic approximation
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We demonstrate its effectiveness by presenting simple and unified proofs of convergence for a variety of commonly-used methods. We show that value-based methods such as TD(λ) and Q-Learning have update rules which are contractive in the space of distributions of functions, thus establishing their exponentially fast convergence to a stationary distribution. We demonstrate that the stationary distribution obtained by any algorithm whose target is an expected Bellman update has a mean which is equal to the true value function. Furthermore, we establish that the distributions concentrate around their mean as the step-size shrinks. We further analyse the optimistic policy iteration algorithm, for which the contraction property does not hold, and formulate a probabilistic policy improvement property which entails the convergence of the algorithm.
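The abstract's three claims can be illustrated numerically: under a constant step size the iterates of a value-based method settle into a stationary distribution, the mean of that distribution equals the true value function when the update target is an expected Bellman update, and the distribution tightens as the step size shrinks. The sketch below is illustrative only (it is not code from the paper), using a hypothetical two-state Markov reward process and synchronous TD(0) with independently sampled transitions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-state Markov reward process (chosen for illustration).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])   # transition probabilities
r = np.array([1.0, 0.0])     # per-state rewards
gamma = 0.9

# True value function: V* = (I - gamma * P)^{-1} r
V_true = np.linalg.solve(np.eye(2) - gamma * P, r)

def run_td(alpha, n_steps=200_000, burn_in=50_000):
    """Synchronous TD(0) with a constant step size.

    Each step updates every state using an independently sampled next
    state, so the value-function iterates V_t form a Markov chain; the
    distributional analysis says this chain has a stationary distribution."""
    V = np.zeros(2)
    samples = []
    for t in range(n_steps):
        # Sample a next state for each state: next state is 1 w.p. P[s, 1].
        s_next = (rng.random(2) < P[:, 1]).astype(int)
        V = V + alpha * (r + gamma * V[s_next] - V)
        if t >= burn_in:
            samples.append(V.copy())
    return np.asarray(samples)

big = run_td(alpha=0.1)
small = run_td(alpha=0.01)
print("true V*            :", V_true)
print("mean, alpha = 0.1  :", big.mean(axis=0))   # close to V*
print("std,  alpha = 0.1  :", big.std(axis=0))
print("std,  alpha = 0.01 :", small.std(axis=0))  # tighter concentration
```

With the step size held constant the iterates never converge pointwise; instead their long-run empirical mean matches V* (the expected-Bellman-target property from the abstract), and their spread around that mean shrinks as alpha decreases, matching the concentration result.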
Pages: 4357-4365 (9 pages)
Related Papers (50 total)
  • [1] Fischer, Johannes; Eyberg, Christoph; Werling, Moritz; Lauer, Martin. Sampling-based Inverse Reinforcement Learning Algorithms with Safety Constraints. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2021: 791-798.
  • [2] Hu, Bingshan; Zhang, Tianyue H.; Hegde, Nidhi; Schmidt, Mark. Optimistic Thompson Sampling-based Algorithms for Episodic Reinforcement Learning. Uncertainty in Artificial Intelligence, 2023, 216: 890-899.
  • [3] Slade, Patrick; Sunberg, Zachary N.; Kochenderfer, Mykel J. Estimation and control using sampling-based Bayesian reinforcement learning. IET Cyber-Physical Systems: Theory & Applications, 2020, 5(1): 127-135.
  • [4] Wang, Chenglong; Zhou, Hang; Hu, Yimin; Huo, Yifu; Li, Bei; Liu, Tongran; Xiao, Tong; Zhu, Jingbo. ESRL: Efficient Sampling-Based Reinforcement Learning for Sequence Generation. Thirty-Eighth AAAI Conference on Artificial Intelligence, Vol. 38, No. 17, 2024: 19107-19115.
  • [5] Ouyang, Yi; Gagrani, Mukul; Jain, Rahul. Posterior Sampling-Based Reinforcement Learning for Control of Unknown Linear Systems. IEEE Transactions on Automatic Control, 2020, 65(8): 3600-3607.
  • [6] Arslan, Oktay; Tsiotras, Panagiotis. Machine Learning Guided Exploration for Sampling-based Motion Planning Algorithms. 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015: 2646-2652.
  • [7] Liebenwein, Lucas; Baykal, Cenk; Gilitschenski, Igor; Karaman, Sertac; Rus, Daniela. Sampling-Based Approximation Algorithms for Reachability Analysis with Provable Guarantees. Robotics: Science and Systems XIV, 2018.
  • [8] Karaman, Sertac; Frazzoli, Emilio. Sampling-based algorithms for optimal motion planning. International Journal of Robotics Research, 2011, 30(7): 846-894.
  • [9] Hollinger, Geoffrey A.; Sukhatme, Gaurav S. Sampling-based robotic information gathering algorithms. International Journal of Robotics Research, 2014, 33(9): 1271-1287.
  • [10] Chehreghani, Mostafa Haghir. A Framework for Description and Analysis of Sampling-based Approximate Triangle Counting Algorithms. Proceedings of the 3rd IEEE/ACM International Conference on Data Science and Advanced Analytics (DSAA 2016), 2016: 80-89.