A basic formula for online policy gradient algorithms

Cited by: 20

Authors:

Cao, XR [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2005 / Vol. 50 / Issue 05

Keywords:

Markov decision processes; online estimation; perturbation analysis (PA); perturbation realization; Poisson equations; potentials; reinforcement learning

DOI:

10.1109/TAC.2005.847037

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This note presents a (new) basic formula for sample-path-based estimates of performance gradients for Markov systems (called policy gradients in the reinforcement learning literature). With this basic formula, many policy-gradient algorithms, including those that have previously appeared in the literature, can be easily developed. The formula follows naturally from a sensitivity equation in perturbation analysis. A new research direction is discussed.

Pages: 696-699

Page count: 4
