A basic formula for online policy gradient algorithms

Cited by: 20
Authors
Cao, XR [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Kowloon, Hong Kong, Peoples R China
Keywords
Markov decision processes; online estimation; perturbation analysis (PA); perturbation realization; Poisson equations; potentials; reinforcement learning
DOI
10.1109/TAC.2005.847037
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This note presents a (new) basic formula for sample-path-based estimates of performance gradients for Markov systems (called policy gradients in the reinforcement learning literature). With this basic formula, many policy-gradient algorithms, including those that have previously appeared in the literature, can be easily developed. The formula follows naturally from a sensitivity equation in perturbation analysis. A new research direction is discussed.
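This record does not reproduce the note's basic formula. As a rough illustration of the sensitivity equation the abstract refers to, the sketch below assumes the standard perturbation-analysis derivative formula for an ergodic chain with a fixed reward vector f: dη/dδ = π (P' − P) g, where π is the stationary distribution of P, η = π f is the average reward, and the potentials g solve the Poisson equation (I − P) g = f − η e. The matrices `P` and `P_new`, the reward `f`, and all function names are hypothetical values chosen for the example; they are not taken from the paper, and this is not its sample-path algorithm.

```python
# Sketch only: numerically checks d(eta)/d(delta) = pi (P' - P) g on a small chain.
# P, P_new and f are made-up example values, not data from the paper.

def mat_vec(P, v):
    """Return the matrix-vector product P v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]

def vec_mat(v, P):
    """Return the row-vector times matrix product v P."""
    n = len(P)
    return [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, iters=2000):
    """Stationary distribution by power iteration (assumes an ergodic, aperiodic chain)."""
    pi = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        pi = vec_mat(pi, P)
    return pi

def average_reward(P, f):
    """Long-run average reward eta = pi f."""
    return sum(p * r for p, r in zip(stationary(P), f))

def potentials(P, f, iters=2000):
    """Potentials g = sum_k P^k (f - eta e), truncated; solves (I - P) g = f - eta e."""
    eta = average_reward(P, f)
    term = [r - eta for r in f]
    g = [0.0] * len(f)
    for _ in range(iters):
        g = [a + b for a, b in zip(g, term)]
        term = mat_vec(P, term)
    return g

def pa_derivative(P, P_new, f):
    """Derivative of eta along P + delta (P_new - P) at delta = 0: pi (P_new - P) g."""
    pi = stationary(P)
    g = potentials(P, f)
    diff = [a - b for a, b in zip(mat_vec(P_new, g), mat_vec(P, g))]
    return sum(p * x for p, x in zip(pi, diff))

def perturbed(P, P_new, delta):
    """Transition matrix P + delta (P_new - P)."""
    return [[a + delta * (b - a) for a, b in zip(ra, rb)] for ra, rb in zip(P, P_new)]

P = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
P_new = [[0.2, 0.5, 0.3], [0.4, 0.4, 0.2], [0.1, 0.3, 0.6]]
f = [1.0, 0.0, 2.0]

analytic = pa_derivative(P, P_new, f)

# Central finite difference of eta as an independent check.
delta = 1e-5
numeric = (average_reward(perturbed(P, P_new, delta), f)
           - average_reward(perturbed(P, P_new, -delta), f)) / (2 * delta)

print(analytic, numeric)
```

The two printed values should agree closely. The code above computes π and g exactly from a known model, whereas the abstract concerns estimating such gradients from a single observed sample path.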
Pages: 696-699
Page count: 4
Related papers
50 in total
  • [1] Deterministic Policy Gradient Algorithms
    Silver, David
    Lever, Guy
    Heess, Nicolas
    Degris, Thomas
    Wierstra, Daan
    Riedmiller, Martin
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 1), 2014, 32
  • [2] Online gradient descent learning algorithms
    Ying, Yiming
    Pontil, Massimiliano
    [J]. FOUNDATIONS OF COMPUTATIONAL MATHEMATICS, 2008, 8 (05) : 561 - 596
  • [3] Online Gradient Descent Learning Algorithms
    Yiming Ying
    Massimiliano Pontil
    [J]. Foundations of Computational Mathematics, 2008, 8 : 561 - 596
  • [4] An improvement of policy gradient estimation algorithms
    Li, Yanjie
    Cao, Fang
    Cao, Xi-Ren
    [J]. WODES' 08: PROCEEDINGS OF THE 9TH INTERNATIONAL WORKSHOP ON DISCRETE EVENT SYSTEMS, 2008, : 168 - 172
  • [5] APPROXIMATE NEWTON POLICY GRADIENT ALGORITHMS
    Li, Haoya
    Gupta, Samarth
    Yu, Hsiangfu
    Ying, Lexing
    Dhillon, Inderjit
    [J]. SIAM Journal on Scientific Computing, 2023, 45 (05):
  • [6] Successful Ingredients of Policy Gradient Algorithms
    Gronauer, Sven
    Gottwald, Martin
    Diepold, Klaus
    [J]. PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 2455 - 2461
  • [7] Online Learning With Inexact Proximal Online Gradient Descent Algorithms
    Dixit, Rishabh
    Bedi, Amrit Singh
    Tripathi, Ruchi
    Rajawat, Ketan
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2019, 67 (05) : 1338 - 1352
  • [8] ONLINE REGULARIZED GENERALIZED GRADIENT CLASSIFICATION ALGORITHMS
    Leilei Zhang (Ningbo University
    [J]. Analysis in Theory and Applications, 2010, 26 (03) : 278 - 300
  • [9] Online gradient descent algorithms for functional data learning
    Chen, Xiaming
    Tang, Bohao
    Fan, Jun
    Guo, Xin
    [J]. JOURNAL OF COMPLEXITY, 2022, 70
  • [10] Bayesian Policy Gradient and Actor-Critic Algorithms
    Ghavamzadeh, Mohammad
    Engel, Yaakov
    Valko, Michal
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2016, 17