A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Cited by: 572
Authors
Grondman, Ivo [1 ]
Busoniu, Lucian [2 ,3 ,4 ]
Lopes, Gabriel A. D. [1 ]
Babuska, Robert [1 ]
Affiliations
[1] Delft Univ Technol, Delft Ctr Syst & Control, NL-2628 CD Delft, Netherlands
[2] Univ Lorraine, CRAN, UMR 7039, F-54500 Vandoeuvre Les Nancy, France
[3] CNRS, CRAN, UMR 7039, F-54500 Vandoeuvre Les Nancy, France
[4] Tech Univ Cluj Napoca, Dept Automat, Cluj Napoca 400020, Romania
Keywords
Actor-critic; natural gradient; policy gradient; reinforcement learning (RL); algorithm; cost; approximation
DOI
10.1109/TSMCC.2012.2218595
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Policy-gradient-based actor-critic algorithms are among the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control, and finance. Although general surveys on reinforcement learning techniques already exist, none is dedicated specifically to actor-critic algorithms. This paper, therefore, describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation to deal with continuous state and action spaces. After a discussion of the concepts of reinforcement learning and the origins of actor-critic algorithms, the paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years. A review of several standard and natural actor-critic algorithms is given, and the paper concludes with an overview of application areas and a discussion of open issues.
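As a concrete illustration of the update scheme the abstract describes, the following is a minimal sketch of a one-step online actor-critic with linear function approximation and a Gaussian policy over a continuous action. The toy dynamics, reward, feature map phi, and step sizes are illustrative assumptions for this record, not details taken from the survey itself.

```python
import numpy as np

# Minimal sketch (assumptions noted above): one-step online actor-critic
# with linear function approximation and a Gaussian policy, in the spirit
# of the algorithms this survey covers. The scalar "move toward the
# origin" task below is a made-up example, not from the paper.
rng = np.random.default_rng(0)

def phi(s):
    # Feature vector for state s; here simply [s, 1] for a scalar state.
    return np.array([s, 1.0])

w = np.zeros(2)        # critic parameters: V(s) ~ w . phi(s)
theta = np.zeros(2)    # actor parameters: policy mean mu(s) = theta . phi(s)
sigma = 0.5            # fixed exploration standard deviation
alpha_w, alpha_th, gamma = 0.1, 0.01, 0.95

s = rng.uniform(-1.0, 1.0)
for t in range(5000):
    mu = theta @ phi(s)
    a = rng.normal(mu, sigma)            # sample a continuous action
    s_next = s + 0.1 * a                 # toy linear dynamics
    r = -s_next ** 2                     # reward: stay near the origin
    # TD error: the critic's low-variance estimate of the advantage
    delta = r + gamma * (w @ phi(s_next)) - w @ phi(s)
    # Critic: TD(0) update of the value-function weights
    w += alpha_w * delta * phi(s)
    # Actor: policy-gradient step, using
    # grad_theta log pi(a|s) = (a - mu) / sigma^2 * phi(s)
    theta += alpha_th * delta * (a - mu) / sigma ** 2 * phi(s)
    s = float(np.clip(s_next, -2.0, 2.0))  # keep the toy state bounded
```

A natural actor-critic variant would additionally premultiply the actor's gradient step by the inverse Fisher information matrix of the policy; that substitution is the core difference between the standard and natural-gradient algorithms the survey reviews.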
Pages: 1291 - 1307
Page count: 17
Related papers
50 items in total
  • [1] Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation
    Zhou, Ruida
    Liu, Tao
    Cheng, Min
    Kalathil, Dileep
    Kumar, P. R.
    Tian, Chao
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
    Zhang, Yan
    Zavlanos, Michael M.
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4674 - 4679
  • [3] Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning
    Shi, Daming
    Guo, Xudong
    Liu, Yi
    Fan, Wenhui
    [J]. ENTROPY, 2022, 24 (06)
  • [4] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Trivedi, Prashant
    Hemachandra, Nandyala
    [J]. DYNAMIC GAMES AND APPLICATIONS, 2023, 13 (01) : 25 - 55
  • [5] A World Model for Actor-Critic in Reinforcement Learning
    Panov, A. I.
    Ugadiarov, L. A.
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, 2023, 33 (03) : 467 - 477
  • [6] Actor-Critic based Improper Reinforcement Learning
    Zaki, Mohammadi
    Mohan, Avinash
    Gopalan, Aditya
    Mannor, Shie
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [7] Curious Hierarchical Actor-Critic Reinforcement Learning
    Roeder, Frank
    Eppe, Manfred
    Nguyen, Phuong D. H.
    Wermter, Stefan
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 : 408 - 419
  • [8] Integrated Actor-Critic for Deep Reinforcement Learning
    Zheng, Jiaohao
    Kurt, Mehmet Necip
    Wang, Xiaodong
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894 : 505 - 518
  • [9] A Modified Actor-Critic Reinforcement Learning Algorithm
    Mustapha, SM
    Lachiver, G
    [J]. 2000 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CONFERENCE PROCEEDINGS, VOLS 1 AND 2: NAVIGATING TO A NEW ERA, 2000, : 605 - 609