A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Cited by: 572
Authors
Grondman, Ivo [1 ]
Busoniu, Lucian [2 ,3 ,4 ]
Lopes, Gabriel A. D. [1 ]
Babuska, Robert [1 ]
Affiliations
[1] Delft Univ Technol, Delft Ctr Syst & Control, NL-2628 CD Delft, Netherlands
[2] Univ Lorraine, CRAN, UMR 7039, F-54500 Vandoeuvre Les Nancy, France
[3] CNRS, CRAN, UMR 7039, F-54500 Vandoeuvre Les Nancy, France
[4] Tech Univ Cluj Napoca, Dept Automat, Cluj Napoca 400020, Romania
Keywords
Actor-critic; natural gradient; policy gradient; reinforcement learning (RL); algorithm; cost; approximation
DOI
10.1109/TSMCC.2012.2218595
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their ability to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control, and finance. Although general surveys on reinforcement learning techniques already exist, none is dedicated specifically to actor-critic algorithms. This paper, therefore, describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation to deal with continuous state and action spaces. After a discussion of the concepts of reinforcement learning and the origins of actor-critic algorithms, the paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years. A review of several standard and natural actor-critic algorithms follows, and the paper concludes with an overview of application areas and a discussion of open issues.
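The abstract describes the actor-critic template the survey is organized around: a critic learns a value function by temporal-difference (TD) methods, and an actor adjusts policy parameters along a policy-gradient estimate in which the critic's TD error stands in for the advantage, which is what keeps the gradient variance low. The sketch below is a minimal illustration of that template, not code from the paper; the toy one-dimensional environment, the feature map, the step sizes, and all names are assumptions invented for the example.

    import numpy as np

    # Minimal one-step actor-critic sketch with linear function approximation.
    # Everything below (environment, features, step sizes) is an illustrative
    # assumption, not taken from any specific surveyed algorithm.
    rng = np.random.default_rng(0)
    n_features, n_actions = 4, 2
    theta = np.zeros((n_features, n_actions))  # actor: softmax-policy parameters
    w = np.zeros(n_features)                   # critic: value-function weights
    alpha_actor, alpha_critic, gamma = 0.01, 0.1, 0.99

    def features(state):
        # Hypothetical feature map for a scalar toy state.
        return np.tanh(state * np.arange(1, n_features + 1))

    def policy(phi):
        # Gibbs (softmax) policy over two discrete actions.
        prefs = phi @ theta
        p = np.exp(prefs - prefs.max())
        return p / p.sum()

    state = 0.0
    for step in range(1000):
        phi = features(state)
        probs = policy(phi)
        a = rng.choice(n_actions, p=probs)

        # Toy dynamics and reward: the agent is rewarded for keeping
        # the state near zero.
        state_next = 0.9 * state + (0.1 if a == 1 else -0.1)
        reward = -state_next ** 2

        # Critic: TD(0) error and semi-gradient value update.
        delta = reward + gamma * (features(state_next) @ w) - (phi @ w)
        w += alpha_critic * delta * phi

        # Actor: policy-gradient step; the TD error delta serves as a
        # low-variance estimate of the advantage of the taken action.
        grad_log_pi = -np.outer(phi, probs)  # d log pi / d theta, all actions
        grad_log_pi[:, a] += phi             # plus the taken action's features
        theta += alpha_actor * delta * grad_log_pi

        state = state_next

The natural-gradient variants reviewed in the survey change only the actor step: the gradient above is premultiplied by the inverse Fisher information matrix of the policy, and with compatible function approximation this natural gradient reduces to the compatible critic's weight vector itself, a standard result that the survey traces through the natural actor-critic literature.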
Pages: 1291-1307
Number of pages: 17
Related papers
50 records in total
  • [31] Veeriah, Vivek; van Seijen, Harm; Sutton, Richard S. Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning. AAMAS'17: Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems, 2017: 556-564
  • [32] Zhang, Yanqiang; Zhai, Yuanzhao; Zhou, Gongqian; Ding, Bo; Feng, Dawei; Liu, Songwang. Exploring Policy Diversity in Parallel Actor-Critic Learning. 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), 2022: 1196-1203
  • [33] Liang, Kun; Zhang, Guoqiang; Guo, Jinhui; Li, Wentao. An Actor-Critic Hierarchical Reinforcement Learning Model for Course Recommendation. Electronics, 2023, 12(24)
  • [34] Bhatnagar, Shalabh; Sutton, Richard S.; Ghavamzadeh, Mohammad; Lee, Mark. Natural Actor-Critic Algorithms. Automatica, 2009, 45(11): 2471-2482
  • [35] Kim, Namyong; Shin, Hayong. The Application of Actor-Critic Reinforcement Learning for Fab Dispatching Scheduling. 2017 Winter Simulation Conference (WSC), 2017: 4570-4571
  • [36] Zhong, Chen; Lu, Ziyang; Gursoy, M. Cenk; Velipasalar, Senem. Actor-Critic Deep Reinforcement Learning for Dynamic Multichannel Access. 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2018: 599-603
  • [37] Zanette, Andrea; Wainwright, Martin J.; Brunskill, Emma. Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34
  • [38] Iima, Hitoshi; Kuroe, Yasuaki. Swarm Reinforcement Learning Method Based on an Actor-Critic Method. Simulated Evolution and Learning, 2010, 6457: 279-288
  • [39] Li, Qiang; Nie, Jun; Wang, Haixia; Lu, Xiao; Song, Shibin. Manipulator Motion Planning Based on Actor-Critic Reinforcement Learning. Proceedings of the 40th Chinese Control Conference (CCC), 2021: 4248-4254
  • [40] Fan, Zhou; Su, Rui; Zhang, Weinan; Yu, Yong. Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019: 2279-2285