A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Cited by: 572
Authors
Grondman, Ivo [1 ]
Busoniu, Lucian [2 ,3 ,4 ]
Lopes, Gabriel A. D. [1 ]
Babuska, Robert [1 ]
Affiliations
[1] Delft Univ Technol, Delft Ctr Syst & Control, NL-2628 CD Delft, Netherlands
[2] Univ Lorraine, CRAN, UMR 7039, F-54500 Vandoeuvre Les Nancy, France
[3] CNRS, CRAN, UMR 7039, F-54500 Vandoeuvre Les Nancy, France
[4] Tech Univ Cluj Napoca, Dept Automat, Cluj Napoca 400020, Romania
Keywords
Actor-critic; natural gradient; policy gradient; reinforcement learning (RL); algorithm; cost; approximation
DOI
10.1109/TSMCC.2012.2218595
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Policy-gradient-based actor-critic algorithms are amongst the most popular algorithms in the reinforcement learning framework. Their ability to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control, and finance. Although general surveys on reinforcement learning techniques already exist, none is dedicated specifically to actor-critic algorithms. This paper, therefore, describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation to deal with continuous state and action spaces. After a discussion of the concepts of reinforcement learning and the origins of actor-critic algorithms, the paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years. A review of several standard and natural actor-critic algorithms follows, and the paper concludes with an overview of application areas and a discussion of open issues.
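The abstract describes the actor-critic template the survey is organized around: a critic learns a value function by temporal-difference (TD) methods, and an actor adjusts policy parameters along a policy-gradient estimate in which the critic's TD error stands in for the advantage, which is what keeps the gradient variance low. The sketch below is a minimal illustration of that template, not code from the paper; the toy one-dimensional environment, the feature map, the step sizes, and all names are assumptions invented for the example.

    import numpy as np

    # Minimal one-step actor-critic sketch with linear function approximation.
    # Everything below (environment, features, step sizes) is an illustrative
    # assumption, not taken from any specific surveyed algorithm.
    rng = np.random.default_rng(0)
    n_features, n_actions = 4, 2
    theta = np.zeros((n_features, n_actions))  # actor: softmax-policy parameters
    w = np.zeros(n_features)                   # critic: value-function weights
    alpha_actor, alpha_critic, gamma = 0.01, 0.1, 0.99

    def features(state):
        # Hypothetical feature map for a scalar toy state.
        return np.tanh(state * np.arange(1, n_features + 1))

    def policy(phi):
        # Gibbs (softmax) policy over two discrete actions.
        prefs = phi @ theta
        p = np.exp(prefs - prefs.max())
        return p / p.sum()

    state = 0.0
    for step in range(1000):
        phi = features(state)
        probs = policy(phi)
        a = rng.choice(n_actions, p=probs)

        # Toy dynamics and reward: the agent is rewarded for keeping
        # the state near zero.
        state_next = 0.9 * state + (0.1 if a == 1 else -0.1)
        reward = -state_next ** 2

        # Critic: TD(0) error and semi-gradient value update.
        delta = reward + gamma * (features(state_next) @ w) - (phi @ w)
        w += alpha_critic * delta * phi

        # Actor: policy-gradient step; the TD error delta serves as a
        # low-variance estimate of the advantage of the taken action.
        grad_log_pi = -np.outer(phi, probs)  # d log pi / d theta, all actions
        grad_log_pi[:, a] += phi             # plus the taken action's features
        theta += alpha_actor * delta * grad_log_pi

        state = state_next

The natural-gradient variants reviewed in the survey change only the actor step: the gradient above is premultiplied by the inverse Fisher information matrix of the policy, and with compatible function approximation this natural gradient reduces to the compatible critic's weight vector itself, a standard result that the survey traces through the natural actor-critic literature.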
Pages: 1291-1307
Number of pages: 17
Related papers
50 records in total
  • [31] Veeriah, Vivek; van Seijen, Harm; Sutton, Richard S. Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning. AAMAS'17: Proceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems, 2017: 556-564
  • [32] Zhang, Yanqiang; Zhai, Yuanzhao; Zhou, Gongqian; Ding, Bo; Feng, Dawei; Liu, Songwang. Exploring Policy Diversity in Parallel Actor-Critic Learning. 2022 IEEE 34th International Conference on Tools with Artificial Intelligence (ICTAI), 2022: 1196-1203
  • [33] Liang, Kun; Zhang, Guoqiang; Guo, Jinhui; Li, Wentao. An Actor-Critic Hierarchical Reinforcement Learning Model for Course Recommendation. Electronics, 2023, 12(24)
  • [34] Bhatnagar, Shalabh; Sutton, Richard S.; Ghavamzadeh, Mohammad; Lee, Mark. Natural Actor-Critic Algorithms. Automatica, 2009, 45(11): 2471-2482
  • [35] Kim, Namyong; Shin, Hayong. The Application of Actor-Critic Reinforcement Learning for Fab Dispatching Scheduling. 2017 Winter Simulation Conference (WSC), 2017: 4570-4571
  • [36] Zhong, Chen; Lu, Ziyang; Gursoy, M. Cenk; Velipasalar, Senem. Actor-Critic Deep Reinforcement Learning for Dynamic Multichannel Access. 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), 2018: 599-603
  • [37] Zanette, Andrea; Wainwright, Martin J.; Brunskill, Emma. Provable Benefits of Actor-Critic Methods for Offline Reinforcement Learning. Advances in Neural Information Processing Systems 34 (NeurIPS 2021), 2021, 34
  • [38] Iima, Hitoshi; Kuroe, Yasuaki. Swarm Reinforcement Learning Method Based on an Actor-Critic Method. Simulated Evolution and Learning, 2010, 6457: 279-288
  • [39] Li, Qiang; Nie, Jun; Wang, Haixia; Lu, Xiao; Song, Shibin. Manipulator Motion Planning Based on Actor-Critic Reinforcement Learning. Proceedings of the 40th Chinese Control Conference (CCC), 2021: 4248-4254
  • [40] Fan, Zhou; Su, Rui; Zhang, Weinan; Yu, Yong. Hybrid Actor-Critic Reinforcement Learning in Parameterized Action Space. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019: 2279-2285