A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Cited by: 572
Authors
Grondman, Ivo [1 ]
Busoniu, Lucian [2 ,3 ,4 ]
Lopes, Gabriel A. D. [1 ]
Babuska, Robert [1 ]
Affiliations
[1] Delft Univ Technol, Delft Ctr Syst & Control, NL-2628 CD Delft, Netherlands
[2] Univ Lorraine, CRAN, UMR 7039, F-54500 Vandoeuvre Les Nancy, France
[3] CNRS, CRAN, UMR 7039, F-54500 Vandoeuvre Les Nancy, France
[4] Tech Univ Cluj Napoca, Dept Automat, Cluj Napoca 400020, Romania
Keywords
Actor-critic; natural gradient; policy gradient; reinforcement learning (RL); algorithm; cost; approximation
DOI
10.1109/TSMCC.2012.2218595
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Policy-gradient-based actor-critic algorithms are among the most popular algorithms in the reinforcement learning framework. Their advantage of being able to search for optimal policies using low-variance gradient estimates has made them useful in several real-life applications, such as robotics, power control, and finance. Although general surveys on reinforcement learning techniques already exist, none is dedicated specifically to actor-critic algorithms. This paper, therefore, describes the state of the art of actor-critic algorithms, with a focus on methods that can work in an online setting and use function approximation to deal with continuous state and action spaces. After a discussion of the concepts of reinforcement learning and the origins of actor-critic algorithms, the paper describes the workings of the natural gradient, which has made its way into many actor-critic algorithms over the past few years. A review of several standard and natural actor-critic algorithms is given, and the paper concludes with an overview of application areas and a discussion of open issues.
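As a concrete illustration of the update scheme the abstract describes, the following is a minimal sketch of a one-step online actor-critic with linear function approximation and a Gaussian policy over a continuous action. The toy dynamics, reward, feature map phi, and step sizes are illustrative assumptions for this record, not details taken from the survey itself.

```python
import numpy as np

# Minimal sketch (assumptions noted above): one-step online actor-critic
# with linear function approximation and a Gaussian policy, in the spirit
# of the algorithms this survey covers. The scalar "move toward the
# origin" task below is a made-up example, not from the paper.
rng = np.random.default_rng(0)

def phi(s):
    # Feature vector for state s; here simply [s, 1] for a scalar state.
    return np.array([s, 1.0])

w = np.zeros(2)        # critic parameters: V(s) ~ w . phi(s)
theta = np.zeros(2)    # actor parameters: policy mean mu(s) = theta . phi(s)
sigma = 0.5            # fixed exploration standard deviation
alpha_w, alpha_th, gamma = 0.1, 0.01, 0.95

s = rng.uniform(-1.0, 1.0)
for t in range(5000):
    mu = theta @ phi(s)
    a = rng.normal(mu, sigma)            # sample a continuous action
    s_next = s + 0.1 * a                 # toy linear dynamics
    r = -s_next ** 2                     # reward: stay near the origin
    # TD error: the critic's low-variance estimate of the advantage
    delta = r + gamma * (w @ phi(s_next)) - w @ phi(s)
    # Critic: TD(0) update of the value-function weights
    w += alpha_w * delta * phi(s)
    # Actor: policy-gradient step, using
    # grad_theta log pi(a|s) = (a - mu) / sigma^2 * phi(s)
    theta += alpha_th * delta * (a - mu) / sigma ** 2 * phi(s)
    s = float(np.clip(s_next, -2.0, 2.0))  # keep the toy state bounded
```

A natural actor-critic variant would additionally premultiply the actor's gradient step by the inverse Fisher information matrix of the policy; that substitution is the core difference between the standard and natural-gradient algorithms the survey reviews.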
Pages: 1291 - 1307
Page count: 17
Related papers
50 items in total
  • [1] Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation
    Zhou, Ruida
    Liu, Tao
    Cheng, Min
    Kalathil, Dileep
    Kumar, P. R.
    Tian, Chao
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
    Zhang, Yan
    Zavlanos, Michael M.
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 4674 - 4679
  • [3] Optimal Policy of Multiplayer Poker via Actor-Critic Reinforcement Learning
    Shi, Daming
    Guo, Xudong
    Liu, Yi
    Fan, Wenhui
    [J]. ENTROPY, 2022, 24 (06)
  • [4] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Trivedi, Prashant
    Hemachandra, Nandyala
    [J]. DYNAMIC GAMES AND APPLICATIONS, 2023, 13 (01) : 25 - 55
  • [5] A World Model for Actor-Critic in Reinforcement Learning
    Panov, A. I.
    Ugadiarov, L. A.
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, 2023, 33 (03) : 467 - 477
  • [6] Actor-Critic based Improper Reinforcement Learning
    Zaki, Mohammadi
    Mohan, Avinash
    Gopalan, Aditya
    Mannor, Shie
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [7] Curious Hierarchical Actor-Critic Reinforcement Learning
    Roeder, Frank
    Eppe, Manfred
    Nguyen, Phuong D. H.
    Wermter, Stefan
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 : 408 - 419
  • [8] Integrated Actor-Critic for Deep Reinforcement Learning
    Zheng, Jiaohao
    Kurt, Mehmet Necip
    Wang, Xiaodong
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894 : 505 - 518
  • [9] A Modified Actor-Critic Reinforcement Learning Algorithm
    Mustapha, SM
    Lachiver, G
    [J]. 2000 CANADIAN CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING, CONFERENCE PROCEEDINGS, VOLS 1 AND 2: NAVIGATING TO A NEW ERA, 2000, : 605 - 609