Error controlled actor-critic

Cited by: 2
Authors
Gao, Xingen [1 ]
Chao, Fei [2 ,5 ]
Zhou, Changle [2 ]
Ge, Zhen [3 ]
Yang, Longzhi [4 ]
Chang, Xiang [5 ]
Shang, Changjing [5 ]
Shen, Qiang [5 ]
Affiliations
[1] Xiamen Univ Technol, Sch Optoelect & Commun Engn, Xiamen 361024, Peoples R China
[2] Xiamen Univ, Sch Informat, Dept Artificial Intelligence, Xiamen 361005, Peoples R China
[3] Univ Technol Sydney, Fac Engn & Informat Technol, Ultimo, NSW, Australia
[4] Northumbria Univ, Comp Sci & Digital Technol Dept, Newcastle upon Tyne, England
[5] Aberystwyth Univ, Inst Math Phys & Comp Sci, Dept Comp Sci, Aberystwyth SY23 3DB, Wales
Keywords
Reinforcement learning; Actor-critic; Approximation error; Overestimation; KL-divergence;
DOI
10.1016/j.ins.2022.08.079
CLC Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
The approximation inaccuracy of the value function in reinforcement learning (RL) algorithms unavoidably leads to an overestimation phenomenon, which has negative effects on the convergence of the algorithms. To limit the negative effects of the approximation error, we propose error controlled actor-critic (ECAC), which ensures the approximation error is limited within the value function. We present an investigation of how approximation inaccuracy can impair the optimization process of actor-critic approaches. In addition, we derive an upper bound for the approximation error of the Q function approximator and discover that the error can be reduced by limiting the KL-divergence between every two consecutive policies during policy training. Experiments on a variety of continuous control tasks demonstrate that the proposed actor-critic approach decreases approximation error and outperforms previous model-free RL algorithms by a significant margin. © 2022 Elsevier Inc. All rights reserved.
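The abstract's key mechanism, bounding the Q-function approximation error by keeping the KL-divergence between consecutive policies small, can be illustrated with a minimal sketch. This is a hypothetical illustration only: the closed-form Gaussian KL is standard, but the penalty form and the `beta` coefficient are assumptions, not the paper's actual objective.

```python
import math

def gaussian_kl(mu_old, sigma_old, mu_new, sigma_new):
    """Closed-form KL(old || new) between two 1-D Gaussian policies."""
    return (math.log(sigma_new / sigma_old)
            + (sigma_old ** 2 + (mu_old - mu_new) ** 2) / (2 * sigma_new ** 2)
            - 0.5)

def penalized_actor_loss(q_value, kl, beta=1.0):
    """Hypothetical actor objective: maximize Q while penalizing large
    policy jumps between consecutive updates via a KL term."""
    return -q_value + beta * kl

# Identical policies give zero KL, so the penalty vanishes;
# shifting the mean makes the KL (and hence the penalty) positive.
kl_same = gaussian_kl(0.0, 1.0, 0.0, 1.0)
kl_shift = gaussian_kl(0.0, 1.0, 0.5, 1.0)
```

In this reading, `beta` trades off reward maximization against update size; the abstract only states that limiting the per-update KL tightens the derived error bound, not how the constraint is enforced.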
Pages: 62-74
Page count: 13