Error controlled actor-critic

Cited by: 2
Authors
Gao, Xingen [1 ]
Chao, Fei [2 ,5 ]
Zhou, Changle [2 ]
Ge, Zhen [3 ]
Yang, Longzhi [4 ]
Chang, Xiang [5 ]
Shang, Changjing [5 ]
Shen, Qiang [5 ]
Affiliations
[1] Xiamen Univ Technol, Sch Optoelect & Commun Engn, Xiamen 361024, Peoples R China
[2] Xiamen Univ, Sch Informat, Dept Artificial Intelligence, Xiamen 361005, Peoples R China
[3] Univ Technol Sydney, Fac Engn & Informat Technol, Ultimo, NSW, Australia
[4] Northumbria Univ, Comp Sci & Digital Technol Dept, Newcastle upon Tyne, England
[5] Aberystwyth Univ, Inst Math Phys & Comp Sci, Dept Comp Sci, Aberystwyth SY23 3DB, Wales
Keywords
Reinforcement learning; Actor-critic; Approximation error; Overestimation; KL-divergence;
DOI
10.1016/j.ins.2022.08.079
CLC Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
The approximation inaccuracy of the value function in reinforcement learning (RL) algorithms unavoidably leads to an overestimation phenomenon, which has negative effects on the convergence of the algorithms. To limit the negative effects of the approximation error, we propose error controlled actor-critic (ECAC), which ensures the approximation error is limited within the value function. We present an investigation of how approximation inaccuracy can impair the optimization process of actor-critic approaches. In addition, we derive an upper bound for the approximation error of the Q function approximator and discover that the error can be reduced by limiting the KL-divergence between every two consecutive policies during policy training. Experiments on a variety of continuous control tasks demonstrate that the proposed actor-critic approach decreases approximation error and outperforms previous model-free RL algorithms by a significant margin. © 2022 Elsevier Inc. All rights reserved.
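The abstract's key mechanism, bounding the Q-function approximation error by keeping the KL-divergence between consecutive policies small, can be illustrated with a minimal sketch. This is a hypothetical illustration only: the closed-form Gaussian KL is standard, but the penalty form and the `beta` coefficient are assumptions, not the paper's actual objective.

```python
import math

def gaussian_kl(mu_old, sigma_old, mu_new, sigma_new):
    """Closed-form KL(old || new) between two 1-D Gaussian policies."""
    return (math.log(sigma_new / sigma_old)
            + (sigma_old ** 2 + (mu_old - mu_new) ** 2) / (2 * sigma_new ** 2)
            - 0.5)

def penalized_actor_loss(q_value, kl, beta=1.0):
    """Hypothetical actor objective: maximize Q while penalizing large
    policy jumps between consecutive updates via a KL term."""
    return -q_value + beta * kl

# Identical policies give zero KL, so the penalty vanishes;
# shifting the mean makes the KL (and hence the penalty) positive.
kl_same = gaussian_kl(0.0, 1.0, 0.0, 1.0)
kl_shift = gaussian_kl(0.0, 1.0, 0.5, 1.0)
```

In this reading, `beta` trades off reward maximization against update size; the abstract only states that limiting the per-update KL tightens the derived error bound, not how the constraint is enforced.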
Pages: 62-74
Page count: 13