Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 0
Authors
Baturay Saglam
Furkan Burak Mutlu
Dogan Can Cicek
Suleyman Serdar Kozat
Affiliation
[1] Bilkent University, Department of Electrical and Electronics Engineering
Keywords
Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control
DOI
Not available
Abstract
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested.
Related papers
50 in total
  • [31] Deep parameter-free attention hashing for image retrieval
    Yang, Wenjing
    Wang, Liejun
    Cheng, Shuli
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [32] PeSTo: parameter-free geometric deep learning for accurate prediction of protein binding interfaces
    Lucien F. Krapp
    Luciano A. Abriata
    Fabio Cortés Rodriguez
    Matteo Dal Peraro
    Nature Communications, 14
  • [33] A Self-Adaptive Vibration Reduction Method Based on Deep Deterministic Policy Gradient (DDPG) Reinforcement Learning Algorithm
    Jin, Xin
    Ma, Hongbao
    Kang, Yihua
    APPLIED SCIENCES-BASEL, 2022, 12 (19)
  • [34] A parameter-free graph reduction for spectral clustering and SpectralNet
    Alshammari, Mashaan
    Stavrakakis, John
    Takatsuka, Masahiro
    ARRAY, 2022, 15
  • [35] Temporal Parameter-Free Deep Skinning of Animated Meshes
    Moutafidou, Anastasia
    Toulatzis, Vasileios
    Fudos, Ioannis
    ADVANCES IN COMPUTER GRAPHICS, CGI 2021, 2021, 13002 : 3 - 24
  • [36] Alleviating the estimation bias of deep deterministic policy gradient via co-regularization
    Li, Yao
    Wang, YuHui
    Gan, YaoZhong
    Tan, XiaoYang
    PATTERN RECOGNITION, 2022, 131
  • [37] Reducing Estimation Bias via Triplet-Average Deep Deterministic Policy Gradient
    Wu, Dongming
    Dong, Xingping
    Shen, Jianbing
    Hoi, Steven C. H.
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (11) : 4933 - 4945
  • [38] PeSTo: parameter-free geometric deep learning for accurate prediction of protein binding interfaces
    Krapp, Lucien F.
    Abriata, Luciano A.
    Rodriguez, Fabio Cortes
    Dal Peraro, Matteo
    NATURE COMMUNICATIONS, 2023, 14 (01)
  • [39] Deep attributed graph clustering with self-separation regularization and parameter-free cluster estimation
    Ji, Junzhong
    Liang, Ye
    Lei, Minglong
    Elsevier Ltd, 142 : 522 - 533
  • [40] BIAS REDUCTION IN PARAMETER-ESTIMATION
    LE, LX
    WILSON, WJ
    AUTOMATICA, 1988, 24 (06) : 825 - 828