Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 0

Authors:

Baturay Saglam

Furkan Burak Mutlu

Dogan Can Cicek

Suleyman Serdar Kozat

Affiliation:

[1] Bilkent University, Department of Electrical and Electronics Engineering

Source:

Neural Processing Letters, Vol. 56

Keywords:

Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control

DOI:

Not available

Abstract:

Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested.
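The update described in the abstract, a linear combination of two approximate critics with sampled weights, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mixed_critic_target`, the choice to mix the element-wise minimum and maximum of the two critics, the uniform sampling, and the interval bounds `beta_low` and `beta_high` are all assumptions made for illustration. The abstract does not state how the shrunk estimation bias interval is derived, so the bounds below are placeholders only.

```python
import numpy as np


def mixed_critic_target(q1_next, q2_next, reward, done,
                        gamma=0.99, beta_low=0.5, beta_high=1.0, rng=None):
    """Build a TD target from a randomly weighted mix of two critic estimates.

    Hypothetical sketch: beta_low and beta_high are placeholder bounds,
    not the interval derived in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    q1_next = np.asarray(q1_next, dtype=float)
    q2_next = np.asarray(q2_next, dtype=float)
    reward = np.asarray(reward, dtype=float)
    done = np.asarray(done, dtype=float)

    # Pessimistic and optimistic estimates from the two critics.
    q_min = np.minimum(q1_next, q2_next)
    q_max = np.maximum(q1_next, q2_next)

    # One weight per transition, drawn uniformly from the assumed interval.
    beta = rng.uniform(beta_low, beta_high, size=q_min.shape)

    # Convex combination: beta = 1 recovers the pure minimum of the two critics.
    q_mix = beta * q_min + (1.0 - beta) * q_max

    # Standard one-step target; terminal transitions do not bootstrap.
    return reward + gamma * (1.0 - done) * q_mix


if __name__ == "__main__":
    target = mixed_critic_target(
        q1_next=[1.0, 3.0], q2_next=[2.0, 1.0],
        reward=[0.0, 0.0], done=[0.0, 1.0],
        gamma=0.5, rng=np.random.default_rng(0),
    )
    print(target)
```

With `beta_low = beta_high = 1.0` the sketch reduces to taking the minimum of the two critics, the kind of overestimation fix that the abstract says leads to underestimation; weights below 1 move the target toward the larger estimate.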
