Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 0

Authors:

Baturay Saglam

Furkan Burak Mutlu

Dogan Can Cicek

Suleyman Serdar Kozat

Affiliation:

[1] Bilkent University, Department of Electrical and Electronics Engineering

Source:

Neural Processing Letters, Vol. 56

Keywords:

Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control

DOI:

Not available

Abstract:

Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested.
