Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 0

Authors:

Baturay Saglam

Furkan Burak Mutlu

Dogan Can Cicek

Suleyman Serdar Kozat

Affiliation:

[1] Bilkent University, Department of Electrical and Electronics Engineering

Source:

Neural Processing Letters, Vol. 56

Keywords:

Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control

DOI:

Not available

CLC classification number:

Subject classification number:

Abstract:

Approximating the value functions in value-based deep reinforcement learning induces an overestimation bias, which results in suboptimal policies. We show that when the reinforcement signals received by the agents have high variance, deep actor-critic approaches that overcome the overestimation bias instead produce a substantial underestimation bias. We first address the detrimental issues in existing approaches that aim to overcome this underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant that reduces this underestimation bias in deterministic policy gradients. Because the weights of a linear combination of two approximate critics are sampled from a highly shrunk estimation-bias interval, our Q-value update rule is unaffected by the variance of the rewards received by the agents throughout learning. We evaluate the proposed improvement on a set of MuJoCo and Box2D continuous control tasks and show that it outperforms existing approaches and improves on the baseline actor-critic algorithm in most of the environments tested.
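The bias mechanism summarized in the abstract can be illustrated with a small Monte Carlo sketch. It assumes two critics whose estimates equal a known true Q-value plus independent zero-mean Gaussian noise, a stand-in for the estimation noise that high-variance rewards may induce. It then compares a TD3-style clipped double-Q target (the minimum of the two critics, a representative overestimation-correcting target) with a target that linearly combines the minimum and maximum critic estimates using a weight beta sampled uniformly from an interval. The min/max form of the combination, the interval [0.4, 0.6], and all function names are illustrative assumptions, not the authors' derived interval or exact update rule.

```python
import random
import statistics


def clipped_double_q_target(reward, gamma, q1, q2):
    # TD3-style clipped double-Q target: the pessimistic minimum of two critics.
    return reward + gamma * min(q1, q2)


def combined_target(reward, gamma, q1, q2, beta_low, beta_high, rng):
    # Hypothetical combination: beta ~ Uniform[beta_low, beta_high] weights the
    # minimum, and (1 - beta) weights the maximum of the two critic estimates.
    # The interval bounds are illustrative, not the paper's derived interval.
    beta = rng.uniform(beta_low, beta_high)
    q_min, q_max = min(q1, q2), max(q1, q2)
    return reward + gamma * (beta * q_min + (1.0 - beta) * q_max)


def simulate_bias(noise_std, n=20000, seed=0, beta_low=0.4, beta_high=0.6):
    """Return (mean error of the min target, mean error of the combined target).

    Assumes both critics equal a known true Q-value plus independent
    zero-mean Gaussian noise with standard deviation noise_std.
    """
    rng = random.Random(seed)
    true_q, reward, gamma = 10.0, 1.0, 0.99
    true_target = reward + gamma * true_q
    err_min, err_mix = [], []
    for _ in range(n):
        q1 = true_q + rng.gauss(0.0, noise_std)
        q2 = true_q + rng.gauss(0.0, noise_std)
        err_min.append(clipped_double_q_target(reward, gamma, q1, q2) - true_target)
        err_mix.append(
            combined_target(reward, gamma, q1, q2, beta_low, beta_high, rng) - true_target
        )
    return statistics.fmean(err_min), statistics.fmean(err_mix)


if __name__ == "__main__":
    # Larger critic noise should give a larger underestimation for the min target.
    for sigma in (0.5, 1.0, 2.0):
        min_bias, mix_bias = simulate_bias(sigma)
        print(f"noise_std={sigma}: min-target bias={min_bias:+.3f}, "
              f"combined-target bias={mix_bias:+.3f}")
```

For two independent Gaussian estimates, the expected minimum lies σ/√π below the true value, so the min-based bias should come out close to −γσ/√π (about −1.12 for σ = 2) and should grow linearly with the noise level. In this symmetric toy setting the combined target's bias stays near zero. The sketch omits the actor's maximization over actions, so it only captures the critic-combination step of an update rule of this kind.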
