Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 0
Authors
Baturay Saglam
Furkan Burak Mutlu
Dogan Can Cicek
Suleyman Serdar Kozat
Institutions
[1] Bilkent University, Department of Electrical and Electronics Engineering
Keywords
Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control
DOI: not available
Abstract
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested.
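The abstract describes a Q-value update rule that combines two approximate critics with weights sampled from a narrow estimation-bias interval. The exact interval and sampling scheme are not given here, so the following is only a minimal sketch of the general idea: form the target as a random convex combination of the smaller and larger critic estimates, with the combination weight drawn from a hypothetical shrunk interval (`beta_low`, `beta_high` are assumed names and values, not the paper's).

```python
import random

def target_q_value(q1, q2, beta_low=0.0, beta_high=0.5):
    """Illustrative target for twin-critic learning: a random convex
    combination of the two critic estimates.

    q1, q2: scalar Q-value estimates from the two critics.
    beta_low, beta_high: assumed bounds of the shrunk sampling interval
    (placeholders; the paper derives its own interval statistically).
    """
    q_min, q_max = min(q1, q2), max(q1, q2)
    # Sample the weight of the larger estimate from the interval.
    beta = random.uniform(beta_low, beta_high)
    # beta = 0 recovers the pessimistic min (Clipped Double Q-learning);
    # larger beta shifts the target toward the optimistic max.
    return beta * q_max + (1.0 - beta) * q_min
```

Because the sampled target always lies between the two critic estimates, it interpolates between the underestimating `min` target and the overestimating `max` target, which matches the abstract's stated goal of shrinking both biases.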
Related Papers
50 in total
  • [1] Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients
    Saglam, Baturay
    Mutlu, Furkan Burak
    Cicek, Dogan Can
    Kozat, Suleyman Serdar
    NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [2] Parameter-Free On-line Deep Learning
    Wawrzynski, Pawel
    AUTOMATION 2017: INNOVATIONS IN AUTOMATION, ROBOTICS AND MEASUREMENT TECHNIQUES, 2017, 550 : 543 - 553
  • [3] Learning to Pour using Deep Deterministic Policy Gradients
    Do, Chau
    Gordillo, Camilo
    Burgard, Wolfram
    2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018, : 3074 - 3079
  • [4] Parameter-free Locally Accelerated Conditional Gradients
    Carderera, Alejandro
    Diakonikolas, Jelena
    Lin, Cheuk Yin
    Pokutta, Sebastian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] Automatic VMAT Machine Parameter Optimization Using Deep Deterministic Policy Gradients
    Hrinivich, W.
    Li, H.
    Lee, J.
    MEDICAL PHYSICS, 2022, 49 (06) : E117 - E117
  • [6] Deep Deterministic Policy Gradients with Transfer Learning Framework in StarCraft Micromanagement
    Xie, Dong
    Zhong, Xiangnan
    2019 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2019, : 410 - 415
  • [7] Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm
    Wu, Junta
    Li, Huiyun
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [8] Selective Catalytic Reduction System Ammonia Injection Control Based on Deep Deterministic Policy Reinforcement Learning
    Xie, Peiran
    Zhang, Guangming
    Niu, Yuguang
    Sun, Tianshu
    FRONTIERS IN ENERGY RESEARCH, 2021, 9
  • [9] A parameter-free learning automaton scheme
    Ren, Xudie
    Li, Shenghong
    Ge, Hao
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [10] Generative Adversarial Inverse Reinforcement Learning With Deep Deterministic Policy Gradient
    Zhan, Ming
    Fan, Jingjing
    Guo, Jianying
    IEEE ACCESS, 2023, 11 : 87732 - 87746