Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 0
Authors
Baturay Saglam
Furkan Burak Mutlu
Dogan Can Cicek
Suleyman Serdar Kozat
Affiliations
[1] Bilkent University, Department of Electrical and Electronics Engineering
Source
Keywords
Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control;
DOI
Not available
CLC number
Subject classification code
Abstract
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested.
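As an illustration only, the idea of weighting two critic estimates with a sampled coefficient can be sketched as below. The abstract does not state the interval, the sampling distribution, or the exact form of the combination, so everything here is an assumption: the function name `combined_target`, the uniform distribution, the bounds `beta_low` and `beta_high`, and the choice to mix the smaller and larger of the two estimates are all hypothetical and are not taken from the paper.

```python
import random


def combined_target(reward, done, q1_next, q2_next, gamma=0.99,
                    beta_low=0.5, beta_high=1.0, rng=random):
    """Bootstrapped target from a randomly weighted mix of two critic estimates.

    Hypothetical sketch: the interval [beta_low, beta_high] and the uniform
    sampling are placeholders, not the interval derived in the paper.
    """
    # Sample the mixing weight once per update (assumed uniform).
    beta = rng.uniform(beta_low, beta_high)

    # beta = 1 uses only the smaller estimate (the pessimistic choice);
    # beta < 1 moves the estimate toward the larger one.
    q_min = min(q1_next, q2_next)
    q_max = max(q1_next, q2_next)
    q_mix = beta * q_min + (1.0 - beta) * q_max

    # Standard one-step target; no bootstrapping on terminal transitions.
    return reward + gamma * (1.0 - done) * q_mix


if __name__ == "__main__":
    rng = random.Random(0)
    print(combined_target(1.0, 0.0, 10.0, 12.0, gamma=0.9, rng=rng))
```

With these placeholder bounds the target always lies between the one built from the smaller estimate and the one built from the midpoint of the two estimates, which is one simple way to trade overestimation against underestimation.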
Related papers
50 in total
  • [41] Deep attributed graph clustering with self-separation regularization and parameter-free cluster estimation
    Ji, Junzhong
    Liang, Ye
    Lei, Minglong
    NEURAL NETWORKS, 2021, 142 : 522 - 533
  • [42] The Primacy Bias in Deep Reinforcement Learning
    Nikishin, Evgenii
    Schwarzer, Max
    D'Oro, Pierluca
    Bacon, Pierre-Luc
    Courville, Aaron
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [43] Parameter-free Small Variance Asymptotics for Dictionary Learning
    Dang, Hong-Phuong
    Elvira, Clement
    2019 27TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2019,
  • [44] Parameter-Free Online Learning via Model Selection
    Foster, Dylan J.
    Kale, Satyen
    Mohri, Mehryar
    Sridharan, Karthik
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [45] Reinforcement Learning Control with Deep Deterministic Policy Gradient Algorithm for Multivariable pH Process
    Panjapornpon, Chanin
    Chinchalongporn, Patcharapol
    Bardeeniz, Santi
    Makkayatorn, Ratthanita
    Wongpunnawat, Witchaya
    PROCESSES, 2022, 10 (12)
  • [46] Parameter-Free Extreme Learning Machine for Imbalanced Classification
    Li, Li
    Zhao, Kaiyi
    Sun, Ruizhi
    Gan, Jiangzhang
    Yuan, Gang
    Liu, Tong
    NEURAL PROCESSING LETTERS, 2020, 52 (03) : 1927 - 1944
  • [47] Deep Deterministic Policy Gradient to Regulate Feedback Control Systems Using Reinforcement Learning
    Arshad, Jehangir
    Khan, Ayesha
    Aftab, Mariam
    Hussain, Mujtaba
    Rehman, Ateeq Ur
    Ahmad, Shafiq
    Al-Shayea, Adel M.
    Shafiq, Muhammad
    CMC-COMPUTERS MATERIALS & CONTINUA, 2022, 71 (01) : 1153 - 1169
  • [48] Independent Deep Deterministic Policy Gradient Reinforcement Learning in Cooperative Multiagent Pursuit Games
    Zhou, Shiyang
    Ren, Weiya
    Ren, Xiaoguang
    Wang, Yanzhen
    Yi, Xiaodong
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV, 2021, 12894 : 625 - 637
  • [49] Reinforcement learning of motor skills with policy gradients
    Peters, Jan
    Schaal, Stefan
    NEURAL NETWORKS, 2008, 21 (04) : 682 - 697
  • [50] Improvement of PMSM Control Using Reinforcement Learning Deep Deterministic Policy Gradient Agent
    Nicola, Marcel
    Nicola, Claudiu-Ionel
    2021 21ST INTERNATIONAL SYMPOSIUM ON POWER ELECTRONICS (EE 2021), 2021,