Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 0
Authors
Baturay Saglam
Furkan Burak Mutlu
Dogan Can Cicek
Suleyman Serdar Kozat
Institutions
[1] Bilkent University, Department of Electrical and Electronics Engineering
Keywords
Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control
DOI: not available
Abstract
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested.
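The abstract describes a Q-value update rule that combines two approximate critics with weights sampled from a narrow estimation-bias interval. The exact interval and sampling scheme are not given here, so the following is only a minimal sketch of the general idea: form the target as a random convex combination of the smaller and larger critic estimates, with the combination weight drawn from a hypothetical shrunk interval (`beta_low`, `beta_high` are assumed names and values, not the paper's).

```python
import random

def target_q_value(q1, q2, beta_low=0.0, beta_high=0.5):
    """Illustrative target for twin-critic learning: a random convex
    combination of the two critic estimates.

    q1, q2: scalar Q-value estimates from the two critics.
    beta_low, beta_high: assumed bounds of the shrunk sampling interval
    (placeholders; the paper derives its own interval statistically).
    """
    q_min, q_max = min(q1, q2), max(q1, q2)
    # Sample the weight of the larger estimate from the interval.
    beta = random.uniform(beta_low, beta_high)
    # beta = 0 recovers the pessimistic min (Clipped Double Q-learning);
    # larger beta shifts the target toward the optimistic max.
    return beta * q_max + (1.0 - beta) * q_min
```

Because the sampled target always lies between the two critic estimates, it interpolates between the underestimating `min` target and the overestimating `max` target, which matches the abstract's stated goal of shrinking both biases.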
Related Papers
50 in total
  • [1] Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients
    Saglam, Baturay
    Mutlu, Furkan Burak
    Cicek, Dogan Can
    Kozat, Suleyman Serdar
    NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [2] Parameter-Free On-line Deep Learning
    Wawrzynski, Pawel
    AUTOMATION 2017: INNOVATIONS IN AUTOMATION, ROBOTICS AND MEASUREMENT TECHNIQUES, 2017, 550 : 543 - 553
  • [3] Learning to Pour using Deep Deterministic Policy Gradients
    Do, Chau
    Gordillo, Camilo
    Burgard, Wolfram
    2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018, : 3074 - 3079
  • [4] Parameter-free Locally Accelerated Conditional Gradients
    Carderera, Alejandro
    Diakonikolas, Jelena
    Lin, Cheuk Yin
    Pokutta, Sebastian
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] Automatic VMAT Machine Parameter Optimization Using Deep Deterministic Policy Gradients
    Hrinivich, W.
    Li, H.
    Lee, J.
    MEDICAL PHYSICS, 2022, 49 (06) : E117 - E117
  • [6] Deep Deterministic Policy Gradients with Transfer Learning Framework in StarCraft Micromanagement
    Xie, Dong
    Zhong, Xiangnan
    2019 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2019, : 410 - 415
  • [7] Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm
    Wu, Junta
    Li, Huiyun
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
  • [8] Selective Catalytic Reduction System Ammonia Injection Control Based on Deep Deterministic Policy Reinforcement Learning
    Xie, Peiran
    Zhang, Guangming
    Niu, Yuguang
    Sun, Tianshu
    FRONTIERS IN ENERGY RESEARCH, 2021, 9
  • [9] A parameter-free learning automaton scheme
    Ren, Xudie
    Li, Shenghong
    Ge, Hao
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [10] Generative Adversarial Inverse Reinforcement Learning With Deep Deterministic Policy Gradient
    Zhan, Ming
    Fan, Jingjing
    Guo, Jianying
    IEEE ACCESS, 2023, 11 : 87732 - 87746