Parameter-Free Reduction of the Estimation Bias in Deep Reinforcement Learning for Deterministic Policy Gradients

Cited by: 0
Authors
Baturay Saglam
Furkan Burak Mutlu
Dogan Can Cicek
Suleyman Serdar Kozat
Affiliations
[1] Bilkent University, Department of Electrical and Electronics Engineering
Source
Keywords
Deep reinforcement learning; Actor-critic methods; Estimation bias; Deterministic policy gradients; Continuous control
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Approximation of the value functions in value-based deep reinforcement learning induces overestimation bias, resulting in suboptimal policies. We show that when the reinforcement signals received by the agents have a high variance, deep actor-critic approaches that overcome the overestimation bias lead to a substantial underestimation bias. We first address the detrimental issues in the existing approaches that aim to overcome such underestimation error. Then, through extensive statistical analysis, we introduce a novel, parameter-free Deep Q-learning variant to reduce this underestimation bias in deterministic policy gradients. By sampling the weights of a linear combination of two approximate critics from a highly shrunk estimation bias interval, our Q-value update rule is not affected by the variance of the rewards received by the agents throughout learning. We test the performance of the introduced improvement on a set of MuJoCo and Box2D continuous control tasks and demonstrate that it outperforms the existing approaches and improves the baseline actor-critic algorithm in most of the environments tested.
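The abstract describes a Q-value update built from a linear combination of two approximate critics, with the combination weight sampled from an interval. The following is only a minimal sketch of that general idea, not the paper's actual rule: the function name `blended_target`, the uniform sampling, and the interval bounds `beta_low` and `beta_high` are all hypothetical placeholders, since the abstract does not state how the interval is derived.

```python
import random


def blended_target(reward, q1_next, q2_next, done, gamma=0.99,
                   beta_low=0.5, beta_high=1.0, rng=random):
    """Bootstrapped target from a sampled convex combination of two critics.

    Assumption: the weight beta is drawn uniformly from a placeholder
    interval [beta_low, beta_high]; the paper's interval is not given
    in the abstract.
    """
    beta = rng.uniform(beta_low, beta_high)   # sampled combination weight
    q_min = min(q1_next, q2_next)             # pessimistic estimate
    q_max = max(q1_next, q2_next)             # optimistic estimate
    q_blend = beta * q_min + (1.0 - beta) * q_max
    # No bootstrapping on terminal transitions.
    return reward + gamma * (1.0 - float(done)) * q_blend


if __name__ == "__main__":
    rng = random.Random(0)
    print(blended_target(1.0, 2.0, 3.0, False, rng=rng))
```

With `beta = 1` the sketch reduces to taking the minimum of the two critics, and smaller weights move the target toward the larger estimate, which is one way such a combination could offset underestimation.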
Related papers
50 in total
  • [11] Softmax Deep Double Deterministic Policy Gradients
    Pan, Ling
    Cai, Qingpeng
    Hang, Longbo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [12] Parameter-free rendering of single-molecule localization microscopy data for parameter-free resolution estimation
    Descloux, Adrien C.
    Grussmayer, Kristin S.
    Radenovic, Aleksandra
    COMMUNICATIONS BIOLOGY, 2021, 4 (01)
  • [13] Parameter-free rendering of single-molecule localization microscopy data for parameter-free resolution estimation
    Adrien C. Descloux
    Kristin S. Grußmayer
    Aleksandra Radenovic
    Communications Biology, 4
  • [14] Parameter-Free Loss for Class-Imbalanced Deep Learning in Image Classification
    Du, Jie
    Zhou, Yanhong
    Liu, Peng
    Vong, Chi-Man
    Wang, Tianfu
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (06) : 3234 - 3240
  • [15] Parameter estimation in quantum sensing based on deep reinforcement learning
    Tailong Xiao
    Jianping Fan
    Guihua Zeng
    npj Quantum Information, 8
  • [16] Parameter estimation in quantum sensing based on deep reinforcement learning
    Xiao, Tailong
    Fan, Jianping
    Zeng, Guihua
    NPJ QUANTUM INFORMATION, 2022, 8 (01)
  • [17] Value activation for bias alleviation: Generalized-activated deep double deterministic policy gradients
    Lyu, Jiafei
    Yang, Yu
    Yan, Jiangpeng
    Li, Xiu
    NEUROCOMPUTING, 2023, 518 : 70 - 81
  • [18] Expected Policy Gradients for Reinforcement Learning
    Ciosek, Kamil
    Whiteson, Shimon
    JOURNAL OF MACHINE LEARNING RESEARCH, 2020, 21
  • [19] Expected policy gradients for reinforcement learning
    Ciosek, Kamil
    Whiteson, Shimon
    Journal of Machine Learning Research, 2020, 21
  • [20] An Efficient Parameter-Free Learning Automaton Scheme
    Di, Chong
    Liang, Qilian
    Li, Fangqi
    Li, Shenghong
    Luo, Fucai
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (11) : 4849 - 4863