Underestimation estimators to Q-learning

Cited by: 4
Authors
Abliz, Patigul [1 ]
Ying, Shi [1 ]
Affiliations
[1] Wuhan Univ, Comp Sci Sch, Wuhan, Peoples R China
Keywords
Q-learning; Double Q-learning; Overestimation Bias reduction; Maximum estimator; Cross-validation; Underestimation reduction;
DOI
10.1016/j.ins.2022.05.090
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Q-learning (QL) is a popular method for control problems. It approximates the maximum expected action value with the maximum estimated action value, and therefore suffers from positive overestimation bias. Various algorithms have been proposed to reduce overestimation bias, but some of these methods introduce underestimation bias instead, and it is not well understood which kinds of estimators cause underestimation. In this paper, instead of analyzing one specific method, we focus on underestimation estimators, in particular those built from K estimates of the action values. We generalize these estimators into an Underestimation Estimator Set (UES) and prove theoretically that every estimator in this set suffers from underestimation bias. We further study the bias properties of these estimators and conclude that their biases differ from one another and depend on the specific conditions each estimator meets. Our set thus provides a variety of estimators for QL in different settings. Finally, to better illustrate the properties of these estimators, we test the performance of several estimators from our set. Empirical results show that the Median estimator (Me) underestimates less than double Q-learning (DQL) and does not overestimate as QL does, while the Min estimator (M1E) underestimates more than DQL. Moreover, Me and M1E perform as well as or better than other estimators on several benchmark environments. (c) 2022 Elsevier Inc. All rights reserved.
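To make the bias directions described in the abstract concrete, here is a minimal Monte Carlo sketch (not the paper's code) contrasting three ways of estimating the maximum expected action value from noisy tabular estimates. The min-of-K rule is one maxmin-style reading of an M1E-like estimator; the action count, number of tables, noise level, and all names are illustrative assumptions.

```python
import numpy as np

# A sketch of estimator bias, assuming K independent noisy tables of M action
# values. All parameters and names here are illustrative, not from the paper.
rng = np.random.default_rng(0)
M, K, sigma, trials = 10, 8, 1.0, 50_000   # actions, tables, noise std, runs
mu = np.linspace(0.0, 1.0, M)              # true action values; true max is 1.0

# trials independent experiments, each with K noisy estimate tables
q = mu + rng.normal(0.0, sigma, size=(trials, K, M))

# QL's maximum estimator: max over actions of a single table.
ql = q[:, 0].max(axis=1)
# DQL's double estimator: select the action with one table, evaluate with another.
a_star = q[:, 0].argmax(axis=1)
dql = q[np.arange(trials), 1, a_star]
# Min-of-K estimator (a maxmin-style reading of M1E): max of per-action minima.
m1e = q.min(axis=1).max(axis=1)

true_max = mu.max()
print(f"QL  bias: {ql.mean() - true_max:+.3f}")   # positive: overestimation
print(f"DQL bias: {dql.mean() - true_max:+.3f}")  # negative: underestimation
print(f"M1E bias: {m1e.mean() - true_max:+.3f}")  # more negative than DQL here
```

On this synthetic setup the sign pattern matches the ordering reported in the abstract: the maximum estimator overestimates, the double estimator underestimates, and the min-of-K estimator underestimates more.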
Pages: 173-185
Page count: 13