Underestimation estimators to Q-learning

Cited by: 4
Authors
Abliz, Patigul [1 ]
Ying, Shi [1 ]
Affiliations
[1] Wuhan Univ, Comp Sci Sch, Wuhan, Peoples R China
Keywords
Q-learning; Double Q-learning; Overestimation Bias reduction; Maximum estimator; Cross-validation; Underestimation reduction;
DOI
10.1016/j.ins.2022.05.090
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Q-learning (QL) is a popular method for control problems. It approximates the maximum expected action value with the maximum estimated action value, and therefore suffers from positive overestimation bias. Various algorithms have been proposed to reduce overestimation bias, but some of these methods introduce underestimation bias instead, and it is not well understood which kinds of estimators cause underestimation. In this paper, instead of analyzing one specific method, we focus on underestimation estimators, in particular those built from K estimates of the action values. We generalize these estimators into an Underestimation Estimator Set (UES) and prove theoretically that every estimator in this set suffers from underestimation bias. We further study the bias properties of these estimators and conclude that their biases differ from one another and depend on the specific conditions each estimator meets. Our set thus provides a variety of estimators for QL in different settings. Finally, to better illustrate the properties of these estimators, we test the performance of several estimators from our set. Empirical results show that the Median estimator (Me) underestimates less than double Q-learning (DQL) and does not overestimate as QL does, while the Min estimator (M1E) underestimates more than DQL. Moreover, Me and M1E perform as well as or better than other estimators on several benchmark environments. (c) 2022 Elsevier Inc. All rights reserved.
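To make the bias directions described in the abstract concrete, here is a minimal Monte Carlo sketch (not the paper's code) contrasting three ways of estimating the maximum expected action value from noisy tabular estimates. The min-of-K rule is one maxmin-style reading of an M1E-like estimator; the action count, number of tables, noise level, and all names are illustrative assumptions.

```python
import numpy as np

# A sketch of estimator bias, assuming K independent noisy tables of M action
# values. All parameters and names here are illustrative, not from the paper.
rng = np.random.default_rng(0)
M, K, sigma, trials = 10, 8, 1.0, 50_000   # actions, tables, noise std, runs
mu = np.linspace(0.0, 1.0, M)              # true action values; true max is 1.0

# trials independent experiments, each with K noisy estimate tables
q = mu + rng.normal(0.0, sigma, size=(trials, K, M))

# QL's maximum estimator: max over actions of a single table.
ql = q[:, 0].max(axis=1)
# DQL's double estimator: select the action with one table, evaluate with another.
a_star = q[:, 0].argmax(axis=1)
dql = q[np.arange(trials), 1, a_star]
# Min-of-K estimator (a maxmin-style reading of M1E): max of per-action minima.
m1e = q.min(axis=1).max(axis=1)

true_max = mu.max()
print(f"QL  bias: {ql.mean() - true_max:+.3f}")   # positive: overestimation
print(f"DQL bias: {dql.mean() - true_max:+.3f}")  # negative: underestimation
print(f"M1E bias: {m1e.mean() - true_max:+.3f}")  # more negative than DQL here
```

On this synthetic setup the sign pattern matches the ordering reported in the abstract: the maximum estimator overestimates, the double estimator underestimates, and the min-of-K estimator underestimates more.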
Pages: 173-185
Page count: 13