Two mode Q-learning

Cited by: 0
Authors
Park, KH [1]
Kim, JH [1]
Affiliation
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn & Comp Sci, Taejon 305701, South Korea
Keywords
DOI
Not available
CLC classification
TP31 [Computer Software];
Subject classification codes
081202; 0835
Abstract
In this paper, a new two-mode Q-learning scheme that uses both the success and failure experiences of an agent is proposed for fast convergence; it extends Q-learning, a well-known reinforcement learning algorithm. In standard Q-learning, when the agent enters the "fail" state, it receives a punishment from the environment, and this punishment decreases the Q value of the action that produced the failure experience. The proposed two-mode Q-learning, by contrast, selects actions in the state-action space based on both a normal Q value and a failure Q value. To determine the failure Q value from the agent's previous failure experiences, it employs a failure Q value module. To demonstrate the effectiveness of the proposed method, it is compared with conventional Q-learning on a goalie system that performs goalkeeping in robot soccer.
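The abstract specifies the idea (a separate failure Q value module plus action selection over both normal and failure Q values) but not the exact update or combination rules. The following is a minimal tabular Python sketch under stated assumptions: the penalty weight BETA, the epsilon-greedy selection, and the failure-table update rule are illustrative choices, not taken from the paper.

import random
from collections import defaultdict

ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # exploration rate for epsilon-greedy selection
BETA = 1.0     # assumed weight of the failure Q value (not from the paper)

Q = defaultdict(float)       # normal Q values, keyed by (state, action)
Q_fail = defaultdict(float)  # failure Q values built from past failure experiences

def select_action(state, actions):
    # Two-mode selection: actions that previously led to the "fail"
    # state are penalized by their failure Q value.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)] - BETA * Q_fail[(state, a)])

def update(state, action, reward, next_state, actions, failed):
    # Normal mode: the standard Q-learning temporal-difference update.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    # Failure mode: record the failure experience in the separate table
    # instead of relying only on the punishment to lower the normal Q value.
    if failed:
        Q_fail[(state, action)] += ALPHA * (1.0 - Q_fail[(state, action)])

In a goalie-style task such as the paper's robot-soccer demonstration, failed would be true when the ball enters the goal, so the failure table steers the keeper away from actions that previously allowed goals.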
Pages: 2449-2454
Page count: 6
Related papers
50 records in total
  • [1] Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants
    Schilperoort, Jits
    Mak, Ivar
    Drugan, Madalina M.
    Wiering, Marco A.
2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018: 1151-1158
  • [2] Q-LEARNING
    WATKINS, CJCH
    DAYAN, P
MACHINE LEARNING, 1992, 8(3-4): 279-292
  • [3] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637: 475-483
  • [4] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26(09): 2184-2193
  • [5] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
    Ohnishi, Shota
    Uchibe, Eiji
    Yamaguchi, Yotaro
    Nakanishi, Kosuke
    Yasui, Yuji
    Ishii, Shin
FRONTIERS IN NEUROROBOTICS, 2019, 13
  • [6] Learning rates for Q-Learning
    Even-Dar, E
    Mansour, Y
COMPUTATIONAL LEARNING THEORY, PROCEEDINGS, 2001, 2111: 589-604
  • [7] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5: 1-25
  • [8] Contextual Q-Learning
    Pinto, Tiago
    Vale, Zita
ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325: 2927-2928
  • [9] CVaR Q-Learning
    Stanko, Silvestr
    Macek, Karel
COMPUTATIONAL INTELLIGENCE: 11th International Joint Conference, IJCCI 2019, Vienna, Austria, September 17-19, 2019, Revised Selected Papers, 2021, 922: 333-358
  • [10] Bayesian Q-learning
    Dearden, R
    Friedman, N
    Russell, S
FIFTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-98) AND TENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE (IAAI-98) - PROCEEDINGS, 1998: 761-768