Two mode Q-learning

Cited by: 0
Authors
Park, KH [1]
Kim, JH [1]
Affiliation
[1] Korea Adv Inst Sci & Technol, Dept Elect Engn & Comp Sci, Taejon 305701, South Korea
Keywords
DOI
Not available
CLC classification
TP31 [Computer Software];
Subject classification codes
081202; 0835
Abstract
In this paper, a new two-mode Q-learning scheme that uses both the success and failure experiences of an agent is proposed for fast convergence; it extends Q-learning, a well-known reinforcement learning algorithm. In standard Q-learning, when the agent enters the "fail" state, it receives a punishment from the environment, and this punishment decreases the Q value of the action that produced the failure experience. The proposed two-mode Q-learning, by contrast, selects actions in the state-action space based on both a normal Q value and a failure Q value. To determine the failure Q value from the agent's previous failure experiences, it employs a failure Q value module. To demonstrate the effectiveness of the proposed method, it is compared with conventional Q-learning on a goalie system that performs goalkeeping in robot soccer.
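The abstract specifies the idea (a separate failure Q value module plus action selection over both normal and failure Q values) but not the exact update or combination rules. The following is a minimal tabular Python sketch under stated assumptions: the penalty weight BETA, the epsilon-greedy selection, and the failure-table update rule are illustrative choices, not taken from the paper.

import random
from collections import defaultdict

ALPHA = 0.1    # learning rate
GAMMA = 0.9    # discount factor
EPSILON = 0.1  # exploration rate for epsilon-greedy selection
BETA = 1.0     # assumed weight of the failure Q value (not from the paper)

Q = defaultdict(float)       # normal Q values, keyed by (state, action)
Q_fail = defaultdict(float)  # failure Q values built from past failure experiences

def select_action(state, actions):
    # Two-mode selection: actions that previously led to the "fail"
    # state are penalized by their failure Q value.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)] - BETA * Q_fail[(state, a)])

def update(state, action, reward, next_state, actions, failed):
    # Normal mode: the standard Q-learning temporal-difference update.
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    # Failure mode: record the failure experience in the separate table
    # instead of relying only on the punishment to lower the normal Q value.
    if failed:
        Q_fail[(state, action)] += ALPHA * (1.0 - Q_fail[(state, action)])

In a goalie-style task such as the paper's robot-soccer demonstration, failed would be true when the ball enters the goal, so the failure table steers the keeper away from actions that previously allowed goals.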
Pages: 2449-2454
Page count: 6
Related papers
50 records in total
  • [1] Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants
    Schilperoort, Jits
    Mak, Ivar
    Drugan, Madalina M.
    Wiering, Marco A.
2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018: 1151-1158
  • [2] Q-LEARNING
    WATKINS, CJCH
    DAYAN, P
MACHINE LEARNING, 1992, 8(3-4): 279-292
  • [3] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637: 475-483
  • [4] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26(09): 2184-2193
  • [5] Constrained Deep Q-Learning Gradually Approaching Ordinary Q-Learning
    Ohnishi, Shota
    Uchibe, Eiji
    Yamaguchi, Yotaro
    Nakanishi, Kosuke
    Yasui, Yuji
    Ishii, Shin
FRONTIERS IN NEUROROBOTICS, 2019, 13
  • [6] Learning rates for Q-Learning
    Even-Dar, E
    Mansour, Y
COMPUTATIONAL LEARNING THEORY, PROCEEDINGS, 2001, 2111: 589-604
  • [7] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5: 1-25
  • [8] Contextual Q-Learning
    Pinto, Tiago
    Vale, Zita
ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325: 2927-2928
  • [9] CVaR Q-Learning
    Stanko, Silvestr
    Macek, Karel
COMPUTATIONAL INTELLIGENCE: 11th International Joint Conference, IJCCI 2019, Vienna, Austria, September 17-19, 2019, Revised Selected Papers, 2021, 922: 333-358
  • [10] Bayesian Q-learning
    Dearden, R
    Friedman, N
    Russell, S
FIFTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-98) AND TENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE (IAAI-98) - PROCEEDINGS, 1998: 761-768