Backward Q-learning: The combination of Sarsa algorithm and Q-learning

Cited by: 104
Authors:
Wang, Yin-Hao [1 ]
Li, Tzuu-Hseng S. [1 ]
Lin, Chih-Jui [1 ]
Affiliations:
[1] Natl Cheng Kung Univ, Dept Elect Engn, AiRobots Lab, Tainan 70101, Taiwan
Keywords:
Backward Q-learning; Q-learning; Reinforcement learning; Sarsa algorithm; NEURAL-NETWORK; FUZZY CONTROLLER; REINFORCEMENT; DESIGN; SYSTEM; LOGIC;
DOI:
10.1016/j.engappai.2013.06.016
Chinese Library Classification (CLC) number:
TP [automation technology, computer technology];
Discipline classification code:
0812;
Abstract:
Reinforcement learning (RL) has been applied to many fields and applications, but the dilemma between exploration and exploitation in the action selection policy remains. Two of the best-known reinforcement learning algorithms are Q-learning and Sarsa, and they possess different characteristics: generally speaking, Sarsa converges faster, while Q-learning achieves better final performance; however, Sarsa is easily trapped in local minima, and Q-learning needs a longer time to learn. Most of the literature has investigated the action selection policy. Instead of studying an action selection strategy, this paper focuses on how to combine Q-learning with the Sarsa algorithm, and presents a new method, called backward Q-learning, which can be implemented in both the Sarsa algorithm and Q-learning. The backward Q-learning algorithm directly tunes the Q-values, and the Q-values in turn indirectly affect the action selection policy. The proposed RL algorithms can therefore enhance learning speed and improve final performance. Finally, three experiments, cliff walking, mountain car, and a cart-pole balancing control system, are used to verify the feasibility and effectiveness of the proposed scheme. All the simulations show that the backward Q-learning based RL algorithms outperform the well-known Q-learning and Sarsa algorithms. (C) 2013 Elsevier Ltd. All rights reserved.
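The abstract does not spell out the update rules, but a plausible reading (an illustrative sketch, not the authors' exact formulation) is: act and learn with the ordinary on-policy Sarsa update during the episode while recording each transition, then, when the episode ends, replay the stored transitions in reverse order with the off-policy Q-learning (max) update so that terminal rewards propagate backward through the Q-table in a single sweep. The function name backward_q_episode and the env.reset()/env.step() interface below are assumptions made for illustration.

    import random

    def epsilon_greedy(Q, state, actions, eps):
        # Explore with probability eps, otherwise act greedily on Q.
        if random.random() < eps:
            return random.choice(actions)
        return max(actions, key=lambda a: Q[(state, a)])

    def backward_q_episode(env, Q, actions, alpha=0.1, gamma=0.99, eps=0.1):
        # Hypothetical environment interface: env.reset() -> state,
        # env.step(a) -> (next_state, reward, done).
        trajectory = []
        state = env.reset()
        action = epsilon_greedy(Q, state, actions, eps)
        done = False
        while not done:
            next_state, reward, done = env.step(action)
            next_action = epsilon_greedy(Q, next_state, actions, eps)
            # On-policy Sarsa update while the episode runs.
            target = reward + (0.0 if done else gamma * Q[(next_state, next_action)])
            Q[(state, action)] += alpha * (target - Q[(state, action)])
            trajectory.append((state, action, reward, next_state, done))
            state, action = next_state, next_action
        # Episode finished: replay the stored transitions backward with
        # the Q-learning (max) update, directly tuning the Q-values.
        for s, a, r, s2, terminal in reversed(trajectory):
            best_next = 0.0 if terminal else max(Q[(s2, b)] for b in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        return Q

A collections.defaultdict(float) works as the Q-table, with one call per training episode. The backward sweep is consistent with the abstract's claim that directly tuning the Q-values (which then indirectly shape action selection) can combine Sarsa's faster convergence with Q-learning's better final performance.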
Pages: 2184-2193
Page count: 10