A new Q-learning algorithm based on the Metropolis criterion

Cited by: 101
Authors
Guo, MZ [1 ]
Liu, Y
Malec, J
Affiliations
[1] Harbin Inst Technol, Dept Comp Sci & Engn, Harbin 150001, Peoples R China
[2] Lund Univ, Dept Comp Sci, S-22100 Lund, Sweden
Keywords
exploitation; exploration; Metropolis criterion; Q-learning; reinforcement learning
DOI
10.1109/TSMCB.2004.832154
CLC Classification Number
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and help avoid locally optimal policies. In this paper, finding the optimal policy in Q-learning is described as a search for the optimal solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and a modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
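The Metropolis-style action selection the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, interface, and cooling handling are assumptions. The idea is to compare a greedy (exploitation) candidate with a random (exploration) candidate and accept the worse one with probability exp(ΔQ/T), so high temperatures favor exploration and low temperatures reduce to greedy selection.

```python
import math
import random

def metropolis_action(q_row, temperature, rng=random):
    """Pick an action index from one state's Q-values via the Metropolis criterion.

    q_row: Q-values for the current state, one per action.
    temperature: annealing temperature T; large T -> near-uniform exploration,
    T -> 0 -> pure greedy exploitation.
    """
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])  # exploitation candidate
    candidate = rng.randrange(len(q_row))                    # exploration candidate
    if q_row[candidate] >= q_row[greedy]:
        # A candidate at least as good is always accepted.
        return candidate
    if temperature <= 0:
        # Fully annealed: behave greedily.
        return greedy
    # Metropolis rule: accept the worse action with probability exp(dQ / T).
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy
```

In a full SA-Q-learning loop the temperature would typically be lowered over episodes (for example geometrically, T ← λT with λ < 1), so early training explores broadly while later training exploits the learned Q-values.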
Pages: 2140-2143
Page count: 4
Related Papers
50 records in total
  • [1] Research on Q-learning algorithm based on Metropolis criterion
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2002, 39 (06):
  • [2] A Metropolis Criterion Based Fuzzy Q-Learning Flow Controller for High-Speed Networks
    Liu, Wenwei
    Li, Xin
    Qin, Xiaoning
    Yu, Dan
    [J]. INDUSTRIAL INSTRUMENTATION AND CONTROL SYSTEMS, PTS 1-4, 2013, 241-244 : 2327 - +
  • [3] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [4] An ARM-based Q-learning algorithm
    Hsu, Yuan-Pao
    Hwang, Kao-Shing
    Lin, Hsin-Yi
    [J]. ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS: WITH ASPECTS OF CONTEMPORARY INTELLIGENT COMPUTING TECHNIQUES, 2007, 2 : 11 - +
  • [5] A Kind of New Routing Algorithm with Adaptivity for Mobile IOT Based on Q-Learning
    [J]. Liu, Xiao-Huan (815215568@qq.com), 2018, Chinese Institute of Electronics (46):
  • [6] A Task Scheduling Algorithm Based on Q-Learning for WSNs
    Zhang, Benhong
    Wu, Wensheng
    Bi, Xiang
    Wang, Yiming
    [J]. COMMUNICATIONS AND NETWORKING, CHINACOM 2018, 2019, 262 : 521 - 530
  • [7] Power Control Algorithm Based on Q-Learning in Femtocell
    Li Yun
    Tang Ying
    Liu Hanxiao
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2019, 41 (11) : 2557 - 2564
  • [8] Q-Learning Algorithm Based on Incremental RBF Network
    Hu, Yanming
    Li, Decai
    He, Yuqing
    Han, Jianda
    [J]. Jiqiren/Robot, 2019, 41 (05): : 562 - 573
  • [9] Coherent beam combination based on Q-learning algorithm
    Zhang, Xi
    Li, Pingxue
    Zhu, Yunchen
    Li, Chunyong
    Yao, Chuanfei
    Wang, Luo
    Dong, Xueyan
    Li, Shun
    [J]. OPTICS COMMUNICATIONS, 2021, 490
  • [10] Adaptive PID controller based on Q-learning algorithm
    Shi, Qian
    Lam, Hak-Keung
    Xiao, Bo
    Tsai, Shun-Hung
    [J]. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2018, 3 (04) : 235 - 244