A new Q-learning algorithm based on the Metropolis criterion

Cited by: 101
Authors
Guo, MZ [1 ]
Liu, Y
Malec, J
Affiliations
[1] Harbin Inst Technol, Dept Comp Sci & Engn, Harbin 150001, Peoples R China
[2] Lund Univ, Dept Comp Sci, S-22100 Lund, Sweden
Keywords
exploitation; exploration; Metropolis criterion; Q-learning; reinforcement learning
DOI
10.1109/TSMCB.2004.832154
CLC Classification Number
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to reach locally optimal policies quickly, whereas excessive exploration degrades the performance of the Q-learning algorithm even though it may accelerate the learning process and help avoid locally optimal policies. In this paper, finding the optimal policy in Q-learning is described as a search for the optimal solution in combinatorial optimization. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and a modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that the search does not suffer from performance degradation due to excessive exploration.
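The Metropolis-style action selection the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, interface, and cooling handling are assumptions. The idea is to compare a greedy (exploitation) candidate with a random (exploration) candidate and accept the worse one with probability exp(ΔQ/T), so high temperatures favor exploration and low temperatures reduce to greedy selection.

```python
import math
import random

def metropolis_action(q_row, temperature, rng=random):
    """Pick an action index from one state's Q-values via the Metropolis criterion.

    q_row: Q-values for the current state, one per action.
    temperature: annealing temperature T; large T -> near-uniform exploration,
    T -> 0 -> pure greedy exploitation.
    """
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])  # exploitation candidate
    candidate = rng.randrange(len(q_row))                    # exploration candidate
    if q_row[candidate] >= q_row[greedy]:
        # A candidate at least as good is always accepted.
        return candidate
    if temperature <= 0:
        # Fully annealed: behave greedily.
        return greedy
    # Metropolis rule: accept the worse action with probability exp(dQ / T).
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy
```

In a full SA-Q-learning loop the temperature would typically be lowered over episodes (for example geometrically, T ← λT with λ < 1), so early training explores broadly while later training exploits the learned Q-values.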
Pages: 2140-2143
Page count: 4
Related Papers
50 records in total
  • [1] Research on Q-learning algorithm based on Metropolis criterion
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2002, 39 (06):
  • [2] A Metropolis Criterion Based Fuzzy Q-Learning Flow Controller for High-Speed Networks
    Liu, Wenwei
    Li, Xin
    Qin, Xiaoning
    Yu, Dan
    [J]. INDUSTRIAL INSTRUMENTATION AND CONTROL SYSTEMS, PTS 1-4, 2013, 241-244 : 2327 - +
  • [3] Backward Q-learning: The combination of Sarsa algorithm and Q-learning
    Wang, Yin-Hao
    Li, Tzuu-Hseng S.
    Lin, Chih-Jui
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (09) : 2184 - 2193
  • [4] An ARM-based Q-learning algorithm
    Hsu, Yuan-Pao
    Hwang, Kao-Shing
    Lin, Hsin-Yi
    [J]. ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS: WITH ASPECTS OF CONTEMPORARY INTELLIGENT COMPUTING TECHNIQUES, 2007, 2 : 11 - +
  • [5] A Kind of New Routing Algorithm with Adaptivity for Mobile IOT Based on Q-Learning
    [J]. Liu, Xiao-Huan (815215568@qq.com), 2018, Chinese Institute of Electronics (46):
  • [6] A Task Scheduling Algorithm Based on Q-Learning for WSNs
    Zhang, Benhong
    Wu, Wensheng
    Bi, Xiang
    Wang, Yiming
    [J]. COMMUNICATIONS AND NETWORKING, CHINACOM 2018, 2019, 262 : 521 - 530
  • [7] Power Control Algorithm Based on Q-Learning in Femtocell
    Li Yun
    Tang Ying
    Liu Hanxiao
    [J]. JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2019, 41 (11) : 2557 - 2564
  • [8] Q-Learning Algorithm Based on Incremental RBF Network
    Hu, Yanming
    Li, Decai
    He, Yuqing
    Han, Jianda
    [J]. Jiqiren/Robot, 2019, 41 (05): : 562 - 573
  • [9] Coherent beam combination based on Q-learning algorithm
    Zhang, Xi
    Li, Pingxue
    Zhu, Yunchen
    Li, Chunyong
    Yao, Chuanfei
    Wang, Luo
    Dong, Xueyan
    Li, Shun
    [J]. OPTICS COMMUNICATIONS, 2021, 490
  • [10] Adaptive PID controller based on Q-learning algorithm
    Shi, Qian
    Lam, Hak-Keung
    Xiao, Bo
    Tsai, Shun-Hung
    [J]. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2018, 3 (04) : 235 - 244