Convergence of Multiagent Q-learning: Multi Action Replay Process Approach

Cited by: 6

Authors:

Kim, Han-Eol [1]

Ahn, Hyo-Sung [1]

Affiliations:

[1] Gwangju Inst Sci & Technol, Grad Sch Mechatron, Distributed Control & Autonomous Syst Lab, Kwangju, South Korea

Source:

2010 IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT CONTROL | 2010

DOI:

10.1109/ISIC.2010.5612911

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we first propose a new type of Markov model that extends Watkins' action replay process [1]. The new model, called the multi-action replay process (MARP), is designed for multiagent coordination on the basis of reward values, state transition probabilities, and an equilibrium strategy that takes the agents' joint actions into account. Using this model, a multiagent Q-learning algorithm is then constructed as a cooperative reinforcement learning algorithm for completely connected agents. Finally, we prove that the multiagent Q-learning values converge to the optimal values. Simulation results are reported to illustrate the validity of the proposed multiagent Q-learning algorithm.

Pages: 789-794

Page count: 6
