Convergence of Multiagent Q-learning: Multi Action Replay Process Approach

Cited by: 6
Authors
Kim, Han-Eol [1]
Ahn, Hyo-Sung [1]
Affiliations
[1] Gwangju Inst Sci & Technol, Grad Sch Mechatron, Distributed Control & Autonomous Syst Lab, Kwangju, South Korea
DOI
10.1109/ISIC.2010.5612911
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we first propose a new type of Markov model that extends Watkins' action replay process [1]. The new model, called the multi-action replay process (MARP), is designed for multiagent coordination on the basis of reward values, state transition probabilities, and an equilibrium strategy that takes the agents' joint actions into account. Using this model, a multiagent Q-learning algorithm is then constructed as a cooperative reinforcement learning algorithm for completely connected agents. Finally, we prove that the multiagent Q-learning values converge to the optimal values. Simulation results are reported to illustrate the validity of the proposed multiagent Q-learning algorithm.
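The abstract does not state the update rule, so the following is only a minimal sketch of the general idea of cooperative Q-learning over joint actions, not the authors' MARP-based algorithm. It assumes a single shared Q-table indexed by state and joint action (a stand-in for completely connected agents that all observe the joint action), a shared reward, and the standard tabular Q-learning update. The names `joint_q_learning` and `toy_step`, the hyperparameters, and the one-state coordination game are all hypothetical and chosen for illustration.

```python
import itertools
import random


def joint_q_learning(n_states, actions_per_agent, step, episodes=500,
                     horizon=20, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over joint actions with a shared reward (assumption)."""
    rng = random.Random(seed)
    # A joint action is one tuple holding every agent's individual action.
    joint_actions = list(
        itertools.product(*[range(n) for n in actions_per_agent]))
    q = {(s, a): 0.0 for s in range(n_states) for a in joint_actions}
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            # Epsilon-greedy choice over the joint action space.
            if rng.random() < epsilon:
                a = rng.choice(joint_actions)
            else:
                a = max(joint_actions, key=lambda ja: q[(s, ja)])
            s_next, r = step(s, a)
            # Standard Q-learning target, maximising over joint actions.
            best_next = max(q[(s_next, ja)] for ja in joint_actions)
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s_next
    return q, joint_actions


def toy_step(state, joint_action):
    """Hypothetical one-state game: reward 1 only if both agents pick action 1."""
    reward = 1.0 if joint_action == (1, 1) else 0.0
    return 0, reward


q, joint_actions = joint_q_learning(1, [2, 2], toy_step)
best = max(joint_actions, key=lambda ja: q[(0, ja)])
print(best, round(q[(0, best)], 2))
```

In this toy game the fixed point of the update for the coordinated joint action is 1 / (1 - gamma) = 10, so the learned value can be checked against that number. The joint action space grows exponentially with the number of agents, which is one reason a shared table is only practical for small examples like this one.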
Pages: 789-794
Number of pages: 6