Convergence of Multiagent Q-learning: Multi Action Replay Process Approach

Cited by: 6
Authors
Kim, Han-Eol [1]
Ahn, Hyo-Sung [1]
Affiliations
[1] Gwangju Inst Sci & Technol, Grad Sch Mechatron, Distributed Control & Autonomous Syst Lab, Kwangju, South Korea
DOI
10.1109/ISIC.2010.5612911
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we first propose a new type of Markov model extended from Watkins' action replay process [1]. The new model, called the multi-action replay process (MARP), is designed for multiagent coordination on the basis of reward values, state transition probabilities, and an equilibrium strategy that takes the agents' joint actions into account. Using this model, we then construct a multiagent Q-learning algorithm as a cooperative reinforcement learning algorithm for completely connected agents. Finally, we prove that the multiagent Q-learning values converge to the optimal values. Simulation results are reported to illustrate the validity of the proposed multiagent Q-learning algorithm.
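As a rough illustration of what cooperative Q-learning over joint actions can look like, the sketch below runs generic tabular Q-learning on a single Q-table shared by all agents and indexed by (state, joint action). It is not the paper's MARP construction or its proof setting. The function names, the toy environment `toy_step`, and all hyperparameters are assumptions made for this example only.

```python
import itertools
import random


def joint_action_q_learning(n_states, n_actions, n_agents, step,
                            episodes=2000, horizon=20,
                            alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning over joint actions with a shared team reward.

    `step(state, joint_action)` is a hypothetical environment function
    returning (next_state, reward). All agents are assumed to see the
    same state, reward, and Q-table.
    """
    rng = random.Random(seed)
    joint_actions = list(itertools.product(range(n_actions), repeat=n_agents))
    q = {(s, a): 0.0 for s in range(n_states) for a in joint_actions}

    for _ in range(episodes):
        state = rng.randrange(n_states)
        for _ in range(horizon):
            # Epsilon-greedy selection over the joint-action space.
            if rng.random() < epsilon:
                action = rng.choice(joint_actions)
            else:
                action = max(joint_actions, key=lambda a: q[(state, a)])
            next_state, reward = step(state, action)
            # Standard Q-learning target, maximizing over joint actions.
            best_next = max(q[(next_state, a)] for a in joint_actions)
            q[(state, action)] += alpha * (reward + gamma * best_next
                                           - q[(state, action)])
            state = next_state
    return q, joint_actions


def toy_step(state, joint_action):
    """Assumed toy task: the team earns 1 only if every agent picks the
    action equal to the current state; the state then flips (0 <-> 1)."""
    reward = 1.0 if all(a == state for a in joint_action) else 0.0
    return 1 - state, reward


if __name__ == "__main__":
    q, joint_actions = joint_action_q_learning(2, 2, 2, toy_step)
    for s in range(2):
        greedy = max(joint_actions, key=lambda a: q[(s, a)])
        print("state", s, "greedy joint action", greedy)
```

In this toy task the coordinated joint action is the only one that earns a reward, so the greedy joint action in each state should match that state's index once learning has settled. Because the table grows as the number of actions raised to the number of agents, this direct formulation is only practical for small teams.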
Pages: 789-794
Page count: 6
Related papers
50 in total
  • [31] Convergence analysis of cooperative Q-Learning using discrete-time Lyapunov approach
    School of Electrical Engineering and Informatics, Institut Teknologi Bandung, Jl. Ganesha 10, Bandung, Indonesia
    Unknown
    Unknown
    [J]. ICIC Express Lett., 12 (3153-3161):
  • [32] Q-Learning with probability based action policy
    Ugurlu, Ekin Su
    Biricik, Goksel
    [J]. 2006 IEEE 14TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS, VOLS 1 AND 2, 2006, : 210 - +
  • [33] A Q-learning approach to attribute reduction
    Liu, Yuxin
    Gong, Zhice
    Liu, Keyu
    Xu, Suping
    Ju, Hengrong
    Yang, Xibei
    [J]. APPLIED INTELLIGENCE, 2023, 53 (04) : 3750 - 3765
  • [34] Recurrent Deep Multiagent Q-Learning for Autonomous Brokers in Smart Grid
    Yang, Yaodong
    Hao, Jianye
    Sun, Mingyang
    Wang, Zan
    Fan, Changjie
    Strbac, Goran
    [J]. PROCEEDINGS OF THE TWENTY-SEVENTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2018, : 569 - 575
  • [35] A Hybrid Multiagent Framework With Q-Learning for Power Grid Systems Restoration
    Ye, Dayong
    Zhang, Minjie
    Sutanto, Danny
    [J]. IEEE TRANSACTIONS ON POWER SYSTEMS, 2011, 26 (04) : 2434 - 2441
  • [36] Role-based context-specific multiagent Q-learning
    Jiang, Da-Wei
    Wang, Shi-Yuan
    Dong, Yi-Sheng
    [J]. Zidonghua Xuebao/Acta Automatica Sinica, 2007, 33 (06): : 583 - 587
  • [37] Self-organisation in an Agent Network via Multiagent Q-Learning
    Ye, Dayong
    Zhang, Minjie
    Bai, Quan
    Ito, Takayuki
    [J]. KNOWLEDGE MANAGEMENT AND ACQUISITION FOR SMART SYSTEMS AND SERVICES, 2010, 6232 : 14 - +
  • [38] Constrained Q-Learning for Batch Process Optimization
    Pan, Elton
    Petsagkourakis, Panagiotis
    Mowbray, Max
    Zhang, Dongda
    del Rio-Chanona, Antonio
    [J]. IFAC PAPERSONLINE, 2021, 54 (03): : 492 - 497
  • [39] Q-learning Approach in the Context of Virtual Learning Environment
    Liviu, Ionita
    Irina, Tudor
    [J]. PROCEEDINGS OF THE 3RD INTERNATIONAL CONFERENCE ON VIRTUAL LEARNING, 2008, : 209 - 214
  • [40] Q-learning in Multi-Agent Cooperation
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Lin, Tzung-Feng
    [J]. 2008 IEEE WORKSHOP ON ADVANCED ROBOTICS AND ITS SOCIAL IMPACTS, 2008, : 239 - 244