Regularized Softmax Deep Multi-Agent Q-Learning

Cited by: 0
|
Authors
Pan, Ling [1 ]
Rashid, Tabish [2 ]
Peng, Bei [2 ,3 ]
Huang, Longbo [1 ]
Whiteson, Shimon [2 ]
Affiliations
[1] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing, Peoples R China
[2] Univ Oxford, Oxford, England
[3] Univ Liverpool, Liverpool, Merseyside, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); European Research Council (ERC);
Keywords
DOI
None available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from more severe overestimation in practice than previously acknowledged, which is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline and demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
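The abstract names two ingredients: a softmax operator used in place of the max when bootstrapping, and a regularizer that penalizes joint action-values deviating from a baseline. The indented Python sketch below is only a rough illustration of these two ideas, not the paper's actual formulation or its efficient multi-agent approximation; the function names, the one-sided penalty, and the coefficients beta and reg_coef are hypothetical choices made here for illustration.

    import numpy as np

    def softmax_value(q_values, beta=5.0):
        # Softmax (Boltzmann) backup: a weighted average of action-values that
        # interpolates between the mean (beta -> 0) and the max (beta -> inf),
        # and is less prone to overestimation than the max operator.
        z = beta * (q_values - np.max(q_values))   # shift for numerical stability
        weights = np.exp(z) / np.sum(np.exp(z))
        return float(np.dot(weights, q_values))

    def regularized_td_target(reward, next_q_values, baseline,
                              gamma=0.99, beta=5.0, reg_coef=0.1):
        # One-step TD target that bootstraps with the softmax operator and then
        # penalizes targets rising far above a baseline estimate of the return
        # (the one-sided penalty and reg_coef are illustrative assumptions).
        bootstrap = softmax_value(np.asarray(next_q_values, dtype=float), beta)
        target = reward + gamma * bootstrap
        penalty = reg_coef * max(target - baseline, 0.0)
        return target - penalty

    # Example: an overestimated joint action-value is pulled toward the baseline.
    print(regularized_td_target(reward=1.0,
                                next_q_values=[0.2, 0.5, 4.0],
                                baseline=2.0))

In this toy example the inflated bootstrap value of roughly 4.0 yields a raw target near 4.96, which the penalty pulls back toward the baseline of 2.0; how RES actually constructs the baseline and applies the penalty is detailed in the paper itself.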
Pages: 13
Related papers
50 records in total
  • [31] Extending Q-Learning to general adaptive multi-agent systems
    Tesauro, G
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 871 - 878
  • [32] CONTINUOUS ACTION GENERATION OF Q-LEARNING IN MULTI-AGENT COOPERATION
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Jiang, Wei-Cheng
    Lin, Tzung-Feng
    [J]. ASIAN JOURNAL OF CONTROL, 2013, 15 (04) : 1011 - 1020
  • [33] Minimax fuzzy Q-learning in cooperative multi-agent systems
    Kilic, A
    Arslan, A
    [J]. ADVANCES IN INFORMATION SYSTEMS, 2002, 2457 : 264 - 272
  • [34] Continuous strategy replicator dynamics for multi-agent Q-learning
    Galstyan, Aram
    [J]. Autonomous Agents and Multi-Agent Systems, 2013, 26 : 37 - 53
  • [35] A theoretical analysis of cooperative behavior in multi-agent Q-learning
    Waltman, Ludo
    Kaymak, Uzay
    [J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 84 - +
  • [36] Multi-Agent Q-Learning for Power Allocation in Interference Channel
    Wongphatcharatham, Tanutsorn
    Phakphisut, Watid
    Wijitpornchai, Thongchai
    Areeprayoonkij, Poonlarp
    Jaruvitayakovit, Tanun
    Hannanta-Anan, Pimkhuan
    [J]. 2022 37TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC 2022), 2022, : 876 - 879
  • [37] Multi-User MmWave Beam Tracking via Multi-Agent Deep Q-Learning
    MENG Fan
    HUANG Yongming
    LU Zhaohua
    XIAO Huahua
    [J]. ZTE Communications, 2023, 21 (02) : 53 - 60
  • [38] Multi-Agent Coordination Method Based on Fuzzy Q-Learning
    Peng, Jun
    Liu, Miao
    Wu, Min
    Zhang, Xiaoyong
    Lin, Kuo-Chi
    [J]. 2008 7TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-23, 2008, : 5411 - +
  • [39] A distributed Q-learning algorithm for multi-agent team coordination
    Huang, J
    Yang, B
    Liu, DY
    [J]. Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005, : 108 - 113
  • [40] Cooperative Multi-Agent Q-Learning Using Distributed MPC
    Esfahani, Hossein Nejatbakhsh
    Velni, Javad Mohammadpour
    [J]. IEEE CONTROL SYSTEMS LETTERS, 2024, 8 : 2193 - 2198