Regularized Softmax Deep Multi-Agent Q-Learning

Cited by: 0
Authors
Pan, Ling [1 ]
Rashid, Tabish [2 ]
Peng, Bei [2 ,3 ]
Huang, Longbo [1 ]
Whiteson, Shimon [2 ]
Affiliations
[1] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing, Peoples R China
[2] Univ Oxford, Oxford, England
[3] Univ Liverpool, Liverpool, Merseyside, England
Funding
UK Engineering and Physical Sciences Research Council; European Research Council;
Keywords
DOI: none
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from a more severe overestimation in practice than previously acknowledged, and is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values that deviate from a baseline, and demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
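The softmax operator mentioned in the abstract can be illustrated with a minimal sketch. The snippet below shows a generic Boltzmann-weighted value estimate over a discrete action set; the function name, the inverse-temperature parameter `beta`, and the exact form are illustrative assumptions, not the paper's multi-agent approximation:

```python
import numpy as np

def softmax_operator(q_values, beta=5.0):
    """Boltzmann-weighted average of action-values.

    Interpolates between the mean of the Q-values (beta -> 0) and
    their max (beta -> inf), which is how a softmax operator can
    soften the hard max in Q-learning targets and thereby reduce
    overestimation bias.
    """
    q = np.asarray(q_values, dtype=float)
    z = beta * (q - q.max())   # shift by the max for numerical stability
    w = np.exp(z)
    w /= w.sum()               # Boltzmann weights over actions
    return float(w @ q)

q = [1.0, 2.0, 3.0]
print(softmax_operator(q, beta=0.0))   # -> 2.0 (equal weights: the mean)
print(softmax_operator(q, beta=50.0))  # -> 3.0 (sharply peaked: the max)
```

Because the result always lies between the mean and the max of the Q-values, using it in place of the hard max yields a less optimistic bootstrap target, at the cost of a temperature hyperparameter to tune.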
Pages: 13
Related Papers
50 results
  • [1] Modular Production Control with Multi-Agent Deep Q-Learning
    Gankin, Dennis
    Mayer, Sebastian
    Zinn, Jonas
    Vogel-Heuser, Birgit
    Endisch, Christian
    [J]. 2021 26TH IEEE INTERNATIONAL CONFERENCE ON EMERGING TECHNOLOGIES AND FACTORY AUTOMATION (ETFA), 2021,
  • [2] Q-learning in Multi-Agent Cooperation
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Lin, Tzung-Feng
    [J]. 2008 IEEE WORKSHOP ON ADVANCED ROBOTICS AND ITS SOCIAL IMPACTS, 2008, : 239 - 244
  • [3] Multi-Agent Advisor Q-Learning
    Subramanian, Sriram Ganapathi
    Taylor, Matthew E.
    Larson, Kate
    Crowley, Mark
    [J]. Journal of Artificial Intelligence Research, 2022, 74 : 1 - 74
  • [4] Multi-Agent Advisor Q-Learning
    Subramanian, Sriram Ganapathi
    Taylor, Matthew E.
    Larson, Kate
    Crowley, Mark
    [J]. PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 6884 - 6889
  • [6] Deep Q-Learning for Decentralized Multi-Agent Inspection of a Tumbling Target
    Aurand, Joshua
    Cutlip, Steven
    Lei, Henry
    Lang, Kendra
    Phillips, Sean
    [J]. JOURNAL OF SPACECRAFT AND ROCKETS, 2024, 61 (02) : 341 - 354
  • [7] Untangling Braids with Multi-Agent Q-Learning
    Khan, Abdullah
    Vernitski, Alexei
    Lisitsa, Alexei
    [J]. 2021 23RD INTERNATIONAL SYMPOSIUM ON SYMBOLIC AND NUMERIC ALGORITHMS FOR SCIENTIFIC COMPUTING (SYNASC 2021), 2021, : 135 - 139
  • [8] Continuous Q-Learning for Multi-Agent Cooperation
    Hwang, Kao-Shing
    Jiang, Wei-Cheng
    Lin, Yu-Hong
    Lai, Li-Hsin
    [J]. CYBERNETICS AND SYSTEMS, 2012, 43 (03) : 227 - 256
  • [9] Q-learning with FCMAC in multi-agent cooperation
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Lin, Tzung-Feng
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2006, PT 1, 2006, 3971 : 599 - 606
  • [10] A novel multi-agent Q-learning algorithm in cooperative multi-agent system
    Ou, HT
    Zhang, WD
    Zhang, WY
    Xu, XM
    [J]. PROCEEDINGS OF THE 3RD WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-5, 2000, : 272 - 276