Regularized Softmax Deep Multi-Agent Q-Learning

Cited by: 0
Authors
Pan, Ling [1]
Rashid, Tabish [2]
Peng, Bei [2, 3]
Huang, Longbo [1]
Whiteson, Shimon [2]
Affiliations
[1] Tsinghua University, Institute for Interdisciplinary Information Sciences, Beijing, People's Republic of China
[2] University of Oxford, Oxford, England
[3] University of Liverpool, Liverpool, Merseyside, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); European Research Council (ERC)
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from more severe overestimation in practice than previously acknowledged, and that this overestimation is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values that deviate from a baseline, and we demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
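The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the form of the penalty, the choice of baseline, or the multi-agent approximation of the softmax operator. The quadratic penalty, the names `softmax_value` and `regularized_loss`, and the parameters `beta`, `lam`, and `baseline` are all assumptions made for illustration, and the softmax is taken over an explicit list of values with no attempt at an efficient approximation.

```python
import math


def softmax_value(q_values, beta=5.0):
    """Boltzmann-softmax-weighted average of a list of Q-values.

    Assumed form for illustration: weights proportional to exp(beta * q).
    Because the result is a weighted average, it never exceeds max(q_values).
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


def regularized_loss(q_pred, target, baseline, lam=0.1):
    """Squared TD error plus a hypothetical quadratic penalty.

    The penalty grows as q_pred moves away from `baseline`; both the
    quadratic form and `lam` are illustrative assumptions.
    """
    td_error = q_pred - target
    deviation = q_pred - baseline
    return td_error ** 2 + lam * deviation ** 2


# Hypothetical next-state joint action-values.
q_next = [1.0, 2.0, 3.0]
hard_max = max(q_next)                    # standard Q-learning bootstrap value
soft = softmax_value(q_next, beta=1.0)    # softer, never larger than hard_max
```

In this sketch, a larger `beta` moves `softmax_value` closer to the hard maximum, while a smaller `beta` moves it toward the plain mean, which is one way to see how a softmax operator can temper an overestimated bootstrap target.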
Pages: 13