Regularized Softmax Deep Multi-Agent Q-Learning

Cited by: 0
Authors
Pan, Ling [1 ]
Rashid, Tabish [2 ]
Peng, Bei [2 ,3 ]
Huang, Longbo [1 ]
Whiteson, Shimon [2 ]
Affiliations
[1] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing, Peoples R China
[2] Univ Oxford, Oxford, England
[3] Univ Liverpool, Liverpool, Merseyside, England
Funding
UK Engineering and Physical Sciences Research Council; European Research Council
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from a more severe overestimation in practice than previously acknowledged, and is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values that deviate from a baseline and demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
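For illustration only, the sketch below shows the two mechanisms the abstract describes: a softmax (Boltzmann) operator used in place of the max when forming the bootstrapped target, and a regularizer that penalizes joint action-values deviating upward from a baseline. The function names, the exact penalty form, and the hyperparameters (beta, lam) are assumptions for this sketch, not the authors' implementation.

```python
# Minimal illustrative sketch of a softmax-operator target plus a
# baseline-deviation penalty; names and hyperparameters are assumptions.
import numpy as np

def softmax_operator(q_values: np.ndarray, beta: float = 5.0) -> float:
    """Smooth alternative to max(q_values); reduces overestimation bias in the target."""
    shifted = beta * (q_values - q_values.max())  # subtract max for numerical stability
    weights = np.exp(shifted)
    weights /= weights.sum()
    return float(np.dot(weights, q_values))

def res_target(reward: float, next_joint_q: np.ndarray, gamma: float = 0.99) -> float:
    """Bootstrapped target that replaces the max operator with the softmax operator."""
    return reward + gamma * softmax_operator(next_joint_q)

def res_loss(q_joint: float, target: float, q_baseline: float, lam: float = 0.1) -> float:
    """TD error plus a penalty on joint action-values exceeding a baseline (illustrative form)."""
    td = (q_joint - target) ** 2
    penalty = lam * max(q_joint - q_baseline, 0.0) ** 2
    return td + penalty

# Toy usage: one transition with four candidate joint actions at the next state.
next_q = np.array([1.0, 2.5, 2.4, 0.5])
y = res_target(reward=1.0, next_joint_q=next_q)
print(res_loss(q_joint=3.0, target=y, q_baseline=2.0))
```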
Pages: 13
Related Papers
50 items in total
  • [21] Deep Q-Learning and Preference Based Multi-Agent System for Sustainable Agricultural Market
    Perez-Pons, Maria E.
    Alonso, Ricardo S.
    Garcia, Oscar
    Marreiros, Goreti
    Corchado, Juan Manuel
    [J]. SENSORS, 2021, 21 (16)
  • [22] Heterogeneous Team Deep Q-Learning in Low-Dimensional Multi-Agent Environments
    Kurek, Mateusz
    Jaskowski, Wojciech
    [J]. 2016 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND GAMES (CIG), 2016,
  • [23] Cooperative behavior acquisition for multi-agent systems by Q-learning
    Xie, M. C.
    Tachibana, A.
    [J]. 2007 IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTATIONAL INTELLIGENCE, VOLS 1 AND 2, 2007, : 424 - +
  • [24] The acquisition of sociality by using Q-learning in a multi-agent environment
    Nagayuki, Yasuo
    [J]. PROCEEDINGS OF THE SIXTEENTH INTERNATIONAL SYMPOSIUM ON ARTIFICIAL LIFE AND ROBOTICS (AROB 16TH '11), 2011, : 820 - 823
  • [25] Multi-Agent Reward-Iteration Fuzzy Q-Learning
    Leng, Lixiong
    Li, Jingchen
    Zhu, Jinhui
    Hwang, Kao-Shing
    Shi, Haobin
    [J]. INTERNATIONAL JOURNAL OF FUZZY SYSTEMS, 2021, 23 (06) : 1669 - 1679
  • [26] Multi-agent Q-learning Based Navigation in an Unknown Environment
    Nath, Amar
    Niyogi, Rajdeep
    Singh, Tajinder
    Kumar, Virendra
    [J]. ADVANCED INFORMATION NETWORKING AND APPLICATIONS, AINA-2022, VOL 1, 2022, 449 : 330 - 340
  • [27] Q-Learning with Side Information in Multi-Agent Finite Games
    Sylvestre, Mathieu
    Pavel, Lacra
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019, : 5032 - 5037
  • [28] Continuous strategy replicator dynamics for multi-agent Q-learning
    Galstyan, Aram
    [J]. AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2013, 26 (01) : 37 - 53
  • [29] Real-Valued Q-learning in Multi-agent Cooperation
    Hwang, Kao-Shing
    Lo, Chia-Yue
    Chen, Kim-Joan
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009, : 395 - 400
  • [30] Multi-Agent Q-Learning with Joint State Value Approximation
    Chen Gang
    Cao Weihua
    Chen Xin
    Wu Min
    [J]. 2011 30TH CHINESE CONTROL CONFERENCE (CCC), 2011, : 4878 - 4882