Regularized Softmax Deep Multi-Agent Q-Learning

Cited by: 0
|
Authors
Pan, Ling [1 ]
Rashid, Tabish [2 ]
Peng, Bei [2 ,3 ]
Huang, Longbo [1 ]
Whiteson, Shimon [2 ]
Affiliations
[1] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing, Peoples R China
[2] Univ Oxford, Oxford, England
[3] Univ Liverpool, Liverpool, Merseyside, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); European Research Council (ERC);
Keywords
DOI
None available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multi-agent reinforcement learning (MARL), suffers from more severe overestimation in practice than previously acknowledged, which is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline and demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multi-agent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.
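The abstract names two ingredients: a softmax operator used in place of the max when bootstrapping, and a regularizer that penalizes joint action-values deviating from a baseline. The indented Python sketch below is only a rough illustration of these two ideas, not the paper's actual formulation or its efficient multi-agent approximation; the function names, the one-sided penalty, and the coefficients beta and reg_coef are hypothetical choices made here for illustration.

    import numpy as np

    def softmax_value(q_values, beta=5.0):
        # Softmax (Boltzmann) backup: a weighted average of action-values that
        # interpolates between the mean (beta -> 0) and the max (beta -> inf),
        # and is less prone to overestimation than the max operator.
        z = beta * (q_values - np.max(q_values))   # shift for numerical stability
        weights = np.exp(z) / np.sum(np.exp(z))
        return float(np.dot(weights, q_values))

    def regularized_td_target(reward, next_q_values, baseline,
                              gamma=0.99, beta=5.0, reg_coef=0.1):
        # One-step TD target that bootstraps with the softmax operator and then
        # penalizes targets rising far above a baseline estimate of the return
        # (the one-sided penalty and reg_coef are illustrative assumptions).
        bootstrap = softmax_value(np.asarray(next_q_values, dtype=float), beta)
        target = reward + gamma * bootstrap
        penalty = reg_coef * max(target - baseline, 0.0)
        return target - penalty

    # Example: an overestimated joint action-value is pulled toward the baseline.
    print(regularized_td_target(reward=1.0,
                                next_q_values=[0.2, 0.5, 4.0],
                                baseline=2.0))

In this toy example the inflated bootstrap value of roughly 4.0 yields a raw target near 4.96, which the penalty pulls back toward the baseline of 2.0; how RES actually constructs the baseline and applies the penalty is detailed in the paper itself.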
Pages: 13
Related papers
50 records in total
  • [31] Extending Q-Learning to general adaptive multi-agent systems
    Tesauro, G
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 16, 2004, 16 : 871 - 878
  • [32] CONTINUOUS ACTION GENERATION OF Q-LEARNING IN MULTI-AGENT COOPERATION
    Hwang, Kao-Shing
    Chen, Yu-Jen
    Jiang, Wei-Cheng
    Lin, Tzung-Feng
    [J]. ASIAN JOURNAL OF CONTROL, 2013, 15 (04) : 1011 - 1020
  • [33] Minimax fuzzy Q-learning in cooperative multi-agent systems
    Kilic, A
    Arslan, A
    [J]. ADVANCES IN INFORMATION SYSTEMS, 2002, 2457 : 264 - 272
  • [34] Continuous strategy replicator dynamics for multi-agent Q-learning
    Galstyan, Aram
    [J]. Autonomous Agents and Multi-Agent Systems, 2013, 26 : 37 - 53
  • [35] A theoretical analysis of cooperative behavior in multi-agent Q-learning
    Waltman, Ludo
    Kaymak, Uzay
    [J]. 2007 IEEE INTERNATIONAL SYMPOSIUM ON APPROXIMATE DYNAMIC PROGRAMMING AND REINFORCEMENT LEARNING, 2007, : 84 - +
  • [36] Multi-Agent Q-Learning for Power Allocation in Interference Channel
    Wongphatcharatham, Tanutsorn
    Phakphisut, Watid
    Wijitpornchai, Thongchai
    Areeprayoonkij, Poonlarp
    Jaruvitayakovit, Tanun
    Hannanta-Anan, Pimkhuan
    [J]. 2022 37TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC 2022), 2022, : 876 - 879
  • [37] Multi-User MmWave Beam Tracking via Multi-Agent Deep Q-Learning
    MENG Fan
    HUANG Yongming
    LU Zhaohua
    XIAO Huahua
    [J]. ZTE Communications, 2023, 21 (02) : 53 - 60
  • [38] Multi-Agent Coordination Method Based on Fuzzy Q-Learning
    Peng, Jun
    Liu, Miao
    Wu, Min
    Zhang, Xiaoyong
    Lin, Kuo-Chi
    [J]. 2008 7TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-23, 2008, : 5411 - +
  • [39] A distributed Q-learning algorithm for multi-agent team coordination
    Huang, J
    Yang, B
    Liu, DY
    [J]. Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005, : 108 - 113
  • [40] Cooperative Multi-Agent Q-Learning Using Distributed MPC
    Esfahani, Hossein Nejatbakhsh
    Velni, Javad Mohammadpour
    [J]. IEEE CONTROL SYSTEMS LETTERS, 2024, 8 : 2193 - 2198