A Distributional Perspective on Multiagent Cooperation With Deep Reinforcement Learning

Cited by: 7

Authors:

Huang, Liwei [1,2]

Fu, Mingsheng [1]

Rao, Ananya [3]

Irissappane, Athirai A. [3]

Zhang, Jie [4]

Xu, Chengzhong [2]

Affiliations:

[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 610054, Peoples R China

[2] Univ Macau, State Key Lab IoTSC, Taipa 999078, Macao, Peoples R China

[3] Univ Washington, Sch Engn & Technol, Tacoma, WA 98402 USA

[4] Nanyang Technol Univ, Sch Comp Sci & Engn, Singapore, Singapore

Source:

IEEE Transactions on Neural Networks and Learning Systems | 2024, Vol. 35, No. 3

Funding:

China Postdoctoral Science Foundation

Keywords:

Deep reinforcement learning (RL); distributional RL; multiagent system; neural network; LEVEL;

DOI:

10.1109/TNNLS.2022.3202097

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In many value decomposition-based multiagent reinforcement learning (MARL) algorithms, the overall performance of the multiagent system is represented by a scalar global Q value and optimized by minimizing the temporal difference (TD) error with respect to that global Q value. However, the global Q value cannot accurately model the distributed dynamics of the multiagent system, since it is only a simplified representation of the different individual Q values of the agents. To explicitly consider the correlations between different cooperative agents, in this article, we propose a distributional framework and construct a practical model called distributional multiagent cooperation (DMAC) from a novel distributional perspective. Specifically, in DMAC, we view the individual Q value for the executed action of a random agent as a value distribution, whose expectation can further represent the overall performance. Then, we employ distributional RL to minimize the difference between the estimated distribution and its target for the optimization. The advantage of DMAC is that the distributed dynamics of agents can be explicitly modeled, and this results in better performance. To verify the effectiveness of DMAC, we conduct extensive experiments under nine different scenarios of the StarCraft Multiagent Challenge (SMAC). Experimental results show that DMAC can significantly outperform the baselines with respect to the average median test win rate.
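The distributional view described in the abstract can be illustrated with a toy sketch. This is not the DMAC model itself: it assumes the random agent is drawn uniformly, so the expectation of the value distribution is the plain average of the agents' individual Q values, and it uses the 1-Wasserstein distance between two equal-size empirical distributions as a stand-in for "the difference between the estimated distribution and its target". The function names and the numbers are hypothetical.

```python
from statistics import mean


def expected_value(agent_q_values):
    """Expectation of the value distribution, assuming (our assumption)
    that the agent is drawn uniformly at random."""
    return mean(agent_q_values)


def wasserstein_1(estimated, target):
    """1-Wasserstein distance between two equal-size empirical
    distributions: the mean absolute gap between sorted samples.
    Used here only as an illustrative distributional discrepancy."""
    if len(estimated) != len(target):
        raise ValueError("both distributions need the same number of samples")
    return mean(abs(a - b) for a, b in zip(sorted(estimated), sorted(target)))


# Hypothetical individual Q values of three agents for their executed actions.
estimated_q = [1.0, 3.0, 2.0]
# Hypothetical target values for the same three agents.
target_q = [2.0, 2.0, 5.0]

overall = expected_value(estimated_q)       # scalar summary of team performance
loss = wasserstein_1(estimated_q, target_q)  # distribution-level discrepancy
```

Unlike a single TD error on one global scalar, a distribution-level discrepancy of this kind changes whenever the spread of the individual values changes, even if their average stays the same.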

Pages: 4246-4259

Page count: 14
