Mutual-Information Regularized Multi-Agent Policy Iteration

Cited: 0
Authors
Wang, Jiangxing [1 ]
Ye, Deheng [2 ]
Lu, Zongqing [1 ,3 ]
Affiliations
[1] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[2] Tencent Inc, Shenzhen, Peoples R China
[3] BAAI, Beijing, Peoples R China
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104; 0812; 0835; 1405;
Abstract
Despite the success of cooperative multi-agent reinforcement learning algorithms, most of them focus on a single team composition, which prevents them from being used in more realistic scenarios where the team composition may change dynamically. While some studies attempt to solve this problem via multi-task learning over a fixed set of team compositions, there is still a risk of overfitting to the training set, which may lead to catastrophic performance when dramatically different team compositions are encountered during execution. To address this problem, we propose to use mutual information (MI) as an augmented reward to prevent individual policies from relying too heavily on team-related information and to encourage agents to learn policies that are robust across different team compositions. Optimizing this MI-augmented objective in an off-policy manner can be intractable due to the dynamic marginal distribution. To alleviate this problem, we first propose a multi-agent policy iteration algorithm with a fixed marginal distribution and prove its convergence and optimality. Then, as a practical implementation, we employ the Blahut-Arimoto algorithm and an imaginary team composition distribution for optimization with an approximate marginal distribution. Empirically, our method demonstrates strong zero-shot generalization to dynamic team compositions in complex cooperative tasks.
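The record does not state the objective explicitly, so the following is only a minimal sketch, under assumed notation, of what an MI-regularized objective of the kind described above typically looks like: the team reward is augmented with a penalty on the mutual information between each agent's action and the team composition given its local observation, and the marginal policy needed for that penalty is held fixed or re-estimated in an alternating (Blahut-Arimoto-style) fashion. Here p(c), beta, and \bar{\pi}^{i} are assumed notation for the team-composition distribution, the regularization weight, and the composition-averaged marginal policy; none of these symbols are taken from the paper itself.
\[
  J(\pi) \;=\; \mathbb{E}_{c \sim p(c),\, \tau \sim \pi}\!\left[\sum_{t} \gamma^{t}
      \Big( r_{t} \;-\; \beta \sum_{i} I\big(a_{t}^{i};\, c \mid o_{t}^{i}\big) \Big)\right],
\]
\[
  I\big(a^{i};\, c \mid o^{i}\big)
  \;=\; \mathbb{E}_{o^{i}}\,\mathbb{E}_{c \mid o^{i}}\!\left[\mathrm{KL}\!\big(\pi^{i}(\cdot \mid o^{i}, c)\,\big\|\,\bar{\pi}^{i}(\cdot \mid o^{i})\big)\right],
  \qquad
  \bar{\pi}^{i}(a \mid o^{i}) \;=\; \mathbb{E}_{c \mid o^{i}}\!\big[\pi^{i}(a \mid o^{i}, c)\big].
\]
Alternating between improving \pi^{i} against a fixed \bar{\pi}^{i} and recomputing \bar{\pi}^{i} as the composition average is one way to read the fixed-marginal policy iteration and Blahut-Arimoto step mentioned in the abstract; it is not claimed to be the authors' exact procedure.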
Pages: 19
Related Papers
50 records in total
  • [1] Mutual Information and Multi-Agent Systems
    Moskowitz, Ira S.
    Rogers, Pi
    Russell, Stephen
    [J]. ENTROPY, 2022, 24 (12)
  • [2] Multi-Agent Least-Squares Policy Iteration
    Palmer, Victor
    [J]. ECAI 2006, PROCEEDINGS, 2006, 141 : 733 - 734
  • [3] Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies
    Phan, Thomy
    Schmid, Kyrill
    Belzner, Lenz
    Gabor, Thomas
    Feld, Sebastian
    Linnhoff-Popien, Claudia
    [J]. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 2162 - 2164
  • [4] Cluster consensus in multi-agent networks with mutual information exchange
    Erkan, F.
    Akar, M.
    [J]. AI & SOCIETY, 2018, 33 (2) : 197 - 205
  • [5] Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
    Zhang, Y.
    Qu, G.
    Xu, P.
    Lin, Y.
    Chen, Z.
    Wierman, A.
    [J]. Performance Evaluation Review, 2023, 51 (01): : 83 - 84
  • [6] Distributed entropy-regularized multi-agent reinforcement learning with policy consensus
    Hu, Yifan
    Fu, Junjie
    Wen, Guanghui
    Lv, Yuezu
    Ren, Wei
    [J]. AUTOMATICA, 2024, 164
  • [7] Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
    Zhang, Yizhou
    Qu, Guannan
    Xu, Pan
    Lin, Yiheng
    Chen, Zaiwei
    Wierman, Adam
    [J]. PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2023, 7 (01)
  • [8] Approximated multi-agent fitted Q iteration
    Lesage-Landry, Antoine
    Callaway, Duncan S.
    [J]. SYSTEMS & CONTROL LETTERS, 2023, 177
  • [9] Joint Policy Search for Multi-agent Collaboration with Imperfect Information
    Tian, Yuandong
    Gong, Qucheng
    Jiang, Tina
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [10] Continuous sampling in mutual-information registration
    Seppa, Mika
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2008, 17 (05) : 823 - 826