Mutual-Information Regularized Multi-Agent Policy Iteration

Cited: 0
Authors
Wang, Jiangxing [1 ]
Ye, Deheng [2 ]
Lu, Zongqing [1 ,3 ]
Affiliations
[1] Peking Univ, Sch Comp Sci, Beijing, Peoples R China
[2] Tencent Inc, Shenzhen, Peoples R China
[3] BAAI, Beijing, Peoples R China
Keywords
DOI
Not available
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104; 0812; 0835; 1405;
Abstract
Despite the success of cooperative multi-agent reinforcement learning algorithms, most of them focus on a single team composition, which prevents them from being used in more realistic scenarios where the team composition may change dynamically. While some studies attempt to solve this problem via multi-task learning over a fixed set of team compositions, there is still a risk of overfitting to the training set, which may lead to catastrophic performance when dramatically different team compositions are encountered during execution. To address this problem, we propose to use mutual information (MI) as an augmented reward to prevent individual policies from relying too heavily on team-related information and to encourage agents to learn policies that are robust across different team compositions. Optimizing this MI-augmented objective in an off-policy manner can be intractable due to the dynamic marginal distribution. To alleviate this problem, we first propose a multi-agent policy iteration algorithm with a fixed marginal distribution and prove its convergence and optimality. Then, as a practical implementation, we employ the Blahut-Arimoto algorithm and an imaginary team composition distribution for optimization with an approximate marginal distribution. Empirically, our method demonstrates strong zero-shot generalization to dynamic team compositions in complex cooperative tasks.
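The record does not state the objective explicitly, so the following is only a minimal sketch, under assumed notation, of what an MI-regularized objective of the kind described above typically looks like: the team reward is augmented with a penalty on the mutual information between each agent's action and the team composition given its local observation, and the marginal policy needed for that penalty is held fixed or re-estimated in an alternating (Blahut-Arimoto-style) fashion. Here p(c), beta, and \bar{\pi}^{i} are assumed notation for the team-composition distribution, the regularization weight, and the composition-averaged marginal policy; none of these symbols are taken from the paper itself.
\[
  J(\pi) \;=\; \mathbb{E}_{c \sim p(c),\, \tau \sim \pi}\!\left[\sum_{t} \gamma^{t}
      \Big( r_{t} \;-\; \beta \sum_{i} I\big(a_{t}^{i};\, c \mid o_{t}^{i}\big) \Big)\right],
\]
\[
  I\big(a^{i};\, c \mid o^{i}\big)
  \;=\; \mathbb{E}_{o^{i}}\,\mathbb{E}_{c \mid o^{i}}\!\left[\mathrm{KL}\!\big(\pi^{i}(\cdot \mid o^{i}, c)\,\big\|\,\bar{\pi}^{i}(\cdot \mid o^{i})\big)\right],
  \qquad
  \bar{\pi}^{i}(a \mid o^{i}) \;=\; \mathbb{E}_{c \mid o^{i}}\!\big[\pi^{i}(a \mid o^{i}, c)\big].
\]
Alternating between improving \pi^{i} against a fixed \bar{\pi}^{i} and recomputing \bar{\pi}^{i} as the composition average is one way to read the fixed-marginal policy iteration and Blahut-Arimoto step mentioned in the abstract; it is not claimed to be the authors' exact procedure.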
Pages: 19
Related Papers
50 records in total
  • [1] Mutual Information and Multi-Agent Systems
    Moskowitz, Ira S.
    Rogers, Pi
    Russell, Stephen
    [J]. ENTROPY, 2022, 24 (12)
  • [2] Multi-Agent Least-Squares Policy Iteration
    Palmer, Victor
    [J]. ECAI 2006, PROCEEDINGS, 2006, 141 : 733 - 734
  • [3] Distributed Policy Iteration for Scalable Approximation of Cooperative Multi-Agent Policies
    Phan, Thomy
    Schmid, Kyrill
    Belzner, Lenz
    Gabor, Thomas
    Feld, Sebastian
    Linnhoff-Popien, Claudia
    [J]. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 2162 - 2164
  • [4] Cluster consensus in multi-agent networks with mutual information exchange
    Erkan, F.
    Akar, M.
    [J]. AI & SOCIETY, 2018, 33 (2) : 197 - 205
  • [5] Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
    Zhang, Y.
    Qu, G.
    Xu, P.
    Lin, Y.
    Chen, Z.
    Wierman, A.
    [J]. Performance Evaluation Review, 2023, 51 (01): : 83 - 84
  • [6] Distributed entropy-regularized multi-agent reinforcement learning with policy consensus
    Hu, Yifan
    Fu, Junjie
    Wen, Guanghui
    Lv, Yuezu
    Ren, Wei
    [J]. AUTOMATICA, 2024, 164
  • [7] Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
    Zhang, Yizhou
    Qu, Guannan
    Xu, Pan
    Lin, Yiheng
    Chen, Zaiwei
    Wierman, Adam
    [J]. PROCEEDINGS OF THE ACM ON MEASUREMENT AND ANALYSIS OF COMPUTING SYSTEMS, 2023, 7 (01)
  • [8] Approximated multi-agent fitted Q iteration
    Lesage-Landry, Antoine
    Callaway, Duncan S.
    [J]. SYSTEMS & CONTROL LETTERS, 2023, 177
  • [9] Joint Policy Search for Multi-agent Collaboration with Imperfect Information
    Tian, Yuandong
    Gong, Qucheng
    Jiang, Tina
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [10] Continuous sampling in mutual-information registration
    Seppa, Mika
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2008, 17 (05) : 823 - 826