Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

Cited by: 13
Authors
Li, Jiahui [1 ]
Kuang, Kun [1 ]
Wang, Baoxiang [2 ,3 ]
Liu, Furui [4 ]
Chen, Long [1 ,5 ]
Wu, Fei [1 ]
Xiao, Jun [1 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci, DCD Lab, Hangzhou, Peoples R China
[2] Chinese Univ Hong Kong, Shenzhen, Peoples R China
[3] Shenzhen Inst Artificial Intelligence & Robot Soc, Shenzhen, Peoples R China
[4] Huawei Noah's Ark Lab, Shanghai, Peoples R China
[5] Zhejiang Univ, Hangzhou, Peoples R China
Funding
National Natural Science Foundation of China; Natural Science Foundation of Zhejiang Province;
Keywords
Shapley Value; Counterfactual Thinking; Multi-Agent Systems; Reinforcement Learning; Credit Assignment; COORDINATION;
DOI
10.1145/3447548.3467420
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Centralized Training with Decentralized Execution (CTDE) has become a popular paradigm in cooperative Multi-Agent Reinforcement Learning (MARL) and is widely used in real applications. One of the major challenges in training is credit assignment: deducing each agent's contribution from the global reward. Existing credit assignment methods focus on either decomposing the joint value function into individual value functions or measuring the impact of local observations and actions on the global value function. These approaches lack a thorough consideration of the complicated interactions among multiple agents, leading to unsuitable credit assignment and, subsequently, mediocre results on MARL tasks. We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment that accounts for coalitions of agents. Specifically, the Shapley Value and its desirable properties are leveraged in deep MARL to credit any combination of agents, which allows us to estimate an individual credit for each agent. The main technical difficulty lies in the computational complexity of the Shapley Value, which grows factorially with the number of agents. We instead use an approximation based on Monte Carlo sampling, which reduces the computational cost while maintaining effectiveness. We evaluate our method on StarCraft II benchmarks across different scenarios. It significantly outperforms existing cooperative MARL algorithms and achieves state-of-the-art results, with especially large margins on the more difficult tasks.
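The core idea in the abstract, crediting each agent by its average marginal contribution over agent coalitions, and taming the factorial cost with Monte Carlo permutation sampling, can be sketched as follows. This is an illustrative sketch only, not the paper's implementation: the toy coalition value function `value` stands in for the learned global value function, and all names here are assumptions.

```python
import itertools
import math
import random


def exact_shapley(agents, value):
    """Exact Shapley values: average marginal contribution over
    all permutations of the agents (factorial cost, small n only)."""
    n = len(agents)
    phi = {a: 0.0 for a in agents}
    for perm in itertools.permutations(agents):
        coalition = set()
        for a in perm:
            before = value(frozenset(coalition))
            coalition.add(a)
            phi[a] += value(frozenset(coalition)) - before
    return {a: phi[a] / math.factorial(n) for a in agents}


def monte_carlo_shapley(agents, value, num_samples=1000, seed=0):
    """Approximate Shapley values by averaging marginal contributions
    over randomly sampled permutations instead of all n! of them."""
    rng = random.Random(seed)
    phi = {a: 0.0 for a in agents}
    for _ in range(num_samples):
        perm = list(agents)
        rng.shuffle(perm)
        coalition = set()
        for a in perm:
            before = value(frozenset(coalition))
            coalition.add(a)
            phi[a] += value(frozenset(coalition)) - before
    return {a: phi[a] / num_samples for a in agents}
```

Because each permutation's marginal contributions telescope to the grand-coalition value, both estimators satisfy the efficiency property exactly: the per-agent credits always sum to the total team value, which is precisely what makes the Shapley Value attractive for credit assignment.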
Pages: 934-942
Page count: 9
Related Papers
50 records total
  • [1] Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
    Shao, Jianzhun
    Qu, Yun
    Chen, Chen
    Zhang, Hongchang
    Ji, Xiangyang
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [2] Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking
    Wang, Yue
    Wan, Yao
    Zhang, Chenwei
    Bai, Lu
    Cui, Lixin
    Yu, Philip S.
    [J]. 2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019, : 1366 - 1371
  • [3] Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward
    Shao, Kun
    Zhu, Yuanheng
    Tang, Zhentao
    Zhao, Dongbin
    [J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [4] Counterfactual-Based Action Evaluation Algorithm in Multi-Agent Reinforcement Learning
    Yuan, Yuyu
    Zhao, Pengqian
    Guo, Ting
    Jiang, Hongpu
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (07):
  • [5] PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning
    Zhou, Hanhan
    Lan, Tian
    Aggarwal, Vaneet
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [6] Multi-Agent Reinforcement Learning
    Stankovic, Milos
    [J]. 2016 13TH SYMPOSIUM ON NEURAL NETWORKS AND APPLICATIONS (NEUREL), 2016, : 43 - 43
  • [7] Multi-Agent Cognition Difference Reinforcement Learning for Multi-Agent Cooperation
    Wang, Huimu
    Qiu, Tenghai
    Liu, Zhen
    Pu, Zhiqiang
    Yi, Jianqiang
    Yuan, Wanmai
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [8] Multi-Agent Uncertainty Sharing for Cooperative Multi-Agent Reinforcement Learning
    Chen, Hao
    Yang, Guangkai
    Zhang, Junge
    Yin, Qiyue
    Huang, Kaiqi
    [J]. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [9] Hierarchical multi-agent reinforcement learning
    Ghavamzadeh, Mohammad
    Mahadevan, Sridhar
    Makar, Rajbala
    [J]. Autonomous Agents and Multi-Agent Systems, 2006, 13 : 197 - 229
  • [10] Multi-Agent Reinforcement Learning With Distributed Targeted Multi-Agent Communication
    Xu, Chi
    Zhang, Hui
    Zhang, Ya
    [J]. 2023 35TH CHINESE CONTROL AND DECISION CONFERENCE, CCDC, 2023, : 2915 - 2920