Counterfactual-Based Action Evaluation Algorithm in Multi-Agent Reinforcement Learning

Cited by: 4

Authors:

Yuan, Yuyu [1]

Zhao, Pengqian [1]

Guo, Ting [1]

Jiang, Hongpu [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Key Lab Trustworthy Distributed Comp & Serv, Minist Educ, Natl Pilot Software Engn Sch, Beijing 100876, Peoples R China

Source:

APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / Issue 07

Keywords:

multi-agent reinforcement learning; multi-agent system; counterfactual reasoning; intrinsic reward; social dilemmas; actor-critic;

DOI:

10.3390/app12073439

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703;

Abstract:

Multi-agent reinforcement learning (MARL) algorithms have achieved strong results in various scenarios, but many problems remain in solving sequential social dilemmas (SSDs). In SSDs, an agent's actions not only change the instantaneous state of the environment but also affect the latent state, which in turn affects all agents. However, most current reinforcement learning algorithms focus on analyzing the value of the instantaneous environment state and ignore the latent state, which is very important for establishing cooperation. Therefore, we propose a novel counterfactual reasoning-based multi-agent reinforcement learning algorithm to evaluate the continuous contribution of agent actions to the latent state. We compute this contribution using simulation reasoning and an action evaluation network. Counterfactual reasoning then yields a single agent's influence on the environment. Using this continuous contribution as an intrinsic reward enables the agent to consider the collective, thereby promoting cooperation. We conduct experiments in SSD environments, and the results show that the collective reward increases by at least 25%, which demonstrates the excellent performance of our proposed algorithm compared with state-of-the-art algorithms.
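
The abstract does not give the algorithm's equations, so the following is only an illustrative sketch of the general idea of a counterfactual intrinsic reward, not the authors' method. It assumes a hypothetical scoring function `evaluate` (standing in for a learned action evaluation network), a difference-from-baseline form of counterfactual (a common choice in the MARL literature, assumed here), and a hypothetical mixing weight `weight`.

```python
from typing import Callable, Sequence


def counterfactual_contribution(
    evaluate: Callable[[Sequence[int]], float],  # hypothetical stand-in for an evaluation network
    joint_action: Sequence[int],
    agent: int,
    policy_probs: Sequence[float],
) -> float:
    """Score of the taken joint action minus the policy-weighted average
    score obtained when only `agent`'s action is swapped for each alternative."""
    actual = evaluate(joint_action)
    baseline = 0.0
    for alt_action, prob in enumerate(policy_probs):
        counterfactual = list(joint_action)
        counterfactual[agent] = alt_action  # change this agent's action only
        baseline += prob * evaluate(counterfactual)
    return actual - baseline


def shaped_reward(extrinsic: float, contribution: float, weight: float = 0.1) -> float:
    """Add the contribution to the environment reward as an intrinsic term.
    The additive form and `weight` are assumptions made for illustration."""
    return extrinsic + weight * contribution


def toy_evaluate(joint_action: Sequence[int]) -> float:
    """Toy scorer: counts cooperating agents (action 1 = cooperate)."""
    return float(sum(joint_action))


contribution = counterfactual_contribution(
    toy_evaluate, joint_action=[1, 0, 1], agent=0, policy_probs=[0.5, 0.5]
)
reward = shaped_reward(extrinsic=1.0, contribution=contribution)
```

In this toy setting, an agent that cooperates receives a positive contribution because the group score would have been lower, on average, under its alternative actions.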

Pages: 14

50 items in total

[1] Multi-Agent Reinforcement Learning Algorithm Based on Action Prediction
童亮
陆际联
[J]. Journal of Beijing Institute of Technology, 2006, (02) : 133 - 137
[2] Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning
Li, Jiahui
Kuang, Kun
Wang, Baoxiang
Liu, Furui
Chen, Long
Wu, Fei
Xiao, Jun
[J]. KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 934 - 942
[3] Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning
Shao, Jianzhun
Qu, Yun
Chen, Chen
Zhang, Hongchang
Ji, Xiangyang
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
[4] Competitive Multi-Agent Deep Reinforcement Learning with Counterfactual Thinking
Wang, Yue
Wan, Yao
Zhang, Chenwei
Bai, Lu
Cui, Lixin
Yu, Philip S.
[J]. 2019 19TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2019), 2019, : 1366 - 1371
[5] Cooperative Multi-Agent Deep Reinforcement Learning with Counterfactual Reward
Shao, Kun
Zhu, Yuanheng
Tang, Zhentao
Zhao, Dongbin
[J]. 2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
[6] A Multi-agent Reinforcement Learning Algorithm Based on Stackelberg Game
Cheng, Chi
Zhu, Zhangqing
Xin, Bo
Chen, Chunlin
[J]. 2017 6TH DATA DRIVEN CONTROL AND LEARNING SYSTEMS (DDCLS), 2017, : 727 - 732
[7] Traffic Distribution Algorithm Based on Multi-Agent Reinforcement Learning
Cheng C.
Teng J.-J.
Zhao Y.-L.
Song M.
[J]. Beijing Youdian Daxue Xuebao/Journal of Beijing University of Posts and Telecommunications, 2019, 42 (06) : 43 - 48, 57
[8] Multi-agent reinforcement learning algorithm based on neural networks
Tang, Lianggui
Yang, Hu
An, Bo
Cheng, Daijie
[J]. DYNAMICS OF CONTINUOUS DISCRETE AND IMPULSIVE SYSTEMS-SERIES B-APPLICATIONS & ALGORITHMS, 2006, 13E : 1569 - 1574
[9] Multi-agent Reinforcement Learning Algorithm Based on Local Information
Li, Chonglun
He, Zhaoxiong
Wang, Bingzheng
Wang, Zhen
Li, Lingbin
[J]. PROCEEDINGS OF 2022 INTERNATIONAL CONFERENCE ON AUTONOMOUS UNMANNED SYSTEMS, ICAUS 2022, 2023, 1010 : 3080 - 3091
[10] A counterfactual-based learning algorithm for ALC description logic
Esposito, F
Fanizzi, N
Iannone, L
Palmisano, I
Semeraro, G
[J]. AI*IA2005: ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2005, 3673 : 406 - 417
