Counterfactual-Based Action Evaluation Algorithm in Multi-Agent Reinforcement Learning

Cited by: 4
Authors
Yuan, Yuyu [1]
Zhao, Pengqian [1]
Guo, Ting [1]
Jiang, Hongpu [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Comp Sci, Key Lab Trustworthy Distributed Comp & Serv, Minist Educ, Natl Pilot Software Engn Sch, Beijing 100876, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2022 / Vol. 12 / No. 07
Keywords
multi-agent reinforcement learning; multi-agent system; counterfactual reasoning; intrinsic reward; social dilemmas; actor-critic;
DOI
10.3390/app12073439
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Multi-agent reinforcement learning (MARL) algorithms have achieved strong results in a variety of scenarios, but they still struggle with sequential social dilemmas (SSDs). In SSDs, an agent's actions change not only the instantaneous state of the environment but also its latent state, which in turn affects all agents. However, most current reinforcement learning algorithms focus on the value of the instantaneous environment state and ignore the latent state, which is essential for establishing cooperation. We therefore propose a novel counterfactual reasoning-based multi-agent reinforcement learning algorithm that evaluates the continuous contribution of agent actions to the latent state. We compute this contribution using simulation reasoning and an action evaluation network. Counterfactual reasoning then yields a single agent's influence on the environment. Using this continuous contribution as an intrinsic reward leads each agent to consider the collective, thereby promoting cooperation. We conduct experiments in SSD environments, and the results show that the collective reward increases by at least 25%, demonstrating the strong performance of the proposed algorithm compared with state-of-the-art algorithms.
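The counterfactual step described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the formula, so the sketch assumes a common form of counterfactual baseline (the evaluated value of the actual joint action minus the average value obtained when one agent's action is swapped for each alternative). The names `counterfactual_contribution`, `shaped_reward`, `toy_evaluate`, and the `weight` parameter are hypothetical, and `toy_evaluate` is a stand-in for the action evaluation network.

```python
from typing import Callable, Sequence, Tuple

JointAction = Tuple[int, ...]


def counterfactual_contribution(
    evaluate: Callable[[JointAction], float],
    joint_action: Sequence[int],
    agent_index: int,
    action_space: Sequence[int],
) -> float:
    """Value of the actual joint action minus a counterfactual baseline.

    Assumption: the baseline is the uniform average of `evaluate` over every
    action the agent could have taken, with the other agents' actions fixed.
    """
    actual_value = evaluate(tuple(joint_action))
    counterfactual_values = []
    for alternative in action_space:
        altered = list(joint_action)
        altered[agent_index] = alternative  # swap only this agent's action
        counterfactual_values.append(evaluate(tuple(altered)))
    baseline = sum(counterfactual_values) / len(counterfactual_values)
    return actual_value - baseline


def shaped_reward(extrinsic: float, contribution: float, weight: float = 0.2) -> float:
    """Add the contribution to the environment reward as an intrinsic term.

    The weight of 0.2 is an arbitrary illustrative value.
    """
    return extrinsic + weight * contribution


def toy_evaluate(joint_action: JointAction) -> float:
    """Hypothetical evaluator: counts agents that chose the cooperative action 1."""
    return float(sum(joint_action))


# Three agents; agent 0 cooperates (1), agents 1 and 2 do not (0).
joint = (1, 0, 0)
contrib_cooperator = counterfactual_contribution(toy_evaluate, joint, 0, [0, 1])
contrib_defector = counterfactual_contribution(toy_evaluate, joint, 1, [0, 1])
print(contrib_cooperator, contrib_defector)
```

In this toy setting the cooperating agent receives a positive contribution and a non-cooperating agent a negative one, which is the kind of signal an intrinsic reward would need in order to favor collective outcomes.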
Pages: 14