Probabilistic Reward-Based Reinforcement Learning for Multi-Agent Pursuit and Evasion

Cited by: 1
Authors
Zhang, Bo-Kun [1]
Hu, Bin [1]
Chen, Long [1]
Zhang, Ding-Xue [2]
Cheng, Xin-Ming [3]
Guan, Zhi-Hong [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China
[2] Yangtze Univ, Sch Petr Engn, Jingzhou 434023, Peoples R China
[3] Cent South Univ, Sch Automat, Changsha 430083, Peoples R China
Keywords
Reinforcement learning; Multi-agent; Pursuit-evasion; Probabilistic reward; Systems
DOI
10.1109/CCDC52312.2021.9601771
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This article studies reinforcement learning for multi-agent pursuit-evasion games. The main problem with current multi-agent reinforcement learning is the agents' low learning efficiency. An important factor behind this problem is that the Q function lags behind changes in the environment. To address this, a probabilistically distributed reward value replaces the Q function in the multi-agent deep deterministic policy gradient (MADDPG) framework. The distributional Bellman equation is proved to be convergent and can therefore be incorporated into the reinforcement learning algorithm. The algorithm updates the probabilistically distributed reward value so that it adapts better to complex environments. At the same time, eliminating the reward delay improves the efficiency of the strategy and yields better pursuit-evasion results. Simulations and experiments show that the multi-agent algorithm with distributional rewards achieves better results in the configured environment.
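The abstract does not say how the value distribution is represented or updated. As a rough illustration only, the sketch below shows one common way a distributional Bellman backup is realized: a categorical distribution over fixed atoms, shifted by the reward, scaled by the discount, and projected back onto the atoms. The function names, the atom grid, and the categorical representation are all assumptions made for this example and are not taken from the paper.

```python
import math


def project_distribution(probs, atoms, reward, gamma):
    """Project the distribution of reward + gamma * Z back onto fixed atoms.

    Assumption: Z is categorical over evenly spaced `atoms` with
    probabilities `probs`. This is an illustrative choice, not
    necessarily the representation used in the paper.
    """
    v_min, v_max = atoms[0], atoms[-1]
    delta = (v_max - v_min) / (len(atoms) - 1)
    projected = [0.0] * len(atoms)
    for p, z in zip(probs, atoms):
        # Bellman backup of a single atom, clipped to the support.
        tz = min(max(reward + gamma * z, v_min), v_max)
        pos = (tz - v_min) / delta  # fractional index on the atom grid
        lower = int(math.floor(pos))
        upper = min(lower + 1, len(atoms) - 1)
        frac = pos - lower
        # Split the probability mass between the two neighbouring atoms.
        projected[lower] += p * (1.0 - frac)
        projected[upper] += p * frac
    return projected


def expected_value(probs, atoms):
    """Mean of the categorical distribution (the usual scalar Q value)."""
    return sum(p * z for p, z in zip(probs, atoms))


atoms = [0.0, 1.0, 2.0, 3.0, 4.0]
probs = [0.2, 0.2, 0.2, 0.2, 0.2]
target = project_distribution(probs, atoms, reward=1.0, gamma=0.5)
```

In this toy setting no atom is clipped, so the mean of `target` equals `reward + gamma * expected_value(probs, atoms)`, matching the scalar Bellman backup while the full distribution is retained. In a MADDPG-style critic, such a projected distribution would serve as the training target in place of a scalar Q target.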
Pages: 3352-3357
Page count: 6