A Multi-Agent Reinforcement Learning Method Based on Self-Attention Mechanism and Policy Mapping Recombination

Cited by: 0

Authors:

Li J.-C. [1]

Shi H.-B. [1]

Hwang K.-S. [1,2]

Affiliations:

[1] School of Computer Science, Northwestern Polytechnical University, Xi'an

[2] Department of Electrical Engineering, Kaohsiung Sun Yat-sen University, Kaohsiung

Source:

Jisuanji Xuebao/Chinese Journal of Computers | 2022 / Vol. 45 / No. 09

Funding:

National Natural Science Foundation of China

Keywords:

Attention mechanism; Deep reinforcement learning; Multi-agent reinforcement learning; Multi-agent system

DOI:

10.11897/SP.J.1016.2022.01842

Abstract:

Multi-agent reinforcement learning (MARL) has been widely applied in the field of group control. Because the Markov decision process of an individual agent is broken in MARL, existing MARL methods struggle to learn optimal policies, and the learned policies are unstable because of the random behaviors of agents. From the viewpoint of the mapping between state spaces and behavior spaces, this work studies the coupling among agents in homogeneous MARL, aiming to improve policy effectiveness and training stability. We first investigate the recombination of the joint behavior space for homogeneous agents, which breaks the one-to-one correspondence between agents and policies. We then propose abstract agents, which transform the coupling among agents into coupling among the actions in the behavior space, thereby improving training efficiency and stability. Building on this, and inspired by sequential decision-making, we design self-attention modules for the abstract agents' policy networks and evaluation networks respectively, which encode and thin the agents' states. The learned policies can be explicitly explained through the self-attention modules and the recombination. The proposed method is validated in three simulated MARL scenarios. The experimental results suggest that our method can outperform the baseline methods under centralized rewards, and that it increases stability by more than fifty percent. Ablation experiments validate the abstract agents and the self-attention modules separately, which supports these conclusions. © 2022, Science Press. All rights reserved.

Pages: 1842-1858

Page count: 16
