A Multi-Agent Reinforcement Learning Method Based on Self-Attention Mechanism and Policy Mapping Recombination

Cited: 0
|
Authors
Li J.-C. [1 ]
Shi H.-B. [1 ]
Hwang K.-S. [1 ,2 ]
Affiliations
[1] School of Computer Science, Northwestern Polytechnical University, Xi'an
[2] Department of Electrical Engineering, Kaohsiung Sun Yat-sen University, Kaohsiung
Source
Funding
National Natural Science Foundation of China
Keywords
Attention mechanism; Deep reinforcement learning; Multi-agent reinforcement learning; Multi-agent system
DOI
10.11897/SP.J.1016.2022.01842
Abstract
Multi-agent reinforcement learning (MARL) has been widely applied to group control. Because an individual agent's Markov decision process is broken in MARL, existing MARL methods struggle to learn optimal policies, and the learned policies are unstable owing to the random behaviors of other agents. From the viewpoint of the mapping between state spaces and behavior spaces, this work studies the coupling among agents in homogeneous MARL, aiming to enhance policy effectiveness and training stability. We first investigate the recombination of the joint behavior space of homogeneous agents, breaking the one-to-one correspondence between agents and policies. We then propose abstract agents that transform the coupling among agents into coupling among the actions in the behavior space, which improves training efficiency and stability. On this basis, inspired by sequential decision-making, we design self-attention modules for the abstract agents' policy networks and evaluation networks, encoding and refining the agents' states. The learned policies can be explicitly explained through the self-attention modules and the recombination. The proposed method is validated in three simulated MARL scenarios. The experimental results show that our method outperforms baseline methods under centralized rewards, while improving stability by more than fifty percent. Ablation experiments validate the abstract agents and the self-attention modules separately, making the conclusions more convincing. © 2022, Science Press. All rights reserved.
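To illustrate the kind of state-encoding the abstract describes, the following is a minimal NumPy sketch of scaled dot-product self-attention applied across per-agent state vectors. The dimensions, random projection matrices, and function names are illustrative assumptions, not the paper's actual network architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(states, Wq, Wk, Wv):
    """Scaled dot-product self-attention over agent state encodings.

    states: (n_agents, d_state) matrix, one row per agent.
    Wq, Wk, Wv: projection matrices (learned in practice; random here).
    Returns one context vector per agent that mixes information
    from all agents, weighted by pairwise relevance.
    """
    Q, K, V = states @ Wq, states @ Wk, states @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise agent relevance
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # attention-weighted mixture

rng = np.random.default_rng(0)
n_agents, d_state, d_model = 4, 8, 16
states = rng.normal(size=(n_agents, d_state))
Wq, Wk, Wv = (rng.normal(size=(d_state, d_model)) for _ in range(3))
encoded = self_attention(states, Wq, Wk, Wv)
print(encoded.shape)  # (4, 16)
```

In a policy network, each agent's context vector would then feed the action head, so that an agent's decision can condition on the encoded states of its teammates.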
Pages: 1842-1858
Page count: 16