Actor-Critic for Multi-Agent Reinforcement Learning with Self-Attention

Cited by: 2
Authors
Zhao, Juan [1 ]
Zhu, Tong [2 ]
Xiao, Shuo [3 ]
Gao, Zongqian [3 ]
Sun, Hao [3 ]
Affiliations
[1] Henan Ind & Trade Vocat Coll, Dept Mech & Elect Engn, Zhengzhou, Peoples R China
[2] Zhengzhou Coal Ind Grp Co Ltd, Zhengzhou, Peoples R China
[3] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
关键词
Multi-agent collaboration; reinforcement learning; actor-critic; self-attention; SENSOR NETWORKS; OPTIMIZATION; ALGORITHM;
DOI
10.1142/S0218001422520140
CLC Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The rapid development of deep reinforcement learning has made it widely used in multi-agent environments to solve multi-agent cooperation problems. However, because of the instability of multi-agent environments, training each agent independently with a deep reinforcement learning algorithm yields insufficient performance. In this work, we use the framework of centralized training with decentralized execution to extend the maximum-entropy deep reinforcement learning algorithm Soft Actor-Critic (SAC) and propose MASAC, a multi-agent deep reinforcement learning algorithm based on the maximum-entropy framework. Because the proposed model treats all other agents as part of the environment, it effectively addresses the poor convergence caused by environmental instability. At the same time, we note a shortcoming of centralized training: when a critic takes the information of all agents as input, information relevant to the current agent is easily lost. Inspired by the application of the self-attention mechanism in machine translation, we use self-attention to improve the critic and propose the ATT-MASAC algorithm. Through encoder operations and attention calculations inside the critic networks, each agent can discover its relationships with the other agents. Compared with recent multi-agent deep reinforcement learning algorithms, ATT-MASAC converges better and remains more stable as the number of agents in the environment increases.
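The attention-improved critic described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: all names (`attention_critic`, `Wq`, `Wk`, `Wv`, `w_out`) and the single-head scaled dot-product form are assumptions; it only shows the idea that each agent's critic input is augmented with an attention-weighted summary of the other agents' encodings.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_critic(enc, Wq, Wk, Wv, w_out):
    """enc: (n_agents, d) per-agent encodings of (observation, action) pairs.

    Returns one Q-value per agent plus the (n_agents, n_agents) attention
    matrix, whose row i weights how much agent i attends to each agent.
    """
    q, k, v = enc @ Wq, enc @ Wk, enc @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])   # agent-to-agent relevance
    attn = softmax(scores, axis=-1)           # each row sums to 1
    context = attn @ v                        # weighted summary of other agents
    # Each agent's own encoding is kept alongside the attended context,
    # so information about the current agent is not lost.
    q_values = np.concatenate([enc, context], axis=-1) @ w_out
    return q_values, attn

rng = np.random.default_rng(0)
n_agents, d = 3, 4
enc = rng.normal(size=(n_agents, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
w_out = rng.normal(size=(2 * d,))
q_values, attn = attention_critic(enc, Wq, Wk, Wv, w_out)
```

In the actual algorithm the encoders and attention weights would be trained jointly with the SAC-style critic loss; the sketch only fixes random weights to show the data flow.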
Pages: 16
Related Papers
50 records in total
  • [1] Multi-agent reinforcement learning by the actor-critic model with an attention interface
    Zhang, Lixiang
    Li, Jingchen
    Zhu, Yi'an
    Shi, Haobin
    Hwang, Kao-Shing
    [J]. NEUROCOMPUTING, 2022, 471 : 275 - 284
  • [2] Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning
    Diddigi, Raghuram Bharadwaj
    Reddy, D. Sai Koti
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
    [J]. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1931 - 1933
  • [3] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Prashant Trivedi
    Nandyala Hemachandra
    [J]. Dynamic Games and Applications, 2023, 13 : 25 - 55
  • [4] Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
    Christianos, Filippos
    Schafer, Lukas
    Albrecht, Stefano V.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [5] Distributed Multi-Agent Reinforcement Learning by Actor-Critic Method
    Heredia, Paulo C.
    Mou, Shaoshuai
    [J]. IFAC PAPERSONLINE, 2019, 52 (20): : 363 - 368
  • [6] A multi-agent reinforcement learning using Actor-Critic methods
    Li, Chun-Gui
    Wang, Meng
    Yuan, Qing-Neng
    [J]. PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 878 - 882
  • [7] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Trivedi, Prashant
    Hemachandra, Nandyala
    [J]. DYNAMIC GAMES AND APPLICATIONS, 2023, 13 (01) : 25 - 55
  • [8] Actor-Attention-Critic for Multi-Agent Reinforcement Learning
    Iqbal, Shariq
    Sha, Fei
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
  • [9] Structural relational inference actor-critic for multi-agent reinforcement learning
    Zhang, Xianjie
    Liu, Yu
    Xu, Xiujuan
    Huang, Qiong
    Mao, Hangyu
    Carie, Anil
    [J]. NEUROCOMPUTING, 2021, 459 : 383 - 394
  • [10] Toward Resilient Multi-Agent Actor-Critic Algorithms for Distributed Reinforcement Learning
    Lin, Yixuan
    Gade, Shripad
    Sandhu, Romeil
    Liu, Ji
    [J]. 2020 AMERICAN CONTROL CONFERENCE (ACC), 2020, : 3953 - 3958