Actor-Critic for Multi-Agent Reinforcement Learning with Self-Attention

Cited by: 2
Authors
Zhao, Juan [1 ]
Zhu, Tong [2 ]
Xiao, Shuo [3 ]
Gao, Zongqian [3 ]
Sun, Hao [3 ]
Affiliations
[1] Henan Ind & Trade Vocat Coll, Dept Mech & Elect Engn, Zhengzhou, Peoples R China
[2] Zhengzhou Coal Ind Grp Co Ltd, Zhengzhou, Peoples R China
[3] China Univ Min & Technol, Sch Comp Sci & Technol, Xuzhou, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Multi-agent collaboration; reinforcement learning; actor-critic; self-attention; SENSOR NETWORKS; OPTIMIZATION; ALGORITHM;
DOI
10.1142/S0218001422520140
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The rapid development of deep reinforcement learning has led to its wide use in multi-agent environments to solve multi-agent cooperation problems. However, because of the instability of multi-agent environments, performance is insufficient when a deep reinforcement learning algorithm is used to train each agent independently. In this work, we use the framework of centralized training with decentralized execution to extend the maximum-entropy deep reinforcement learning algorithm Soft Actor-Critic (SAC), and propose MASAC, a multi-agent deep reinforcement learning algorithm based on the maximum-entropy framework. The proposed model treats all other agents as part of the environment, which effectively addresses the poor convergence of algorithms caused by environmental instability. At the same time, we note a shortcoming of centralized training: when the information of all agents is used as the critic's input, information relevant to the current agent is easily lost. Inspired by the application of the self-attention mechanism in machine translation, we use self-attention to improve the critic and propose the ATT-MASAC algorithm. Through encoder operations and attention computation inside the critic networks, each agent can discover its relationship with the other agents. Compared with recent multi-agent deep reinforcement learning algorithms, ATT-MASAC converges better, and it remains more stable as the number of agents in the environment increases.
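The abstract describes, but does not detail, the attention step inside the ATT-MASAC critic: each agent's encoded observation-action vector attends over the other agents' encodings. A minimal numpy sketch of scaled dot-product self-attention over per-agent embeddings is shown below; the projection matrices would be trained jointly with the critic in the actual algorithm, so the random matrices here are placeholders, and the function name and shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_critic_features(agent_embeddings):
    """Scaled dot-product self-attention over per-agent embeddings.

    agent_embeddings: (n_agents, d) array, one encoded
    observation-action vector per agent. Returns an (n_agents, d)
    array where each row mixes information from all agents,
    weighted by pairwise relevance.
    """
    n, d = agent_embeddings.shape
    rng = np.random.default_rng(0)
    # Placeholder projections; in the real critic these are learned.
    W_q = rng.standard_normal((d, d)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d)) / np.sqrt(d)

    Q = agent_embeddings @ W_q
    K = agent_embeddings @ W_k
    V = agent_embeddings @ W_v
    scores = Q @ K.T / np.sqrt(d)       # (n, n) agent-to-agent relevance
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                  # attended features per agent
```

Because the attention weights are computed from the agents' own embeddings, each agent's attended feature vector emphasizes the agents most relevant to it, which is the mechanism the abstract credits for preserving agent-specific information in the centralized critic.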
Pages: 16
Related papers
50 in total
  • [31] A New Advantage Actor-Critic Algorithm For Multi-Agent Environments
    Paczolay, Gabor
    Harmati, Istvan
    [J]. 2020 23RD IEEE INTERNATIONAL SYMPOSIUM ON MEASUREMENT AND CONTROL IN ROBOTICS (ISMCR), 2020,
  • [32] Improving sample efficiency in Multi-Agent Actor-Critic methods
    Ye, Zhenhui
    Chen, Yining
    Jiang, Xiaohong
    Song, Guanghua
    Yang, Bowei
    Fan, Sheng
    [J]. APPLIED INTELLIGENCE, 2022, 52 (04) : 3691 - 3704
  • [33] Multi-agent actor-critic with time dynamical opponent model
    Tian, Yuan
    Kladny, Klaus-Rudolf
    Wang, Qin
    Huang, Zhiwu
    Fink, Olga
    [J]. NEUROCOMPUTING, 2023, 517 : 165 - 172
  • [35] An Object Oriented Approach to Fuzzy Actor-Critic Learning for Multi-Agent Differential Games
    Schwartz, Howard
    [J]. 2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 183 - 190
  • [36] An actor-critic algorithm for multi-agent learning in queue-based stochastic games
    Sundar, D. Krishna
    Ravikumar, K.
    [J]. NEUROCOMPUTING, 2014, 127 : 258 - 265
  • [37] Position-Aware Communication via Self-Attention for Multi-Agent Reinforcement Learning
    Shih, Tsan-Hua
    Lin, Hsien-I
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS - TAIWAN (ICCE-TAIWAN), 2020,
  • [38] A World Model for Actor-Critic in Reinforcement Learning
    Panov, A. I.
    Ugadiarov, L. A.
    [J]. PATTERN RECOGNITION AND IMAGE ANALYSIS, 2023, 33 (03) : 467 - 477
  • [39] Actor-Critic based Improper Reinforcement Learning
    Zaki, Mohammadi
    Mohan, Avinash
    Gopalan, Aditya
    Mannor, Shie
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [40] Curious Hierarchical Actor-Critic Reinforcement Learning
    Roeder, Frank
    Eppe, Manfred
    Nguyen, Phuong D. H.
    Wermter, Stefan
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2020, PT II, 2020, 12397 : 408 - 419