PRACM: Predictive Rewards for Actor-Critic with Mixing Function in Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Yu, Sheng [1 ]
Liu, Bo [1 ]
Zhu, Wei [1 ]
Liu, Shuhong [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Informat & Commun, Wuhan 430014, Peoples R China
Keywords
Multi-agent reinforcement learning; Discrete action; Collaborative task; Mixing function; Predictive reward;
DOI
10.1007/978-3-031-40292-0_7
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Inspired by the centralised training with decentralised execution (CTDE) paradigm, the field of multi-agent reinforcement learning (MARL) has made significant progress in tackling cooperative problems with discrete action spaces. Nevertheless, many existing algorithms suffer severe performance degradation when faced with large numbers of agents or more challenging tasks. Moreover, certain scenarios, such as cooperative environments with penalties, are particularly difficult: in them, these algorithms often fail to elicit enough cooperative behavior to converge successfully. To address these issues, this study proposes PRACM, a new approach based on the Actor-Critic framework. PRACM employs a monotonic mixing function to generate a global action-value function, Qtot, which is used to compute the loss for updating the critic network. To handle discrete action spaces, PRACM uses Gumbel-Softmax, and to promote cooperation among agents and adapt to cooperative environments with penalties, it introduces predictive rewards. PRACM was evaluated against several baseline algorithms in the "Cooperative Predator-Prey" and challenging "SMAC" scenarios. The results show that PRACM scales well as the number of agents and the task difficulty increase, and that it performs better in cooperative tasks with penalties, demonstrating its usefulness in promoting collaboration among agents.
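The abstract gives no implementation details, so the following is a minimal PyTorch sketch of the two standard building blocks it names: a QMIX-style monotonic mixing network that combines per-agent Q-values into Qtot, and Gumbel-Softmax sampling for differentiable discrete actions. All names here (MonotonicMixer, sample_discrete_actions, embed_dim, and so on) are illustrative assumptions, not taken from the PRACM paper, and the predictive-reward component is omitted because its exact form is not specified in this record.

```python
# Hypothetical sketch only: a QMIX-style monotonic mixer and Gumbel-Softmax
# action sampling, as suggested by the abstract. Not PRACM's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """Mixes per-agent Q-values into a global Q_tot. Monotonicity in each
    agent's Q-value is enforced by non-negative mixing weights produced by
    hypernetworks conditioned on the global state."""
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks generate state-dependent weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        bs = agent_qs.size(0)
        agent_qs = agent_qs.view(bs, 1, self.n_agents)
        # abs() keeps the mixing weights non-negative, so dQ_tot/dQ_i >= 0.
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs, w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(bs, 1)  # Q_tot: (batch, 1)

def sample_discrete_actions(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    # Gumbel-Softmax yields (approximately) one-hot samples that remain
    # differentiable, letting discrete actor outputs pass gradients through
    # the critic. hard=True applies the straight-through estimator.
    return F.gumbel_softmax(logits, tau=tau, hard=True)

# Example usage: mix per-agent Q-values for a batch of 4 transitions.
mixer = MonotonicMixer(n_agents=3, state_dim=16)
q_tot = mixer(torch.randn(4, 3), torch.randn(4, 16))      # -> shape (4, 1)
actions = sample_discrete_actions(torch.randn(4, 3, 5))   # one-hot over 5 actions
```

The abs() applied to the hypernetwork outputs is what enforces the monotonicity condition (the gradient of Qtot with respect to each agent's Q-value stays non-negative), which keeps decentralised per-agent greedy action selection consistent with the joint Qtot.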
Pages: 69-82
Number of pages: 14