PRACM: Predictive Rewards for Actor-Critic with Mixing Function in Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Yu, Sheng [1 ]
Liu, Bo [1 ]
Zhu, Wei [1 ]
Liu, Shuhong [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Informat & Commun, Wuhan 430014, Peoples R China
Keywords
Multi-agent reinforcement learning; Discrete action; Collaborative task; Mixing function; Predictive reward;
DOI
10.1007/978-3-031-40292-0_7
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Inspired by the centralised training with decentralised execution (CTDE) paradigm, the field of multi-agent reinforcement learning (MARL) has made significant progress in tackling cooperative problems with discrete action spaces. Nevertheless, many existing algorithms suffer severe performance degradation when faced with large numbers of agents or more challenging tasks. Moreover, certain scenarios, such as cooperative environments with penalties, are particularly difficult: in them, these algorithms often fail to elicit enough cooperative behavior to converge successfully. To address these issues, this study proposes PRACM, a new approach based on the Actor-Critic framework. PRACM employs a monotonic mixing function to generate a global action-value function, Qtot, which is used to compute the loss for updating the critic network. To handle discrete action spaces, PRACM uses Gumbel-Softmax, and to promote cooperation among agents and adapt to cooperative environments with penalties, it introduces predictive rewards. PRACM was evaluated against several baseline algorithms in the "Cooperative Predator-Prey" and challenging "SMAC" scenarios. The results show that PRACM scales well as the number of agents and the task difficulty increase, and that it performs better in cooperative tasks with penalties, demonstrating its usefulness in promoting collaboration among agents.
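The abstract gives no implementation details, so the following is a minimal PyTorch sketch of the two standard building blocks it names: a QMIX-style monotonic mixing network that combines per-agent Q-values into Qtot, and Gumbel-Softmax sampling for differentiable discrete actions. All names here (MonotonicMixer, sample_discrete_actions, embed_dim, and so on) are illustrative assumptions, not taken from the PRACM paper, and the predictive-reward component is omitted because its exact form is not specified in this record.

```python
# Hypothetical sketch only: a QMIX-style monotonic mixer and Gumbel-Softmax
# action sampling, as suggested by the abstract. Not PRACM's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonotonicMixer(nn.Module):
    """Mixes per-agent Q-values into a global Q_tot. Monotonicity in each
    agent's Q-value is enforced by non-negative mixing weights produced by
    hypernetworks conditioned on the global state."""
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks generate state-dependent weights and biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        bs = agent_qs.size(0)
        agent_qs = agent_qs.view(bs, 1, self.n_agents)
        # abs() keeps the mixing weights non-negative, so dQ_tot/dQ_i >= 0.
        w1 = torch.abs(self.hyper_w1(state)).view(bs, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(bs, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs, w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(bs, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(bs, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(bs, 1)  # Q_tot: (batch, 1)

def sample_discrete_actions(logits: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    # Gumbel-Softmax yields (approximately) one-hot samples that remain
    # differentiable, letting discrete actor outputs pass gradients through
    # the critic. hard=True applies the straight-through estimator.
    return F.gumbel_softmax(logits, tau=tau, hard=True)

# Example usage: mix per-agent Q-values for a batch of 4 transitions.
mixer = MonotonicMixer(n_agents=3, state_dim=16)
q_tot = mixer(torch.randn(4, 3), torch.randn(4, 16))      # -> shape (4, 1)
actions = sample_discrete_actions(torch.randn(4, 3, 5))   # one-hot over 5 actions
```

The abs() applied to the hypernetwork outputs is what enforces the monotonicity condition (the gradient of Qtot with respect to each agent's Q-value stays non-negative), which keeps decentralised per-agent greedy action selection consistent with the joint Qtot.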
Pages: 69-82
Number of pages: 14