PRACM: Predictive Rewards for Actor-Critic with Mixing Function in Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Yu, Sheng [1 ]
Liu, Bo [1 ]
Zhu, Wei [1 ]
Liu, Shuhong [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Informat & Commun, Wuhan 430014, Peoples R China
Keywords
Multi-agent reinforcement learning; Discrete action; Collaborative task; Mixing function; Predictive reward;
DOI
10.1007/978-3-031-40292-0_7
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Inspired by the centralised training with decentralised execution (CTDE) paradigm, the field of multi-agent reinforcement learning (MARL) has made significant progress in tackling cooperative problems with discrete action spaces. Nevertheless, many existing algorithms suffer significant performance degradation when faced with large numbers of agents or more challenging tasks. Furthermore, certain scenarios, such as cooperative environments with penalties, pose serious challenges to these algorithms, which often fail to produce enough cooperative behavior to converge. To address these issues, this study proposes PRACM, a new approach based on the actor-critic framework. PRACM employs a monotonic mixing function to generate a global action-value function, Qtot, which is used to compute the loss for updating the critic network. To handle discrete action spaces, PRACM uses Gumbel-Softmax, and to promote cooperation among agents and adapt to cooperative environments with penalties, it introduces predictive rewards. PRACM was evaluated against several baseline algorithms in the "Cooperative Predator-Prey" and the challenging "SMAC" scenarios. The results show that PRACM scales well as the number of agents and task difficulty increase, and performs better in cooperative tasks with penalties, demonstrating its usefulness in promoting collaboration among agents.
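The abstract names two mechanisms that are standard enough to sketch: Gumbel-Softmax for differentiable sampling of discrete actions, and a monotonic mixing function that combines per-agent values into Qtot. The following minimal numpy sketch illustrates both ideas in isolation; it is not the authors' implementation (the real mixer in QMIX-style methods is a hypernetwork, and `monotonic_mix` here is a hypothetical one-layer stand-in).

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Differentiable relaxation for sampling from a discrete action space.

    Adds Gumbel(0, 1) noise to the logits and applies a temperature-scaled
    softmax; the output is a probability vector that approaches one-hot
    as tau -> 0.
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Sample Gumbel noise via the inverse-CDF trick: g = -log(-log(U)).
    u = rng.uniform(1e-10, 1.0, size=np.shape(logits))
    gumbel = -np.log(-np.log(u))
    y = (np.asarray(logits) + gumbel) / tau
    e = np.exp(y - y.max())          # subtract max for numerical stability
    return e / e.sum()

def monotonic_mix(agent_qs, weights, bias):
    """Toy monotonic mixing: Q_tot = sum_i |w_i| * Q_i + b.

    Taking absolute values of the weights enforces dQ_tot/dQ_i >= 0, the
    monotonicity constraint that lets a global argmax over Q_tot decompose
    into per-agent argmaxes.
    """
    return float(np.abs(np.asarray(weights)) @ np.asarray(agent_qs) + bias)
```

As a usage illustration, lowering `tau` sharpens the sampled action distribution toward one-hot, and `monotonic_mix` never decreases when any individual agent's value increases, which is the property that makes decentralised greedy execution consistent with the centralised critic.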
Pages: 69-82
Page count: 14