PRACM: Predictive Rewards for Actor-Critic with Mixing Function in Multi-Agent Reinforcement Learning

Citations: 0
Authors
Yu, Sheng [1 ]
Liu, Bo [1 ]
Zhu, Wei [1 ]
Liu, Shuhong [1 ]
Affiliations
[1] Natl Univ Def Technol, Sch Informat & Commun, Wuhan 430014, Peoples R China
关键词
Multi-agent reinforcement learning; Discrete action; Collaborative task; Mixing function; Predictive reward
DOI
10.1007/978-3-031-40292-0_7
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Inspired by the centralised training with decentralised execution (CTDE) paradigm, the field of multi-agent reinforcement learning (MARL) has made significant progress in tackling cooperative problems with discrete action spaces. Nevertheless, many existing algorithms suffer significant performance degradation when faced with large numbers of agents or more challenging tasks. Furthermore, specific scenarios, such as cooperative environments with penalties, pose serious challenges to these algorithms, which often fail to elicit enough cooperative behavior to converge. To address these issues, this study proposes PRACM, a new approach based on the Actor-Critic framework. PRACM employs a monotonic mixing function to generate a global action value function, Qtot, which is used to compute the loss for updating the critic network. To handle the discrete action space, PRACM uses Gumbel-Softmax, and to promote cooperation among agents and adapt to cooperative environments with penalties, predictive rewards are introduced. PRACM was evaluated against several baseline algorithms in the "Cooperative Predator-Prey" and the challenging "SMAC" scenarios. The results show that PRACM scales well as the number of agents and the task difficulty increase, and that it performs better in cooperative tasks with penalties, demonstrating its usefulness in promoting collaboration among agents.
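
Two mechanisms named in the abstract can be illustrated concretely: the Gumbel-Softmax relaxation, which lets gradients flow through discrete action choices, and a monotonic mixing function that aggregates per-agent action values into a global Qtot. The NumPy sketch below is a minimal, hypothetical rendering of both ideas; the helper names (gumbel_softmax, monotonic_mix), shapes, and weights are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=1.0):
    # Relaxed (differentiable) sample from a categorical action distribution:
    # add Gumbel noise to the logits, then apply a temperature-scaled softmax.
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape) + 1e-20) + 1e-20)
    y = (logits + gumbel) / tau
    y = np.exp(y - y.max(axis=-1, keepdims=True))
    return y / y.sum(axis=-1, keepdims=True)

def monotonic_mix(agent_qs, w, b):
    # Combine per-agent values with non-negative weights so that
    # dQtot/dQ_i >= 0, the monotonicity constraint of QMIX-style mixers.
    return float(np.abs(w) @ agent_qs + b)

# Toy usage: 3 agents, each with 5 discrete actions.
logits = rng.normal(size=(3, 5))              # actor outputs per agent
actions = gumbel_softmax(logits, tau=0.5)     # relaxed one-hot actions
q_values = rng.normal(size=(3, 5))            # stand-in critic outputs
agent_qs = (actions * q_values).sum(axis=-1)  # per-agent Q(s, a_i)
q_tot = monotonic_mix(agent_qs, w=rng.normal(size=3), b=0.1)

In PRACM-style training, the mixed Qtot would enter the critic's loss; here it is computed only to show the data flow from relaxed discrete actions to the global value.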
Pages: 69 - 82
Number of pages: 14