A Cooperation Graph Approach for Multiagent Sparse Reward Reinforcement Learning

Cited by: 2
Authors
Fu, Qingxu [1 ,2 ]
Qiu, Tenghai [1 ,2 ]
Pu, Zhiqiang [1 ,2 ]
Yi, Jianqiang [1 ,2 ]
Yuan, Wanmai [3 ]
Affiliations
[1] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China
[3] Corp Informat Sci Acad China, Elect Technol Grp, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multiagent system; reinforcement learning; sparse reward;
DOI
10.1109/IJCNN55064.2022.9891991
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Code
081104; 0812; 0835; 1405;
Abstract
Multiagent reinforcement learning (MARL) can solve complex cooperative tasks. However, the efficiency of existing MARL methods relies heavily on well-defined reward functions. Multiagent tasks with sparse reward feedback are especially challenging, not only because of the credit assignment problem but also because of the low probability of obtaining positive reward feedback. In this paper, we design a graph network called the Cooperation Graph (CG). The Cooperation Graph is the combination of two simple bipartite graphs, namely the Agent Clustering subgraph (ACG) and the Cluster Designating subgraph (CDG). Based on this novel graph structure, we then propose a Cooperation Graph Multiagent Reinforcement Learning (CG-MARL) algorithm that can efficiently handle the sparse reward problem in multiagent tasks. In CG-MARL, agents are directly controlled by the Cooperation Graph, and a policy neural network is trained to manipulate this graph, implicitly guiding the agents toward cooperation. This hierarchical structure of CG-MARL leaves room for customized cluster actions, an extensible interface for introducing fundamental cooperation knowledge. In experiments, CG-MARL achieves state-of-the-art performance on sparse reward multiagent benchmarks, including an anti-invasion interception task and a multi-cargo delivery task.
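The abstract describes the Cooperation Graph as two stacked bipartite graphs: the ACG maps agents to clusters and the CDG maps clusters to cluster actions, with a policy network rewiring the edges rather than choosing per-agent actions. Below is a minimal Python sketch of that data structure, assuming discrete agent, cluster, and cluster-action indices; all names (CooperationGraph, rewire_agent, rewire_cluster, agent_actions) are hypothetical illustrations and are not taken from the paper or its released code.

from dataclasses import dataclass

@dataclass
class CooperationGraph:
    """Two stacked bipartite graphs: agents -> clusters (ACG)
    and clusters -> cluster actions (CDG)."""
    n_agents: int
    n_clusters: int
    n_cluster_actions: int

    def __post_init__(self):
        # ACG edges: agent index -> cluster index (all agents start in cluster 0).
        self.agent_to_cluster = [0] * self.n_agents
        # CDG edges: cluster index -> cluster-action index (all clusters start idle).
        self.cluster_to_action = [0] * self.n_clusters

    def rewire_agent(self, agent: int, cluster: int) -> None:
        # Policy-network edit of an ACG edge (agent clustering).
        assert 0 <= cluster < self.n_clusters
        self.agent_to_cluster[agent] = cluster

    def rewire_cluster(self, cluster: int, action: int) -> None:
        # Policy-network edit of a CDG edge (cluster designating).
        assert 0 <= action < self.n_cluster_actions
        self.cluster_to_action[cluster] = action

    def agent_actions(self) -> list:
        # Agents are controlled by the graph: each agent executes the
        # cluster action designated for its cluster.
        return [self.cluster_to_action[c] for c in self.agent_to_cluster]

# Example: the policy edits a few edges, then the graph drives all agents.
cg = CooperationGraph(n_agents=4, n_clusters=2, n_cluster_actions=3)
cg.rewire_agent(3, 1)      # move agent 3 into cluster 1
cg.rewire_cluster(1, 2)    # cluster 1 executes cluster action 2
print(cg.agent_actions())  # [0, 0, 0, 2]

In this reading, the policy's action space is over graph edits (and thus scales with clusters, not agents), which is one plausible way the hierarchical structure eases the sparse reward problem.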
Pages: 8