Improving sample efficiency in Multi-Agent Actor-Critic methods

Cited by: 11
Authors
Ye, Zhenhui [1 ]
Chen, Yining [1 ]
Jiang, Xiaohong [2 ]
Song, Guanghua [1 ]
Yang, Bowei [1 ]
Fan, Sheng [1 ]
Affiliations
[1] Zhejiang Univ, Sch Aeronaut & Astronaut, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
Keywords
Multi agent; Reinforcement learning; Sample efficiency; Data augmentation; GAME;
DOI
10.1007/s10489-021-02554-5
CLC classification
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The popularity of multi-agent deep reinforcement learning (MADRL) is growing rapidly with the demand for large-scale real-world tasks that require swarm intelligence, and many studies have improved MADRL from the perspective of network structures or reinforcement learning methods. However, real-world applications of MADRL are hampered by the models' low sample efficiency and the high cost of collecting data. To improve practicability, an extension to the current MADRL training paradigm that improves sample efficiency is imperative. To this end, this paper proposes PEDMA, a flexible plug-in unit for MADRL consisting of three techniques: (i) Parallel Environments (PE), which accelerate data acquisition; (ii) Experience Augmentation (EA), a novel data augmentation method that exploits the permutation invariance of the multi-agent system to reduce the cost of acquiring data; and (iii) Delayed Updated Policies (DUP), which improve the data utilization efficiency of the MADRL model. The proposed EA method improves the performance, data efficiency, and convergence speed of MADRL models, as demonstrated both theoretically and empirically. Experiments on three multi-agent benchmark tasks show that the MAAC model trained with PEDMA outperforms the baselines and state-of-the-art algorithms, and ablation studies show the contribution and necessity of each component of PEDMA.
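The permutation-invariance idea behind Experience Augmentation can be sketched as follows: if agents are homogeneous (interchangeable), permuting the agent axis of a stored transition yields another valid transition for free, multiplying the effective size of the replay buffer. The sketch below is illustrative only; the function name, array shapes, and sampling scheme are assumptions, not the paper's actual implementation.

```python
import numpy as np

def augment_transition(obs, actions, rewards, next_obs, num_perms=2, rng=None):
    """Generate extra training transitions by permuting agent indices.

    Assumes homogeneous (interchangeable) agents, so shuffling the
    per-agent axis of a joint transition produces another valid one.
    All arrays are shaped (n_agents, ...); hypothetical helper, not
    the authors' code.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_agents = obs.shape[0]
    augmented = []
    for _ in range(num_perms):
        perm = rng.permutation(n_agents)
        # Apply the same permutation to every per-agent array so the
        # joint transition stays internally consistent.
        augmented.append((obs[perm], actions[perm], rewards[perm], next_obs[perm]))
    return augmented
```

In an off-policy setup, the augmented tuples would simply be pushed into the replay buffer alongside the original transition, so each environment step yields several training samples instead of one.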
Pages: 3691-3704
Page count: 14
Related papers
Showing 10 of 50
  • [1] Improving sample efficiency in Multi-Agent Actor-Critic methods
    Zhenhui Ye
    Yining Chen
    Xiaohong Jiang
    Guanghua Song
    Bowei Yang
    Sheng Fan
    [J]. Applied Intelligence, 2022, 52 : 3691 - 3704
  • [2] A multi-agent reinforcement learning using Actor-Critic methods
    Li, Chun-Gui
    Wang, Meng
    Yuan, Qing-Neng
    [J]. PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 878 - 882
  • [3] Bi-level Multi-Agent Actor-Critic Methods with Transformers
    Wan, Tianjiao
    Mi, Haibo
    Gao, Zijian
    Zhai, Yuanzhao
    Ding, Bo
    Feng, Dawei
    [J]. 2023 IEEE INTERNATIONAL CONFERENCE ON JOINT CLOUD COMPUTING, JCC, 2023, : 9 - 16
  • [4] Bi-Level Actor-Critic for Multi-Agent Coordination
    Zhang, Haifeng
    Chen, Weizhe
    Huang, Zeren
    Li, Minne
    Yang, Yaodong
    Zhang, Weinan
    Wang, Jun
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 7325 - 7332
  • [5] Divergence-Regularized Multi-Agent Actor-Critic
    Su, Kefan
    Lu, Zongqing
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [6] Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning
    Diddigi, Raghuram Bharadwaj
    Reddy, D. Sai Koti
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
    [J]. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1931 - 1933
  • [7] A New Advantage Actor-Critic Algorithm For Multi-Agent Environments
    Paczolay, Gabor
    Harmati, Istvan
    [J]. 2020 23RD IEEE INTERNATIONAL SYMPOSIUM ON MEASUREMENT AND CONTROL IN ROBOTICS (ISMCR), 2020,
  • [8] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Prashant Trivedi
    Nandyala Hemachandra
    [J]. Dynamic Games and Applications, 2023, 13 : 25 - 55
  • [9] Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
    Christianos, Filippos
    Schäfer, Lukas
    Albrecht, Stefano V.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [10] Multi-agent actor-critic with time dynamical opponent model
    Tian, Yuan
    Kladny, Klaus-Rudolf
    Wang, Qin
    Huang, Zhiwu
    Fink, Olga
    [J]. NEUROCOMPUTING, 2023, 517 : 165 - 172