Distributed entropy-regularized multi-agent reinforcement learning with policy consensus

Cited by: 1

Authors:

Hu, Yifan [1]

Fu, Junjie [1]

Wen, Guanghui [1]

Lv, Yuezu [2]

Ren, Wei [3]

Affiliations:

[1] Southeast Univ, Sch Math, Nanjing 210096, Peoples R China

[2] Beijing Inst Technol, Adv Res Inst Multidisciplinary Sci, Beijing 100081, Peoples R China

[3] Univ Calif Riverside, Dept Elect & Comp Engn, Riverside, CA 92521 USA

Source:

AUTOMATICA | 2024 / Vol. 164

Keywords:

Distributed actor-critic algorithm; Networked multi-agent system; Entropy regularization; Deep reinforcement learning; ALGORITHM; NETWORKS;

DOI:

10.1016/j.automatica.2024.111652

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Sample efficiency is a limiting factor for existing distributed multi-agent reinforcement learning (MARL) algorithms over networked multi-agent systems. In this paper, the sample efficiency problem is tackled by formally incorporating entropy regularization into the distributed MARL algorithm design. First, a new entropy-regularized MARL problem is formulated under the model of networked multi-agent Markov decision processes with observation-based policies and homogeneous agents, where policy parameter sharing among the agents provably preserves optimality. Second, an on-policy distributed actor-critic algorithm is proposed, in which each agent shares the parameters of both its critic and its actor for consensus updates. Then, a convergence analysis of the proposed algorithm is provided based on stochastic approximation theory, under the assumption of linear function approximation for the critic. Furthermore, a practical off-policy version of the proposed algorithm is developed, which offers scalability, data efficiency, and learning stability. Finally, the proposed distributed algorithm is compared against solid baselines, including two classic centralized training algorithms, in the multi-agent particle environment, and its learning performance is empirically demonstrated through extensive simulation experiments. (c) 2024 Elsevier Ltd. All rights reserved.
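
The two ingredients named in the abstract, entropy regularization and consensus updates on shared parameters, can be illustrated with a minimal sketch. This is not the paper's algorithm: the weight matrix `W`, the coefficient `alpha`, the function names, and the toy parameter values are all assumptions chosen for illustration, and the actor and critic gradient steps are omitted.

```python
import numpy as np

def policy_entropy(logits):
    """Shannon entropy of a softmax policy over discrete actions."""
    z = logits - logits.max()            # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return float(-(p * np.log(p + 1e-12)).sum())

def regularized_reward(reward, logits, alpha):
    """Reward plus an entropy bonus; alpha is a hypothetical temperature."""
    return reward + alpha * policy_entropy(logits)

def consensus_step(params, weights):
    """Each agent replaces its parameter vector (one row of `params`) with a
    weighted average of all agents' vectors; rows of `weights` sum to 1."""
    return weights @ params

# Hypothetical fully connected 3-agent network with doubly stochastic weights.
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

# Toy per-agent parameter vectors (one row per agent).
theta = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [2.0, 2.0]])

# Repeated consensus steps drive all agents toward the network-wide average.
for _ in range(50):
    theta = consensus_step(theta, W)
```

With a doubly stochastic `W`, the average of the agents' parameters is preserved at every step, so in this toy setup the rows of `theta` approach the initial mean. In a full actor-critic method a consensus step of this kind would be interleaved with local gradient updates.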


Pages: 13
