Distributed entropy-regularized multi-agent reinforcement learning with policy consensus

Cited by: 1

Authors:

Hu, Yifan [1]

Fu, Junjie [1]

Wen, Guanghui [1]

Lv, Yuezu [2]

Ren, Wei [3]

Affiliations:

[1] Southeast Univ, Sch Math, Nanjing 210096, Peoples R China

[2] Beijing Inst Technol, Adv Res Inst Multidisciplinary Sci, Beijing 100081, Peoples R China

[3] Univ Calif Riverside, Dept Elect & Comp Engn, Riverside, CA 92521 USA

Source:

AUTOMATICA | 2024 / Vol. 164

Keywords:

Distributed actor-critic algorithm; Networked multi-agent system; Entropy regularization; Deep reinforcement learning; ALGORITHM; NETWORKS;

DOI:

10.1016/j.automatica.2024.111652

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Sample efficiency is a limiting factor for existing distributed multi-agent reinforcement learning (MARL) algorithms over networked multi-agent systems. In this paper, the sample efficiency problem is tackled by formally incorporating the entropy regularization into the distributed MARL algorithm design. Firstly, a new entropy-regularized MARL problem is formulated under the model of networked multi-agent Markov decision processes with observation-based policies and homogeneous agents, where the policy parameter sharing among the agents provably preserves the optimality. Secondly, an on-policy distributed actor-critic algorithm is proposed, where each agent shares its parameters of both the critic and actor for consensus update. Then, the convergence analysis of the proposed algorithm is provided based on the stochastic approximation theory under the assumption of linear function approximation of the critic. Furthermore, a practical off-policy version of the proposed algorithm is developed which possesses scalability, data efficiency and learning stability. Finally, the proposed distributed algorithm is compared against the solid baselines including two classic centralized training algorithms in the multi-agent particle environment, whose learning performance is empirically demonstrated through extensive simulation experiments. (c) 2024 Elsevier Ltd. All rights reserved.
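
The abstract names two ingredients: an entropy-regularized objective and a consensus update in which agents share actor and critic parameters with their neighbors. The block below is a minimal illustrative sketch of those two generic ideas only; it is not the paper's algorithm. All names (`ring_weights`, `consensus_step`, `softmax_entropy`, `regularized_reward`), the ring communication graph, the mixing weights, and the temperature `alpha` are assumptions introduced here, and the reward form r + alpha * H(pi) is the common entropy-regularization convention rather than something stated in this record.

```python
import numpy as np


def ring_weights(n: int) -> np.ndarray:
    """Doubly stochastic mixing matrix for n agents on a ring (hypothetical graph)."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 0.5                 # weight an agent keeps on its own parameters
        W[i, (i - 1) % n] += 0.25      # weight on the left neighbor
        W[i, (i + 1) % n] += 0.25      # weight on the right neighbor
    return W


def consensus_step(params: np.ndarray, W: np.ndarray) -> np.ndarray:
    """One consensus update: row i becomes a weighted average of neighbor rows.

    params has shape (n_agents, n_params); each row is one agent's parameter vector.
    """
    return W @ params


def softmax_entropy(logits: np.ndarray) -> float:
    """Shannon entropy of the softmax policy defined by the given action logits."""
    z = logits - logits.max()                  # shift for numerical stability
    log_p = z - np.log(np.exp(z).sum())        # log-softmax
    return float(-(np.exp(log_p) * log_p).sum())


def regularized_reward(reward: float, logits: np.ndarray, alpha: float) -> float:
    """Reward plus an entropy bonus, r + alpha * H(pi) (assumed convention)."""
    return reward + alpha * softmax_entropy(logits)
```

Because the assumed mixing matrix is doubly stochastic, repeated calls to `consensus_step` preserve the average of the agents' parameter vectors and drive every row toward it, which is the sense in which parameter sharing leads the agents to a common policy in this sketch.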

Pages: 13

50 items in total

[31] Towards a Distributed Framework for Multi-Agent Reinforcement Learning Research
Zhou, Yutai
Manuel, Shawn
Morales, Peter
Li, Sheng
Pena, Jaime
Allen, Ross
2020 IEEE HIGH PERFORMANCE EXTREME COMPUTING CONFERENCE (HPEC), 2020,
[32] Multi-Agent Deep Reinforcement Learning for Distributed Satellite Routing
Lozano-Cuadra, Federico
Soret, Beatriz
2024 IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING FOR COMMUNICATION AND NETWORKING, ICMLCN 2024, 2024, : 554 - 555
[33] A Multi-agent Reinforcement Learning Perspective on Distributed Traffic Engineering
Geng, Nan
Lan, Tian
Aggarwal, Vaneet
Yang, Yuan
Xu, Mingwei
2020 IEEE 28TH INTERNATIONAL CONFERENCE ON NETWORK PROTOCOLS (IEEE ICNP 2020), 2020,
[34] Distributed hierarchical reinforcement learning in multi-agent adversarial environments
Naderializadeh, Navid
Soleyman, Sean
Hung, Fan
Khosla, Deepak
Chen, Yang
Fadaie, Joshua G.
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS IV, 2022, 12113
[35] Distributed consensus for nonlinear multi-agent systems with two-time-scales: A hybrid reinforcement learning consensus algorithm
Peng, Chuanjun
Xia, Jianwei
Wang, Jing
Shen, Hao
INFORMATION SCIENCES, 2023, 641
[36] Off-policy Reinforcement Learning for Distributed Output Synchronization of Linear Multi-agent Systems
Kiumarsi, Bahare
Lewis, Frank L.
2017 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2017, : 1877 - 1884
[37] A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
Suttle, Wesley
Yang, Zhuoran
Zhang, Kaiqing
Wang, Zhaoran
Basar, Tamer
Liu, Ji
IFAC PAPERSONLINE, 2020, 53 (02): : 1549 - 1554
[38] Multi-Agent Reinforcement Learning
Stankovic, Milos
2016 13TH SYMPOSIUM ON NEURAL NETWORKS AND APPLICATIONS (NEUREL), 2016, : 43 - 43
[39] Approximately Solving Mean Field Games via Entropy-Regularized Deep Reinforcement Learning
Cui, Kai
Koeppl, Heinz
24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
[40] Certified Policy Smoothing for Cooperative Multi-Agent Reinforcement Learning
Mu, Ronghui
Ruan, Wenjie
Marcolino, Leandro Soriano
Jin, Gaojie
Ni, Qiang
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 12, 2023, : 15046 - 15054
