Distributed entropy-regularized multi-agent reinforcement learning with policy consensus

Cited by: 1

Authors:

Hu, Yifan [1]

Fu, Junjie [1]

Wen, Guanghui [1]

Lv, Yuezu [2]

Ren, Wei [3]

Affiliations:

[1] Southeast Univ, Sch Math, Nanjing 210096, Peoples R China

[2] Beijing Inst Technol, Adv Res Inst Multidisciplinary Sci, Beijing 100081, Peoples R China

[3] Univ Calif Riverside, Dept Elect & Comp Engn, Riverside, CA 92521 USA

Source:

AUTOMATICA | 2024 / Vol. 164

Keywords:

Distributed actor-critic algorithm; Networked multi-agent system; Entropy regularization; Deep reinforcement learning; ALGORITHM; NETWORKS;

DOI:

10.1016/j.automatica.2024.111652

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Sample efficiency is a limiting factor for existing distributed multi-agent reinforcement learning (MARL) algorithms over networked multi-agent systems. In this paper, the sample efficiency problem is tackled by formally incorporating the entropy regularization into the distributed MARL algorithm design. Firstly, a new entropy-regularized MARL problem is formulated under the model of networked multi-agent Markov decision processes with observation-based policies and homogeneous agents, where the policy parameter sharing among the agents provably preserves the optimality. Secondly, an on-policy distributed actor-critic algorithm is proposed, where each agent shares its parameters of both the critic and actor for consensus update. Then, the convergence analysis of the proposed algorithm is provided based on the stochastic approximation theory under the assumption of linear function approximation of the critic. Furthermore, a practical off-policy version of the proposed algorithm is developed which possesses scalability, data efficiency and learning stability. Finally, the proposed distributed algorithm is compared against the solid baselines including two classic centralized training algorithms in the multi-agent particle environment, whose learning performance is empirically demonstrated through extensive simulation experiments. © 2024 Elsevier Ltd. All rights reserved.
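
Illustrative note: the two ingredients named in the abstract, entropy regularization and a consensus update over shared parameters, can be sketched generically as follows. This is a minimal sketch under stated assumptions, not the paper's algorithm: the temperature `alpha`, the ring communication graph, the 1/2 and 1/4 mixing weights, and all function names are hypothetical choices made only for illustration.

```python
import math


def softmax(logits):
    """Turn action preferences into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy of a discrete policy, in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def regularized_reward(reward, probs, alpha):
    """Reward plus an entropy bonus; alpha is an assumed temperature."""
    return reward + alpha * entropy(probs)


def ring_weights(n):
    """Assumed doubly stochastic mixing matrix on a ring graph."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][i] += 0.5
        w[i][(i - 1) % n] += 0.25
        w[i][(i + 1) % n] += 0.25
    return w


def consensus_step(params, weights):
    """Each agent replaces its parameter vector by a weighted average
    of its own vector and those of its neighbors."""
    n, dim = len(params), len(params[0])
    return [
        [sum(weights[i][j] * params[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]


# Four agents start from different parameters and mix repeatedly.
params = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
weights = ring_weights(4)
for _ in range(60):
    params = consensus_step(params, weights)
```

Because the assumed mixing matrix is doubly stochastic, repeated mixing preserves the network-wide average while driving every agent's parameters toward it. In a full actor-critic method each consensus step would be interleaved with local gradient updates, which this sketch omits.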


Pages: 13

50 items in total

[41] MABQN: Multi-agent reinforcement learning algorithm with discrete policy
Xie, Qing
Wang, Zicheng
Fang, Yuyuan
Li, Yukai
NEUROCOMPUTING, 2025, 626
[42] Adversarial attacks in consensus-based multi-agent reinforcement learning
Figura, Martin
Kosaraju, Krishna Chaitanya
Gupta, Vijay
2021 AMERICAN CONTROL CONFERENCE (ACC), 2021, : 3050 - 3055
[43] Distributed Consensus-Based Multi-Agent Off-Policy Temporal-Difference Learning
Stankovic, Milos S.
Beko, Marko
Stankovic, Srdjan S.
2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2021, : 5976 - 5981
[44] Hierarchical Reinforcement Learning with Opponent Modeling for Distributed Multi-agent Cooperation
Liang, Zhixuan
Cao, Jiannong
Jiang, Shan
Saxena, Divya
Xu, Huafeng
2022 IEEE 42ND INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2022), 2022, : 884 - 894
[45] Multi-Agent Distributed Reinforcement Learning for Making Decentralized Offloading Decisions
Tan, Jing
Khalili, Ramin
Karl, Holger
Hecker, Artur
IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2022), 2022, : 2098 - 2107
[46] Distributed, Heterogeneous, Multi-Agent Social Coordination via Reinforcement Learning
Shi, Dongqing
Sauter, Michael Z.
Kralik, Jerald D.
2009 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (ROBIO 2009), VOLS 1-4, 2009, : 653 - 658
[47] Dynamic distributed constraint optimization using multi-agent reinforcement learning
Shokoohi, Maryam
Afsharchi, Mohsen
Shah-Hoseini, Hamed
SOFT COMPUTING, 2022, 26 (08) : 3601 - 3629
[48] Dynamic distributed constraint optimization using multi-agent reinforcement learning
Maryam Shokoohi
Mohsen Afsharchi
Hamed Shah-Hoseini
Soft Computing, 2022, 26 : 3601 - 3629
[49] Distributed Signal Control of Multi-agent Reinforcement Learning Based on Game
Qu Z.-W.
Pan Z.-T.
Chen Y.-H.
Li H.-T.
Wang X.
Chen, Yong-Heng (cyh@jlu.edu.cn), 1600, Science Press (20): 76 - 82 and 100
[50] Distributed Multi-agent Reinforcement Learning for Directional UAV Network Control
He, Linsheng
Zhao, Jiamiao
Hu, Fei
PROCEEDINGS OF THE 32ND INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE PARALLEL AND DISTRIBUTED COMPUTING, HPDC 2023, 2023, : 317 - 318
