Distributed entropy-regularized multi-agent reinforcement learning with policy consensus

Times cited: 1

Authors:

Hu, Yifan [1]

Fu, Junjie [1]

Wen, Guanghui [1]

Lv, Yuezu [2]

Ren, Wei [3]

Affiliations:

[1] Southeast Univ, Sch Math, Nanjing 210096, Peoples R China

[2] Beijing Inst Technol, Adv Res Inst Multidisciplinary Sci, Beijing 100081, Peoples R China

[3] Univ Calif Riverside, Dept Elect & Comp Engn, Riverside, CA 92521 USA

Source:

AUTOMATICA | 2024 / Vol. 164

Keywords:

Distributed actor-critic algorithm; Networked multi-agent system; Entropy regularization; Deep reinforcement learning; ALGORITHM; NETWORKS

DOI:

10.1016/j.automatica.2024.111652

CLC classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Sample efficiency is a limiting factor for existing distributed multi-agent reinforcement learning (MARL) algorithms over networked multi-agent systems. In this paper, the sample efficiency problem is tackled by formally incorporating entropy regularization into the design of distributed MARL algorithms. First, a new entropy-regularized MARL problem is formulated under the model of networked multi-agent Markov decision processes with observation-based policies and homogeneous agents, where sharing policy parameters among the agents provably preserves optimality. Second, an on-policy distributed actor-critic algorithm is proposed, in which each agent shares the parameters of both its critic and its actor for a consensus update. Then, a convergence analysis of the proposed algorithm is provided based on stochastic approximation theory, under the assumption that the critic uses linear function approximation. Furthermore, a practical off-policy version of the proposed algorithm is developed that offers scalability, data efficiency, and learning stability. Finally, the proposed distributed algorithm is compared against strong baselines, including two classic centralized training algorithms, in the multi-agent particle environment, and its learning performance is demonstrated empirically through extensive simulation experiments. (c) 2024 Elsevier Ltd. All rights reserved.
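The abstract describes agents that update a critic locally and then share critic and actor parameters with their neighbours for a consensus update. The Python sketch below illustrates that pattern under stated assumptions; it is not the authors' algorithm. It assumes a linear critic V(o) = w . phi(o) trained by TD(0), an entropy bonus alpha * H(pi(.|o)) added to the reward (one common convention for entropy regularization), and a doubly stochastic mixing matrix W built from Metropolis weights on a hypothetical 3-agent path graph. All function names (softmax_policy, entropy, linear_td_step, consensus_step), features, rewards and step sizes are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax_policy(logits: Vector) -> Vector:
    """Softmax policy pi(a | o) over discrete actions, from per-action logits."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Vector) -> float:
    """Shannon entropy H(pi) = -sum_a pi(a) log pi(a)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def linear_td_step(w: Vector, phi: Vector, phi_next: Vector, reward: float,
                   probs: Vector, alpha: float, gamma: float, beta: float) -> Vector:
    """Local TD(0) update of a linear critic V(o) = w . phi(o).

    The reward is augmented with the entropy bonus alpha * H(pi(.|o)),
    an assumed convention for the entropy-regularized value target.
    """
    v = sum(wi * fi for wi, fi in zip(w, phi))
    v_next = sum(wi * fi for wi, fi in zip(w, phi_next))
    delta = reward + alpha * entropy(probs) + gamma * v_next - v
    return [wi + beta * delta * fi for wi, fi in zip(w, phi)]


def consensus_step(params: List[Vector], weights: Matrix) -> List[Vector]:
    """One consensus round: agent i sets theta_i <- sum_j W[i][j] * theta_j.

    W[i][j] > 0 only when j is i itself or a neighbour of i; W is assumed
    doubly stochastic, so the network-wide average is preserved.
    """
    n, dim = len(params), len(params[0])
    return [[sum(weights[i][j] * params[j][k] for j in range(n))
             for k in range(dim)] for i in range(n)]


# Hypothetical 3-agent path graph 0 - 1 - 2 with Metropolis weights.
W = [[2 / 3, 1 / 3, 0.0],
     [1 / 3, 1 / 3, 1 / 3],
     [0.0, 1 / 3, 2 / 3]]

# Each agent takes a local critic step on its own (made-up) transition ...
phi, phi_next = [1.0, 0.0], [0.0, 1.0]
probs = softmax_policy([0.2, -0.1, 0.4])
local = [linear_td_step([0.0, 0.0], phi, phi_next, r, probs,
                        alpha=0.1, gamma=0.95, beta=0.5)
         for r in (1.0, 0.0, -0.5)]

# ... then exchanges parameters with its neighbours; repeated rounds drive
# every agent's critic toward the network average of the local updates.
critics = local
for _ in range(50):
    critics = consensus_step(critics, W)
```

In this sketch only the critic is mixed; actor parameters could be passed through the same consensus_step after each local policy update. Metropolis weights are one standard way to obtain a doubly stochastic W from an undirected graph, not necessarily the choice made in the paper.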


Pages: 13

50 records in total

[21] Toward Policy Explanations for Multi-Agent Reinforcement Learning
Boggess, Kayla
Kraus, Sarit
Feng, Lu
PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022, 2022, : 109 - 115
[22] Uncertainty modified policy for multi-agent reinforcement learning
Zhao, Xinyu
Liu, Jianxiang
Wu, Faguo
Zhang, Xiao
Wang, Guojian
APPLIED INTELLIGENCE, 2024, 54 (22) : 12020 - 12034
[23] Noise-Regularized Advantage Value for Multi-Agent Reinforcement Learning
Wang, Siying
Chen, Wenyu
Hu, Jian
Hu, Siyue
Huang, Liwei
MATHEMATICS, 2022, 10 (15)
[24] Utilizing Prior Solutions for Reward Shaping and Composition in Entropy-Regularized Reinforcement Learning
Adamczyk, Jacob
Arriojas, Argenis
Tiomkin, Stas
Kulkarni, Rahul V.
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 6658 - 6665
[25] Graphical Minimax Game and On-Policy Reinforcement Learning for Consensus of Leaderless Multi-Agent Systems
Dong, Wei
Wang, Chunyan
Li, Jinna
Wang, Jianan
2020 IEEE 16TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA), 2020, : 606 - 611
[26] FOP: Factorizing Optimal Joint Policy of Maximum-Entropy Multi-Agent Reinforcement Learning
Zhang, Tianhao
Li, Yueheng
Wang, Chen
Xie, Guangming
Lu, Zongqing
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[27] QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus plus Innovations
Kar, Soummya
Moura, Jose M. F.
Poor, H. Vincent
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2013, 61 (07) : 1848 - 1862
[28] Multi-Agent Deep Reinforcement Learning for Distributed Load Restoration
Linh Vu
Tuyen Vu
Thanh Long Vu
Srivastava, Anurag
IEEE TRANSACTIONS ON SMART GRID, 2024, 15 (02) : 1749 - 1760
[29] Distributed Inverse Constrained Reinforcement Learning for Multi-agent Systems
Liu, Shicheng
Zhu, Minghui
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
[30] Multi-agent deep reinforcement learning strategy for distributed energy
Xi, Lei
Sun, Mengmeng
Zhou, Huan
Xu, Yanchun
Wu, Junnan
Li, Yanying
MEASUREMENT, 2021, 185
