Distributed entropy-regularized multi-agent reinforcement learning with policy consensus

Cited by: 1
Authors
Hu, Yifan [1]
Fu, Junjie [1]
Wen, Guanghui [1]
Lv, Yuezu [2]
Ren, Wei [3]
Affiliations
[1] Southeast Univ, Sch Math, Nanjing 210096, Peoples R China
[2] Beijing Inst Technol, Adv Res Inst Multidisciplinary Sci, Beijing 100081, Peoples R China
[3] Univ Calif Riverside, Dept Elect & Comp Engn, Riverside, CA 92521 USA
Keywords
Distributed actor-critic algorithm; Networked multi-agent system; Entropy regularization; Deep reinforcement learning; ALGORITHM; NETWORKS
DOI
10.1016/j.automatica.2024.111652
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sample efficiency is a limiting factor for existing distributed multi-agent reinforcement learning (MARL) algorithms over networked multi-agent systems. In this paper, the sample efficiency problem is tackled by formally incorporating entropy regularization into the distributed MARL algorithm design. First, a new entropy-regularized MARL problem is formulated under the model of networked multi-agent Markov decision processes with observation-based policies and homogeneous agents, where sharing policy parameters among the agents provably preserves optimality. Second, an on-policy distributed actor-critic algorithm is proposed, in which each agent shares both its critic and actor parameters with its neighbors for a consensus update. Then, a convergence analysis of the proposed algorithm is provided based on stochastic approximation theory, under the assumption of linear function approximation for the critic. Furthermore, a practical off-policy version of the proposed algorithm is developed, which offers scalability, data efficiency, and learning stability. Finally, the proposed distributed algorithm is compared against solid baselines, including two classic centralized training algorithms, in the multi-agent particle environment, and its learning performance is demonstrated empirically through extensive simulation experiments. (c) 2024 Elsevier Ltd. All rights reserved.
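The two ingredients named in the abstract, an entropy bonus and a consensus update on shared parameters, can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits the actor-critic gradient steps entirely, and the temperature `alpha`, the ring topology, the weight matrix `W`, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def policy_entropy(probs):
    """Shannon entropy of a discrete action distribution."""
    probs = np.asarray(probs, dtype=float)
    nonzero = probs[probs > 0]  # skip zero entries to avoid log(0)
    return float(-np.sum(nonzero * np.log(nonzero)))


def entropy_regularized_reward(reward, probs, alpha):
    """Reward plus an entropy bonus; alpha is a hypothetical temperature."""
    return reward + alpha * policy_entropy(probs)


def consensus_step(params, weights):
    """One consensus update over the communication network.

    params:  array of shape (n_agents, dim), one parameter vector per agent.
    weights: array of shape (n_agents, n_agents), row-stochastic, with
             weights[i, j] > 0 only if agent j is agent i or one of its
             neighbors (an assumption of this sketch).
    Each agent replaces its parameters with a weighted average of the
    parameters it receives.
    """
    params = np.asarray(params, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if not np.allclose(weights.sum(axis=1), 1.0):
        raise ValueError("each row of weights must sum to 1")
    return weights @ params


# Hypothetical ring of four agents; W is symmetric and doubly stochastic,
# so repeated averaging drives every agent to the mean of the initial values.
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])
theta = np.array([[1.0], [2.0], [3.0], [4.0]])
for _ in range(50):
    theta = consensus_step(theta, W)
```

In a distributed actor-critic method, an averaging step of this kind would typically be interleaved with each agent's local gradient update, so that the agents' critic and actor parameters are pulled toward agreement while learning proceeds.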
Pages: 13