Distributed entropy-regularized multi-agent reinforcement learning with policy consensus

Cited by: 1
Authors
Hu, Yifan [1]
Fu, Junjie [1]
Wen, Guanghui [1]
Lv, Yuezu [2]
Ren, Wei [3]
Affiliations
[1] Southeast Univ, Sch Math, Nanjing 210096, Peoples R China
[2] Beijing Inst Technol, Adv Res Inst Multidisciplinary Sci, Beijing 100081, Peoples R China
[3] Univ Calif Riverside, Dept Elect & Comp Engn, Riverside, CA 92521 USA
Keywords
Distributed actor-critic algorithm; Networked multi-agent system; Entropy regularization; Deep reinforcement learning; ALGORITHM; NETWORKS
DOI
10.1016/j.automatica.2024.111652
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Sample efficiency is a limiting factor for existing distributed multi-agent reinforcement learning (MARL) algorithms over networked multi-agent systems. This paper tackles the sample efficiency problem by formally incorporating entropy regularization into the distributed MARL algorithm design. First, a new entropy-regularized MARL problem is formulated under the model of networked multi-agent Markov decision processes with observation-based policies and homogeneous agents, where sharing policy parameters among the agents provably preserves optimality. Second, an on-policy distributed actor-critic algorithm is proposed, in which each agent shares both its critic and actor parameters for consensus updates. Third, a convergence analysis of the proposed algorithm is provided, based on stochastic approximation theory and under the assumption of linear function approximation of the critic. Furthermore, a practical off-policy version of the proposed algorithm is developed, which offers scalability, data efficiency, and learning stability. Finally, the proposed distributed algorithm is compared against solid baselines, including two classic centralized training algorithms, in the multi-agent particle environment, and its learning performance is empirically demonstrated through extensive simulation experiments. (c) 2024 Elsevier Ltd. All rights reserved.
Pages: 13
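The abstract names two ingredients, entropy regularization and a consensus update over shared parameters, without giving their exact form. The sketch below illustrates both in a generic way and should not be read as the authors' algorithm. The function names, the ring topology, the 0.5/0.25 mixing weights, and the temperature `alpha` are all assumptions made for illustration.

```python
import numpy as np

def softmax_entropy(logits):
    """Shannon entropy of a softmax policy over discrete actions."""
    z = logits - logits.max()                # shift for numerical stability
    log_p = z - np.log(np.exp(z).sum())      # log-softmax
    return float(-(np.exp(log_p) * log_p).sum())

def regularized_reward(reward, logits, alpha):
    """Reward plus an entropy bonus (alpha is a hypothetical temperature)."""
    return reward + alpha * softmax_entropy(logits)

def ring_weights(n):
    """Doubly stochastic mixing matrix for an assumed ring of n agents."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W

def consensus_step(params, W):
    """Each agent replaces its parameters with a weighted neighbor average.

    params has shape (n_agents, n_params); row i belongs to agent i.
    """
    return W @ params

# Usage: four agents with different parameter vectors drift toward agreement.
rng = np.random.default_rng(0)
params = rng.normal(size=(4, 3))
W = ring_weights(4)
mixed = params.copy()
for _ in range(50):
    mixed = consensus_step(mixed, W)
```

Because the assumed matrix is doubly stochastic, every consensus step preserves the network-wide parameter average, and repeated mixing pulls each agent's row toward that average. In an actual actor-critic method, each mixing step would be interleaved with local gradient updates.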