Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms

Cited by: 2
Authors
Trivedi, Prashant [1 ]
Hemachandra, Nandyala [1 ]
Institutions
[1] Indian Inst Technol, Ind Engn & Operat Res, Mumbai, Maharashtra, India
Keywords
Natural Gradients; Actor-Critic Methods; Networked Agents; Traffic Network Control; Stochastic Approximations; Function Approximations; Fisher Information Matrix; Non-Convex Optimization; Quasi second-order methods; Local optima value comparison; Algorithms for better local minima; OPTIMIZATION; CONVERGENCE
DOI
10.1007/s13235-022-00449-9
Chinese Library Classification (CLC)
O1 [Mathematics]
Discipline codes
0701; 070101
Abstract
Multi-agent actor-critic algorithms are an important part of the Reinforcement Learning (RL) paradigm. In this work, we propose three fully decentralized multi-agent natural actor-critic (MAN) algorithms. The objective is to collectively find a joint policy that maximizes the average long-term return of the agents. In the absence of a central controller, and to preserve privacy, agents communicate some information to their neighbors via a time-varying communication network. Using linear function approximations, we prove that all three MAN algorithms converge to a globally asymptotically stable set of the ODE corresponding to the actor update. We show that the Kullback-Leibler divergence between the policies of successive iterates is proportional to the objective function's gradient. We observe that the minimum singular value of the Fisher information matrix is well within the reciprocal of the policy parameter dimension. Using this, we theoretically show that the optimal value of the deterministic variant of the MAN algorithm at each iterate dominates that of the standard gradient-based multi-agent actor-critic (MAAC) algorithm. To our knowledge, this is the first such result in multi-agent reinforcement learning (MARL). To illustrate the usefulness of the proposed algorithms, we implement them on a bi-lane traffic network to reduce average network congestion. Two of the MAN algorithms reduce the average congestion by almost 25%, while the third is on par with the MAAC algorithm. We also consider a generic 15-agent MARL setting, where the performance of the MAN algorithms is again as good as that of the MAAC algorithm.
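The core idea behind natural actor-critic methods, as the abstract describes, is to precondition the policy gradient by the inverse Fisher information matrix of the policy, so that updates correspond to steps of roughly constant size in KL divergence rather than in parameter space. A minimal single-state, single-agent sketch of a damped natural-gradient step (illustrative only; the softmax policy, per-action returns, step size, and damping constant are assumptions, not taken from the paper):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def grad_log_pi(theta, a):
    """Gradient of log softmax(theta)[a] w.r.t. the logits theta."""
    g = -softmax(theta)
    g[a] += 1.0
    return g

def natural_gradient_step(theta, returns, alpha=0.5, eps=1e-6):
    """One damped natural policy-gradient step (illustrative)."""
    n = len(theta)
    pi = softmax(theta)
    # Policy gradient: E_a[ grad log pi(a) * return(a) ]
    grad_J = sum(pi[a] * grad_log_pi(theta, a) * returns[a] for a in range(n))
    # Fisher information matrix: E_a[ grad log pi(a) grad log pi(a)^T ]
    F = sum(pi[a] * np.outer(grad_log_pi(theta, a), grad_log_pi(theta, a))
            for a in range(n))
    # Natural gradient: solve (F + eps*I) x = grad_J instead of inverting F,
    # since F is singular for a softmax policy.
    nat_grad = np.linalg.solve(F + eps * np.eye(n), grad_J)
    return theta + alpha * nat_grad

theta = np.zeros(3)                     # logits of a 3-action policy
returns = np.array([1.0, 0.0, -1.0])    # hypothetical per-action returns
for _ in range(100):
    theta = natural_gradient_step(theta, returns)
print(softmax(theta))                   # mass concentrates on action 0
```

The paper's MAN algorithms operate in a far richer setting (networked agents, linear function approximation for the critic, time-varying consensus), but the Fisher preconditioning step above is the distinguishing ingredient relative to the plain-gradient MAAC baseline.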
Pages: 25-55
Number of pages: 31
Related Papers
50 items in total
  • [1] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Prashant Trivedi
    Nandyala Hemachandra
    [J]. Dynamic Games and Applications, 2023, 13 : 25 - 55
  • [2] Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning
    Diddigi, Raghuram Bharadwaj
    Reddy, D. Sai Koti
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
    [J]. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1931 - 1933
  • [3] Toward Resilient Multi-Agent Actor-Critic Algorithms for Distributed Reinforcement Learning
    Lin, Yixuan
    Gade, Shripad
    Sandhu, Romeil
    Liu, Ji
    [J]. 2020 AMERICAN CONTROL CONFERENCE (ACC), 2020, : 3953 - 3958
  • [4] Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
    Christianos, Filippos
    Schafer, Lukas
    Albrecht, Stefano V.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [5] A multi-agent reinforcement learning using Actor-Critic methods
    Li, Chun-Gui
    Wang, Meng
    Yuan, Qing-Neng
    [J]. PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 878 - 882
  • [6] Distributed Multi-Agent Reinforcement Learning by Actor-Critic Method
    Heredia, Paulo C.
    Mou, Shaoshuai
    [J]. IFAC PAPERSONLINE, 2019, 52 (20): : 363 - 368
  • [7] Actor-Critic for Multi-Agent Reinforcement Learning with Self-Attention
    Zhao, Juan
    Zhu, Tong
    Xiao, Shuo
    Gao, Zongqian
    Sun, Hao
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (09)
  • [8] Multi-agent reinforcement learning by the actor-critic model with an attention interface
    Zhang, Lixiang
    Li, Jingchen
    Zhu, Yi'an
    Shi, Haobin
    Hwang, Kao-Shing
    [J]. NEUROCOMPUTING, 2022, 471 : 275 - 284
  • [9] Structural relational inference actor-critic for multi-agent reinforcement learning
    Zhang, Xianjie
    Liu, Yu
    Xu, Xiujuan
    Huang, Qiong
    Mao, Hangyu
    Carie, Anil
    [J]. NEUROCOMPUTING, 2021, 459 : 383 - 394
  • [10] Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning
    Xiao, Yuchen
    Lyu, Xueguang
    Amato, Christopher
    [J]. 2021 INTERNATIONAL SYMPOSIUM ON MULTI-ROBOT AND MULTI-AGENT SYSTEMS (MRS), 2021, : 155 - 163