Multi-Agent Actor-Critic Multitask Reinforcement Learning based on GTD(1) with Consensus

Citations: 2
Authors
Stankovic, Milos S. [1, 2]
Beko, Marko [3, 4]
Ilic, Nemanja [2, 5]
Stankovic, Srdjan S. [6]
Affiliations
[1] Univ Singidunum, Belgrade, Serbia
[2] Vlatacom Inst, Belgrade, Serbia
[3] Univ Lisbon, Inst Telecomunicacoes, Inst Super Tecn, Lisbon, Portugal
[4] Univ Lusofona Humanidades Tecnolo, COPELABS, Lisbon, Portugal
[5] Coll Appl Tech Sci, Krusevac, Serbia
[6] Univ Belgrade, Sch Elect Engn, Belgrade, Serbia
Keywords
STOCHASTIC-APPROXIMATION
DOI
10.1109/CDC51059.2022.9992951
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
In this paper, a new distributed multi-agent off-policy Actor-Critic algorithm for collaborative multitask reinforcement learning is proposed. The Critic stage is based on the distributed gradient temporal difference algorithm GTD(1), while the Actor stage is derived from a predefined global criterion function and consists of a complementary consensus-based exact policy gradient algorithm. It is proved that the Feller-Markov properties hold for the derived algorithm at the Actor stage. Weak convergence of the algorithm to the set of stationary points of an attached ODE is proved under mild conditions using two-time-scale stochastic approximation arguments. An experimental verification of the algorithm's properties is given, demonstrating its high efficiency and practical applicability.
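The abstract does not give the update equations, so the following is only a generic sketch of the consensus-plus-local-update pattern that consensus-based distributed algorithms commonly use; it is not the paper's algorithm. The function name `consensus_step`, the row-stochastic mixing matrix `weights`, and the placeholder `local_updates` are all assumptions introduced for illustration.

```python
def consensus_step(theta, weights, local_updates, step_size):
    """One round of a generic consensus scheme (illustrative assumption only).

    theta:         list of per-agent parameter vectors (lists of floats)
    weights:       row-stochastic mixing matrix; weights[i][j] is the weight
                   agent i puts on agent j's estimate (hypothetical)
    local_updates: per-agent update directions, e.g. local gradient estimates
                   (placeholder; the paper's actual directions are not shown)
    step_size:     scalar step size for the local update
    """
    n = len(theta)
    dim = len(theta[0])
    for row in weights:
        if abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("each row of the mixing matrix must sum to 1")

    # Local step: each agent moves along its own update direction.
    updated = [
        [t + step_size * u for t, u in zip(theta[i], local_updates[i])]
        for i in range(n)
    ]

    # Consensus step: each agent takes a convex combination of the updated
    # estimates, weighted by its row of the mixing matrix.
    return [
        [sum(weights[i][j] * updated[j][k] for j in range(n)) for k in range(dim)]
        for i in range(n)
    ]
```

With zero local updates and a doubly stochastic `weights`, repeated calls drive all agents toward the average of their initial estimates, which is the basic agreement property that consensus schemes rely on.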
Pages: 4591 - 4596
Number of pages: 6
Related papers
50 in total
  • [1] Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning
    Diddigi, Raghuram Bharadwaj
    Reddy, D. Sai Koti
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
    [J]. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019, : 1931 - 1933
  • [2] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Prashant Trivedi
    Nandyala Hemachandra
    [J]. Dynamic Games and Applications, 2023, 13 : 25 - 55
  • [3] Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
    Christianos, Filippos
    Schafer, Lukas
    Albrecht, Stefano V.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Trivedi, Prashant
    Hemachandra, Nandyala
    [J]. DYNAMIC GAMES AND APPLICATIONS, 2023, 13 (01) : 25 - 55
  • [5] A multi-agent reinforcement learning using Actor-Critic methods
    Li, Chun-Gui
    Wang, Meng
    Yuan, Qing-Neng
    [J]. PROCEEDINGS OF 2008 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2008, : 878 - 882
  • [6] Distributed Multi-Agent Reinforcement Learning by Actor-Critic Method
    Heredia, Paulo C.
    Mou, Shaoshuai
    [J]. IFAC PAPERSONLINE, 2019, 52 (20) : 363 - 368
  • [7] Actor-Critic for Multi-Agent Reinforcement Learning with Self-Attention
    Zhao, Juan
    Zhu, Tong
    Xiao, Shuo
    Gao, Zongqian
    Sun, Hao
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (09)
  • [8] Structural relational inference actor-critic for multi-agent reinforcement learning
    Zhang, Xianjie
    Liu, Yu
    Xu, Xiujuan
    Huang, Qiong
    Mao, Hangyu
    Carie, Anil
    [J]. NEUROCOMPUTING, 2021, 459 : 383 - 394
  • [9] Multi-agent reinforcement learning by the actor-critic model with an attention interface
    Zhang, Lixiang
    Li, Jingchen
    Zhu, Yi'an
    Shi, Haobin
    Hwang, Kao-Shing
    [J]. NEUROCOMPUTING, 2022, 471 : 275 - 284
  • [10] Dynamic Spectrum Sharing Based on Federated Learning and Multi-Agent Actor-Critic Reinforcement Learning
    Yang, Tongtong
    Zhang, Wensheng
    Bo, Yulian
    Sun, Jian
    Wang, Cheng-Xiang
    [J]. 2023 INTERNATIONAL WIRELESS COMMUNICATIONS AND MOBILE COMPUTING, IWCMC, 2023, : 947 - 952