A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

Cited by: 17
Authors
Suttle, Wesley [1]
Yang, Zhuoran [2]
Zhang, Kaiqing [3]
Wang, Zhaoran [4]
Basar, Tamer [3]
Liu, Ji [5]
Affiliations
[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[3] Univ Illinois, Coordinated Sci Lab, Champaign, IL USA
[4] Northwestern Univ, Dept Ind Engn & Management Sci, Evanston, IL 60208 USA
[5] SUNY Stony Brook, Dept Elect & Comp Engn, Stony Brook, NY 11794 USA
Source
IFAC PAPERSONLINE | 2020 / Volume 53 / Issue 02
Funding
Australian Research Council;
Keywords
consensus and reinforcement learning control; adaptive control of multi-agent systems;
DOI
10.1016/j.ifacol.2020.12.2021
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper extends off-policy reinforcement learning to the multi-agent case in which a set of networked agents communicating with their neighbors according to a time-varying graph collaboratively evaluates and improves a target policy while following a distinct behavior policy. To this end, the paper develops a multi-agent version of emphatic temporal difference learning for off-policy policy evaluation, and proves convergence under linear function approximation. The paper then leverages this result, in conjunction with a novel multi-agent off-policy policy gradient theorem and recent work in both multi-agent on-policy and single-agent off-policy actor-critic methods, to develop and give convergence guarantees for a new multi-agent off-policy actor-critic algorithm. An empirical validation of these theoretical results is given. Copyright (C) 2020 The Authors.
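The abstract names two ingredients: emphatic temporal difference learning under linear function approximation, and agents that exchange information with their neighbors. The sketch below illustrates both in minimal form. It is not the paper's algorithm: `etd_step` is the standard single-agent ETD(λ) update, and `consensus_average` is a generic weighted-averaging step over a mixing matrix `W`. The function names, the default step sizes, and the choice to average parameter vectors directly are all assumptions made for illustration.

```python
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def etd_step(
    theta: Vector, e: Vector, F: float,
    rho_prev: float, rho: float,
    phi: Vector, phi_next: Vector, reward: float,
    gamma: float = 0.9, lam: float = 0.0,
    alpha: float = 0.1, interest: float = 1.0,
) -> Tuple[Vector, Vector, float]:
    """One single-agent ETD(lambda) update with linear features.

    rho and rho_prev are importance-sampling ratios (target policy
    probability divided by behavior policy probability) for the current
    and previous step. Start with F = 0 and e = zeros.
    """
    # Follow-on trace: discounted, importance-weighted accumulated interest.
    F = rho_prev * gamma * F + interest
    # Emphasis placed on the current state.
    M = lam * interest + (1.0 - lam) * F
    # Emphatic eligibility trace.
    e = [rho * (gamma * lam * e_k + M * phi_k) for e_k, phi_k in zip(e, phi)]
    # TD error under the linear value estimate theta^T phi.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    theta = [t + alpha * delta * e_k for t, e_k in zip(theta, e)]
    return theta, e, F


def consensus_average(thetas: List[Vector], W: List[List[float]]) -> List[Vector]:
    """Hypothetical consensus step: agent i replaces its parameters with a
    W-weighted average of all agents' parameters (W[i][j] = 0 for
    non-neighbors; each row of W is assumed to sum to 1)."""
    n, d = len(thetas), len(thetas[0])
    return [
        [sum(W[i][j] * thetas[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
```

In a networked setting, each agent would run `etd_step` on its own reward signal and then call `consensus_average` with the mixing matrix of the current communication graph; how the joint importance-sampling ratio is obtained from local information is left out of this sketch.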
Pages: 1549-1554
Page count: 6
Related papers
  • [1] Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
    Stankovic, Milos S.
    Beko, Marko
    Ilic, Nemanja
    Stankovic, Srdjan S.
    [J]. EUROPEAN JOURNAL OF CONTROL, 2023, 74
  • [2] Multi-agent Gradient-Based Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
    Ren, Jineng
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
  • [3] Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
    Zhang, Yan
    Zavlanos, Michael M.
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019: 4674-4679
  • [4] Distributed Multi-Agent Reinforcement Learning by Actor-Critic Method
    Heredia, Paulo C.
    Mou, Shaoshuai
    [J]. IFAC PAPERSONLINE, 2019, 52 (20): 363-368
  • [5] A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning
    Lin, Yixuan
    Zhang, Kaiqing
    Yang, Zhuoran
    Wang, Zhaoran
    Basar, Tamer
    Sandhu, Romeil
    Liu, Ji
    [J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019: 5562-5567
  • [6] Toward Resilient Multi-Agent Actor-Critic Algorithms for Distributed Reinforcement Learning
    Lin, Yixuan
    Gade, Shripad
    Sandhu, Romeil
    Liu, Ji
    [J]. 2020 AMERICAN CONTROL CONFERENCE (ACC), 2020: 3953-3958
  • [7] Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning
    Diddigi, Raghuram Bharadwaj
    Reddy, D. Sai Koti
    Prabuchandran, K. J.
    Bhatnagar, Shalabh
    [J]. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019: 1931-1933
  • [8] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
    Prashant Trivedi
    Nandyala Hemachandra
    [J]. Dynamic Games and Applications, 2023, 13: 25-55
  • [9] Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
    Christianos, Filippos
    Schafer, Lukas
    Albrecht, Stefano V.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [10] Generalized Off-Policy Actor-Critic
    Zhang, Shangtong
    Boehmer, Wendelin
    Whiteson, Shimon
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32