A Multi-Agent Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning

Cited by: 17

Authors:

Suttle, Wesley [1]

Yang, Zhuoran [2]

Zhang, Kaiqing [3]

Wang, Zhaoran [4]

Basar, Tamer [3]

Liu, Ji [5]

Affiliations:

[1] SUNY Stony Brook, Dept Appl Math & Stat, Stony Brook, NY 11794 USA

[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA

[3] Univ Illinois, Coordinated Sci Lab, Champaign, IL USA

[4] Northwestern Univ, Dept Ind Engn & Management Sci, Evanston, IL 60208 USA

[5] SUNY Stony Brook, Dept Elect & Comp Engn, Stony Brook, NY 11794 USA

Source:

IFAC PAPERSONLINE | 2020 / Vol. 53 / Issue 02

Funding:

Australian Research Council;

Keywords:

consensus and reinforcement learning control; adaptive control of multi-agent systems;

DOI:

10.1016/j.ifacol.2020.12.2021

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This paper extends off-policy reinforcement learning to the multi-agent case, in which a set of networked agents, communicating with their neighbors according to a time-varying graph, collaboratively evaluates and improves a target policy while following a distinct behavior policy. To this end, the paper develops a multi-agent version of emphatic temporal difference learning for off-policy policy evaluation and proves its convergence under linear function approximation. The paper then leverages this result, in conjunction with a novel multi-agent off-policy policy gradient theorem and recent work on both multi-agent on-policy and single-agent off-policy actor-critic methods, to develop a new multi-agent off-policy actor-critic algorithm and give convergence guarantees for it. An empirical validation of these theoretical results is given. Copyright (C) 2020 The Authors.

Pages: 1549-1554

Page count: 6

50 items in total

[1] Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning
Stankovic, Milos S.
Beko, Marko
Ilic, Nemanja
Stankovic, Srdjan S.
[J]. EUROPEAN JOURNAL OF CONTROL, 2023, 74
[2] Multi-agent Gradient-Based Off-Policy Actor-Critic Algorithm for Distributed Reinforcement Learning
Ren, Jineng
[J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
[3] Distributed off-Policy Actor-Critic Reinforcement Learning with Policy Consensus
Zhang, Yan
Zavlanos, Michael M.
[J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019: 4674-4679
[4] Distributed Multi-Agent Reinforcement Learning by Actor-Critic Method
Heredia, Paulo C.
Mou, Shaoshuai
[J]. IFAC PAPERSONLINE, 2019, 52 (20): 363-368
[5] A Communication-Efficient Multi-Agent Actor-Critic Algorithm for Distributed Reinforcement Learning
Lin, Yixuan
Zhang, Kaiqing
Yang, Zhuoran
Wang, Zhaoran
Basar, Tamer
Sandhu, Romeil
Liu, Ji
[J]. 2019 IEEE 58TH CONFERENCE ON DECISION AND CONTROL (CDC), 2019: 5562-5567
[6] Toward Resilient Multi-Agent Actor-Critic Algorithms for Distributed Reinforcement Learning
Lin, Yixuan
Gade, Shripad
Sandhu, Romeil
Liu, Ji
[J]. 2020 AMERICAN CONTROL CONFERENCE (ACC), 2020: 3953-3958
[7] Actor-Critic Algorithms for Constrained Multi-agent Reinforcement Learning
Diddigi, Raghuram Bharadwaj
Reddy, D. Sai Koti
Prabuchandran, K. J.
Bhatnagar, Shalabh
[J]. AAMAS '19: PROCEEDINGS OF THE 18TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2019: 1931-1933
[8] Multi-Agent Natural Actor-Critic Reinforcement Learning Algorithms
Trivedi, Prashant
Hemachandra, Nandyala
[J]. DYNAMIC GAMES AND APPLICATIONS, 2023, 13: 25-55
[9] Shared Experience Actor-Critic for Multi-Agent Reinforcement Learning
Christianos, Filippos
Schafer, Lukas
Albrecht, Stefano V.
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
[10] Generalized Off-Policy Actor-Critic
Zhang, Shangtong
Boehmer, Wendelin
Whiteson, Shimon
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
