Cooperative Multiagent Reinforcement Learning With Partial Observations

Times Cited: 2
Authors
Zhang, Yan [1 ]
Zavlanos, Michael M. [1 ]
Affiliations
[1] Duke Univ, Dept Mech Engn & Mat Sci, Durham, NC 27708 USA
Keywords
Optimization methods; Linear programming; Reinforcement learning; Task analysis; Convergence; Training; Stacking; Distributed zeroth-order optimization; multiagent reinforcement learning (MARL); partial observation; OPTIMIZATION; CONVEX;
DOI
10.1109/TAC.2023.3288025
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this article, we propose a distributed zeroth-order policy optimization method for multiagent reinforcement learning (MARL). Existing MARL algorithms often assume that every agent can observe the states and actions of all the other agents in the network. This can be impractical in large-scale problems, where sharing the state and action information with multihop neighbors may incur significant communication overhead. The advantage of the proposed zeroth-order policy optimization method is that it allows each agent to compute the local policy gradient needed to update its local policy function using only a local estimate of the global accumulated reward; this estimate depends on partial state and action information and can be obtained via consensus. Specifically, to calculate the local policy gradients, we develop a new distributed zeroth-order policy gradient estimator that relies on one-point residual feedback and, compared to existing zeroth-order estimators that also rely on one-point feedback, significantly reduces the variance of the policy gradient estimates, thereby improving learning performance. We show that the proposed distributed zeroth-order policy optimization method with constant stepsize converges to a neighborhood of a policy that is a stationary point of the global objective function. The size of this neighborhood depends on the agents' learning rates, the exploration parameters, and the number of consensus steps used to calculate the local estimates of the global accumulated rewards. Moreover, we provide numerical experiments demonstrating that our new zeroth-order policy gradient estimator is more sample-efficient than other existing one-point estimators.
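The abstract describes two building blocks: a consensus step through which each agent forms a local estimate of the global accumulated reward from partial information, and a one-point residual-feedback zeroth-order estimator that perturbs the local policy parameters and differences the current and previous perturbed returns. The following is a minimal, hypothetical sketch of how these two pieces could fit together, not the authors' implementation; the mixing matrix W, the functions consensus_average, residual_feedback_gradient, and local_return, and all constants are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def consensus_average(local_values, W, num_steps):
    """Average locally observed accumulated rewards by repeatedly applying a
    doubly stochastic mixing matrix W (one multiplication per consensus round)."""
    x = np.asarray(local_values, dtype=float)
    for _ in range(num_steps):
        x = W @ x
    return x  # each entry approaches the network-wide average reward

def residual_feedback_gradient(u, J_curr, J_prev, delta):
    """One-point residual-feedback estimate: only the current perturbed return
    J_curr is collected in this episode; J_prev is remembered from the previous
    episode, which is what reduces the variance of the estimate."""
    return u * (J_curr - J_prev) / delta

# Toy setup: 3 agents, one scalar policy parameter each (purely illustrative).
W = np.array([[0.6, 0.2, 0.2],
              [0.2, 0.6, 0.2],
              [0.2, 0.2, 0.6]])   # doubly stochastic mixing matrix
theta = np.zeros(3)               # local policy parameters
delta, stepsize = 0.1, 0.05       # exploration parameter and constant stepsize
J_prev = np.zeros(3)              # consensus estimates from the previous episode

def local_return(i, theta_i):
    """Stand-in for agent i's locally observed accumulated reward."""
    return -(theta_i - (i + 1)) ** 2

for episode in range(200):
    u = rng.standard_normal(3)                                    # perturbation directions
    J_local = np.array([local_return(i, theta[i] + delta * u[i])  # perturbed local returns
                        for i in range(3)])
    J_hat = consensus_average(J_local, W, num_steps=10)           # estimate global reward
    g = residual_feedback_gradient(u, J_hat, J_prev, delta)       # local policy gradients
    theta = theta + stepsize * g                                  # gradient ascent step
    J_prev = J_hat
```

Compared with a plain one-point estimate of the form u * J_curr / delta, differencing against the previous perturbed return removes a large common term from each estimate, which is the variance-reduction effect the abstract attributes to the residual-feedback estimator.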
Pages: 968 - 981
Page count: 14