Cooperative Multiagent Reinforcement Learning With Partial Observations

Times Cited: 2
Authors
Zhang, Yan [1 ]
Zavlanos, Michael M. [1 ]
Affiliations
[1] Duke Univ, Dept Mech Engn & Mat Sci, Durham, NC 27708 USA
Keywords
Optimization methods; Linear programming; Reinforcement learning; Task analysis; Convergence; Training; Stacking; Distributed zeroth-order optimization; multiagent reinforcement learning (MARL); partial observation; OPTIMIZATION; CONVEX;
DOI
10.1109/TAC.2023.3288025
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this article, we propose a distributed zeroth-order policy optimization method for multiagent reinforcement learning (MARL). Existing MARL algorithms often assume that every agent can observe the states and actions of all the other agents in the network. This can be impractical in large-scale problems, where sharing the state and action information with multihop neighbors may incur significant communication overhead. The advantage of the proposed zeroth-order policy optimization method is that it allows each agent to compute the local policy gradient needed to update its local policy function using only a local estimate of the global accumulated reward; this estimate depends on partial state and action information and can be obtained via consensus. Specifically, to calculate the local policy gradients, we develop a new distributed zeroth-order policy gradient estimator that relies on one-point residual feedback and, compared to existing zeroth-order estimators that also rely on one-point feedback, significantly reduces the variance of the policy gradient estimates, thereby improving learning performance. We show that the proposed distributed zeroth-order policy optimization method with constant stepsize converges to a neighborhood of a policy that is a stationary point of the global objective function. The size of this neighborhood depends on the agents' learning rates, the exploration parameters, and the number of consensus steps used to calculate the local estimates of the global accumulated rewards. Moreover, we provide numerical experiments demonstrating that our new zeroth-order policy gradient estimator is more sample-efficient than other existing one-point estimators.
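The abstract describes two building blocks: a consensus step through which each agent forms a local estimate of the global accumulated reward from partial information, and a one-point residual-feedback zeroth-order estimator that perturbs the local policy parameters and differences the current and previous perturbed returns. The following is a minimal, hypothetical sketch of how these two pieces could fit together, not the authors' implementation; the mixing matrix W, the functions consensus_average, residual_feedback_gradient, and local_return, and all constants are illustrative placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def consensus_average(local_values, W, num_steps):
    """Average locally observed accumulated rewards by repeatedly applying a
    doubly stochastic mixing matrix W (one multiplication per consensus round)."""
    x = np.asarray(local_values, dtype=float)
    for _ in range(num_steps):
        x = W @ x
    return x  # each entry approaches the network-wide average reward

def residual_feedback_gradient(u, J_curr, J_prev, delta):
    """One-point residual-feedback estimate: only the current perturbed return
    J_curr is collected in this episode; J_prev is remembered from the previous
    episode, which is what reduces the variance of the estimate."""
    return u * (J_curr - J_prev) / delta

# Toy setup: 3 agents, one scalar policy parameter each (purely illustrative).
W = np.array([[0.6, 0.2, 0.2],
              [0.2, 0.6, 0.2],
              [0.2, 0.2, 0.6]])   # doubly stochastic mixing matrix
theta = np.zeros(3)               # local policy parameters
delta, stepsize = 0.1, 0.05       # exploration parameter and constant stepsize
J_prev = np.zeros(3)              # consensus estimates from the previous episode

def local_return(i, theta_i):
    """Stand-in for agent i's locally observed accumulated reward."""
    return -(theta_i - (i + 1)) ** 2

for episode in range(200):
    u = rng.standard_normal(3)                                    # perturbation directions
    J_local = np.array([local_return(i, theta[i] + delta * u[i])  # perturbed local returns
                        for i in range(3)])
    J_hat = consensus_average(J_local, W, num_steps=10)           # estimate global reward
    g = residual_feedback_gradient(u, J_hat, J_prev, delta)       # local policy gradients
    theta = theta + stepsize * g                                  # gradient ascent step
    J_prev = J_hat
```

Compared with a plain one-point estimate of the form u * J_curr / delta, differencing against the previous perturbed return removes a large common term from each estimate, which is the variance-reduction effect the abstract attributes to the residual-feedback estimator.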
Pages: 968 - 981
Page count: 14