SMIX(λ): Enhancing Centralized Value Functions for Cooperative Multiagent Reinforcement Learning

Cited by: 13
Authors
Yao, Xinghu [1 ]
Wen, Chao [1 ]
Wang, Yuhui [1 ]
Tan, Xiaoyang [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, MIIT Key Lab Pattern Anal & Machine Intelligence, Nanjing 211106, Peoples R China
Funding
US National Science Foundation;
Keywords
Training; Optimization; Reinforcement learning; Nash equilibrium; Task analysis; History; Learning systems; Deep reinforcement learning (DRL); multiagent reinforcement learning (MARL); multiagent systems; StarCraft Multiagent Challenge (SMAC); CONSENSUS; SYSTEMS;
DOI
10.1109/TNNLS.2021.3089493
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning a stable and generalizable centralized value function (CVF) is a crucial but challenging task in multiagent reinforcement learning (MARL), as it must cope with a joint action space that grows exponentially with the number of agents. This article proposes an approach, named SMIX(λ), that achieves this through off-policy training, avoiding the greedy assumption commonly made in CVF learning. Because importance sampling for such off-policy training is both computationally costly and numerically unstable, we propose to use the λ-return as a proxy for computing the temporal difference (TD) error. With this new objective, we adopt a modified QMIX network structure as the base to train our model. By further connecting it with the Q(λ) approach from a unified expectation-correction viewpoint, we show that the proposed SMIX(λ) is equivalent to Q(λ) and hence shares its convergence properties, without suffering from the aforementioned curse of dimensionality inherent in MARL. Experiments on the StarCraft Multiagent Challenge (SMAC) benchmark demonstrate that our approach not only outperforms several state-of-the-art MARL methods by a large margin but can also serve as a general tool to improve the overall performance of other centralized training with decentralized execution (CTDE)-type algorithms by enhancing their CVFs.
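As a rough illustration of the λ-return idea the abstract refers to (this is not the authors' code, and the function names, the episode layout, and the parameter values below are hypothetical): the λ-return target can be computed by a backward recursion over one episode, interpolating between the one-step bootstrapped TD target (λ = 0) and the full Monte Carlo return (λ = 1), with no importance-sampling ratios involved.

```python
import numpy as np

def lambda_returns(rewards, next_values, gamma=0.99, lam=0.8):
    """Backward recursion for lambda-returns over one episode:
        G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}),
    with the terminal step bootstrapping nothing (G_T = r_T).

    rewards:     per-step rewards r_1 .. r_T
    next_values: value estimates V(s_2) .. V(s_{T+1}) (V after the terminal
                 state is unused, since the last step does not bootstrap)
    """
    T = len(rewards)
    G = np.zeros(T)
    G[-1] = rewards[-1]  # episode ends here: no bootstrap value
    for t in range(T - 2, -1, -1):
        # mix the one-step bootstrap with the recursively built return
        G[t] = rewards[t] + gamma * ((1 - lam) * next_values[t] + lam * G[t + 1])
    return G
```

Setting `lam=1.0` recovers the undiscounted-sum behavior of Monte Carlo returns, while `lam=0.0` reduces each target to the ordinary one-step TD target; intermediate values trade bias against variance, which is the role the λ-return plays in SMIX(λ)'s off-policy objective.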
Pages: 52-63
Page count: 12