Advantage Constrained Proximal Policy Optimization in Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Li, Weifan [1]
Zhu, Yuanheng [1,2]
Zhao, Dongbin [1,2]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
multi-agent; reinforcement learning; policy gradient
DOI
10.1109/IJCNN54540.2023.10191652
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
We investigate the integration of value-based and policy gradient methods in multi-agent reinforcement learning (MARL). The Individual-Global-Max (IGM) principle plays an important role in value-based MARL because it ensures consistency between joint and local action values, but it is difficult to guarantee in multi-agent policy gradient methods owing to stochastic exploration and conflicting gradient directions. In this paper, we propose a novel multi-agent policy gradient algorithm called Advantage Constrained Proximal Policy Optimization (ACPPO). ACPPO computes each agent's local state-action advantage with a per-agent advantage network and estimates the joint state-action advantage via the multi-agent advantage decomposition lemma. A per-agent coefficient, determined by the consistency between the estimated joint advantage and the local advantage, then constrains the joint-action advantage used in each agent's update. Unlike previous multi-agent policy gradient algorithms, ACPPO requires neither an additional sampled baseline to reduce variance nor a sequential update scheme to improve accuracy. We evaluate the method on a continuous matrix game, the StarCraft Multi-Agent Challenge, and Multi-Agent MuJoCo tasks, where ACPPO outperforms baselines such as MAPPO, MADDPG, and HATRPO.
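The multi-agent advantage decomposition lemma invoked in the abstract (due to Kuba et al., the result underlying HATRPO/HAPPO) writes the joint advantage of an ordered subset of agents i_1, ..., i_m as a telescoping sum of local advantages:

    A^{i_{1:m}}_{\pi}(s, a^{i_{1:m}}) = \sum_{j=1}^{m} A^{i_j}_{\pi}(s, a^{i_{1:j-1}}, a^{i_j})

so the joint advantage is positive whenever every local term is. Below is a minimal PyTorch sketch of one plausible reading of the constraint described in the abstract: each agent's local advantage is compared in sign with the joint advantage estimated from the decomposition, and a per-agent coefficient gates the joint advantage fed into the PPO surrogate. The sign-agreement gating rule, tensor shapes, and all function names are illustrative assumptions, not the paper's exact formulation.

    import torch

    def acppo_constrained_advantages(local_adv: torch.Tensor) -> torch.Tensor:
        # local_adv: (batch, n_agents) local advantages produced by each
        # agent's advantage network.
        # Estimate the joint advantage by summing local advantages, in the
        # spirit of the multi-agent advantage decomposition lemma.
        joint_adv = local_adv.sum(dim=-1, keepdim=True)
        # Assumed consistency coefficient: pass the signal through only when
        # an agent's local advantage agrees in sign with the estimated joint
        # advantage (our reading, not the paper's verbatim definition).
        coeff = (torch.sign(local_adv) == torch.sign(joint_adv)).float()
        # Each agent optimizes the joint advantage scaled by its coefficient.
        return coeff * joint_adv

    # Usage inside a standard PPO clipped surrogate (epsilon = 0.2):
    local_adv = torch.randn(4, 3)                # 4 samples, 3 agents
    ratio = torch.exp(0.01 * torch.randn(4, 3))  # pi_new / pi_old
    adv = acppo_constrained_advantages(local_adv)
    eps = 0.2
    loss = -torch.min(ratio * adv,
                      torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * adv).mean()

Under this reading, the coefficient simply drops conflicting per-agent terms rather than reweighting them with an extra sampled baseline or updating agents one at a time, which is consistent with the abstract's claim that ACPPO needs neither.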
Pages: 8
相关论文
共 50 条
  • [41] Multi-Agent Uncertainty Sharing for Cooperative Multi-Agent Reinforcement Learning
    Chen, Hao
    Yang, Guangkai
    Zhang, Junge
    Yin, Qiyue
    Huang, Kaiqi
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [42] Flexible RAN Slicing in Open RAN With Constrained Multi-Agent Reinforcement Learning
    Zangooei, Mohammad
    Golkarifard, Morteza
    Rouili, Mohamed
    Saha, Niloy
    Boutaba, Raouf
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2024, 42 (02) : 280 - 294
  • [43] Hierarchical multi-agent reinforcement learning
    Mohammad Ghavamzadeh
    Sridhar Mahadevan
    Rajbala Makar
    Autonomous Agents and Multi-Agent Systems, 2006, 13 : 197 - 229
  • [44] Learning to Share in Multi-Agent Reinforcement Learning
    Yi, Yuxuan
    Li, Ge
    Wang, Yaowei
    Lu, Zongqing
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [45] Multi-Agent Reinforcement Learning for Microgrids
    Dimeas, A. L.
    Hatziargyriou, N. D.
    IEEE POWER AND ENERGY SOCIETY GENERAL MEETING 2010, 2010,
  • [46] Hierarchical multi-agent reinforcement learning
    Ghavamzadeh, Mohammad
    Mahadevan, Sridhar
    Makar, Rajbala
    AUTONOMOUS AGENTS AND MULTI-AGENT SYSTEMS, 2006, 13 (02) : 197 - 229
  • [47] Multi-agent Exploration with Reinforcement Learning
    Sygkounas, Alkis
    Tsipianitis, Dimitris
    Nikolakopoulos, George
    Bechlioulis, Charalampos P.
    2022 30TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION (MED), 2022, : 630 - 635
  • [48] Partitioning in multi-agent reinforcement learning
    Sun, R
    Peterson, T
    FROM ANIMALS TO ANIMATS 6, 2000, : 325 - 332
  • [49] The Dynamics of Multi-Agent Reinforcement Learning
    Dickens, Luke
    Broda, Krysia
    Russo, Alessandra
    ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2010, 215 : 367 - 372
  • [50] Multi-agent reinforcement learning: A survey
    Busoniu, Lucian
    Babuska, Robert
    De Schutter, Bart
    2006 9TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION, VOLS 1- 5, 2006, : 1133 - +