Advantage Constrained Proximal Policy Optimization in Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Li, Weifan [1 ]
Zhu, Yuanheng [1 ,2 ]
Zhao, Dongbin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multi-agent; reinforcement learning; policy gradient;
DOI
10.1109/IJCNN54540.2023.10191652
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
We investigate the integration of value-based and policy gradient methods in multi-agent reinforcement learning (MARL). The Individual-Global-Max (IGM) principle plays an important role in value-based MARL because it ensures consistency between joint and local action values, but it is difficult to guarantee in multi-agent policy gradient methods due to stochastic exploration and conflicting gradient directions. In this paper, we propose a novel multi-agent policy gradient algorithm called Advantage Constrained Proximal Policy Optimization (ACPPO). ACPPO computes each agent's local state-action advantage with its own advantage network and estimates the joint state-action advantage via the multi-agent advantage decomposition lemma. A per-agent coefficient, determined by the consistency between the estimated joint-action advantage and the local advantage, then constrains the joint-action advantage used in the policy update. Unlike previous policy gradient MARL algorithms, ACPPO requires neither an additional sampled baseline to reduce variance nor a sequential update scheme to improve accuracy. The proposed method is evaluated on a continuous matrix game, the StarCraft Multi-Agent Challenge, and Multi-Agent MuJoCo tasks, where ACPPO outperforms baselines such as MAPPO, MADDPG, and HATRPO.
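The abstract only outlines the mechanism, so the following is a minimal NumPy sketch of one possible reading of it, not the paper's implementation. It assumes the joint advantage is estimated as the sum of the per-agent local advantages (one simple instantiation of the multi-agent advantage decomposition lemma) and that the per-agent consistency coefficient is a sign-agreement mask between local and joint advantages; the function name `acppo_surrogate` and the exact coefficient rule are illustrative assumptions.

```python
import numpy as np

def acppo_surrogate(local_adv, ratios, clip_eps=0.2):
    """Hedged sketch of an advantage-constrained PPO surrogate.

    local_adv : (n_agents, batch) per-agent local advantages A_i(s, a_i),
                assumed to come from each agent's advantage network.
    ratios    : (n_agents, batch) importance ratios
                pi_new(a_i | s) / pi_old(a_i | s).
    Returns   : (n_agents,) per-agent surrogate objectives (to maximize).
    """
    # Joint advantage estimated via the advantage decomposition lemma;
    # a plain sum of local advantages is an assumption made here, not
    # necessarily the paper's estimator.
    joint_adv = local_adv.sum(axis=0)                      # (batch,)

    # Per-agent constraint coefficient: keep the joint advantage only
    # where the local advantage agrees with it in sign (hypothetical
    # consistency rule used for illustration).
    coef = (np.sign(local_adv) == np.sign(joint_adv)).astype(float)
    constrained_adv = coef * joint_adv                     # (n_agents, batch)

    # Standard PPO clipped surrogate per agent, with the constrained
    # joint advantage standing in for a sampled baseline.
    unclipped = ratios * constrained_adv
    clipped = np.clip(ratios, 1 - clip_eps, 1 + clip_eps) * constrained_adv
    return np.minimum(unclipped, clipped).mean(axis=1)
```

Under these assumptions, an agent whose local advantage disagrees in sign with the estimated joint advantage contributes no gradient for that sample, which is one way a coefficient could "constrain" the joint-action advantage without a sequential update scheme.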
Pages: 8