Advantage Constrained Proximal Policy Optimization in Multi-Agent Reinforcement Learning

Authors
Li, Weifan [1]
Zhu, Yuanheng [1, 2]
Zhao, Dongbin [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
multi-agent; reinforcement learning; policy gradient
DOI
10.1109/IJCNN54540.2023.10191652
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We investigate the integration of value-based and policy gradient methods in multi-agent reinforcement learning (MARL). The Individual-Global-Max (IGM) principle plays an important role in value-based MARL, as it ensures consistency between joint and local action values. IGM is difficult to guarantee in multi-agent policy gradient methods because of stochastic exploration and conflicting gradient directions. In this paper, we propose a novel multi-agent policy gradient algorithm called Advantage Constrained Proximal Policy Optimization (ACPPO). ACPPO calculates each agent's current local state-action advantage from its advantage network and estimates the joint state-action advantage using the multi-agent advantage decomposition lemma. A coefficient for each agent, determined by the consistency between the estimated joint-action advantage and the local advantage, then constrains the joint-action advantage. Unlike previous policy gradient MARL algorithms, ACPPO requires neither an additional sampled baseline to reduce variance nor a sequential update scheme to improve accuracy. The proposed method is evaluated on a continuous matrix game, the StarCraft Multi-Agent Challenge, and Multi-Agent MuJoCo tasks. In these experiments, ACPPO outperforms baselines such as MAPPO, MADDPG, and HATRPO.
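The multi-agent advantage decomposition lemma mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not ACPPO and is not taken from the paper: it assumes a hypothetical two-agent, single-state matrix game, and the payoff table `Q`, the policies `pi1` and `pi2`, and the helper `decompose` are all made up for illustration. It only shows that, under these assumptions, the joint advantage equals agent 1's local advantage plus agent 2's advantage conditioned on agent 1's action.

```python
# Illustrative only: a hypothetical 2-agent, single-state matrix game.
# Q[a1][a2] is the joint action value; pi1 and pi2 are the agents' policies.
# None of these numbers come from the paper.
Q = [[8.0, -12.0, -12.0],
     [-12.0, 0.0, 0.0],
     [-12.0, 0.0, 0.0]]
pi1 = [0.5, 0.3, 0.2]
pi2 = [0.2, 0.2, 0.6]


def decompose(Q, pi1, pi2, a1, a2):
    """Return (joint advantage, agent-1 term, agent-2 term) for (a1, a2)."""
    # Joint Q marginalised over agent 2's policy, one value per agent-1 action.
    q1 = [sum(p2 * q for p2, q in zip(pi2, row)) for row in Q]
    # State value: expectation of Q under both policies.
    v = sum(p1 * q for p1, q in zip(pi1, q1))
    joint = Q[a1][a2] - v        # A(s, a1, a2)
    adv1 = q1[a1] - v            # A^1(s, a1)
    adv2 = Q[a1][a2] - q1[a1]    # A^2(s, a1, a2), given agent 1's action
    return joint, adv1, adv2


# The decomposition holds for every joint action, not just the optimal one.
for a1 in range(3):
    for a2 in range(3):
        joint, adv1, adv2 = decompose(Q, pi1, pi2, a1, a2)
        assert abs(joint - (adv1 + adv2)) < 1e-9
```

In this toy setting the terms are computed exactly from a known table; in a learned setting each term would be an estimate, which is where a consistency check between local and joint advantages becomes meaningful.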
Pages: 8