Advantage Constrained Proximal Policy Optimization in Multi-Agent Reinforcement Learning

Citations: 0

Authors:

Li, Weifan ^{[1]}

Zhu, Yuanheng ^{[1,2]}

Zhao, Dongbin ^{[1,2]}

Affiliations:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China

Source:

2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

multi-agent; reinforcement learning; policy gradient;

DOI:

10.1109/IJCNN54540.2023.10191652

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We investigate the integration of value-based and policy gradient methods in multi-agent reinforcement learning (MARL). The Individual-Global-Max (IGM) principle plays an important role in value-based MARL because it ensures consistency between joint and local action values. IGM is difficult to guarantee in multi-agent policy gradient methods because of stochastic exploration and conflicting gradient directions. In this paper, we propose a novel multi-agent policy gradient algorithm called Advantage Constrained Proximal Policy Optimization (ACPPO). ACPPO calculates each agent's current local state-action advantage from that agent's advantage network and estimates the joint state-action advantage using the multi-agent advantage decomposition lemma. Each agent's coefficient, set according to the consistency between the estimated joint-action advantage and the local advantage, constrains the joint-action advantage. Unlike previous policy gradient MARL algorithms, ACPPO does not require an additional sampled baseline to reduce variance or a sequential scheme to improve accuracy. The proposed method is evaluated using the continuous matrix game, the StarCraft Multi-Agent Challenge, and the Multi-Agent MuJoCo task. The results show that ACPPO outperforms baselines such as MAPPO, MADDPG, and HATRPO.
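
For reference, the multi-agent advantage decomposition lemma named in the abstract is commonly stated in the cooperative MARL literature in the form below. The notation is an assumption, since the abstract gives no formula: $i_{1:n}$ is an ordering of the $n$ agents, $a^{i_m}$ is the action of agent $i_m$, and $A_{\pi}^{i_m}$ is that agent's local advantage given the actions of the agents ordered before it. How ACPPO estimates these terms and derives each agent's coefficient is not specified in this record.

```latex
% Assumed notation; the abstract itself does not give a formula.
A_{\pi}^{i_{1:n}}\left(s, a^{i_{1:n}}\right)
  = \sum_{m=1}^{n} A_{\pi}^{i_m}\left(s, a^{i_{1:m-1}}, a^{i_m}\right)
```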


Pages: 8
