Advantage Constrained Proximal Policy Optimization in Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Li, Weifan [1 ]
Zhu, Yuanheng [1 ,2 ]
Zhao, Dongbin [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multi-agent; reinforcement learning; policy gradient;
DOI
10.1109/IJCNN54540.2023.10191652
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
We investigate the integration of value-based and policy gradient methods in multi-agent reinforcement learning (MARL). The Individual-Global-Max (IGM) principle plays an important role in value-based MARL because it ensures consistency between joint and local action values, but it is difficult to guarantee in multi-agent policy gradient methods due to stochastic exploration and conflicting gradient directions. In this paper, we propose a novel multi-agent policy gradient algorithm called Advantage Constrained Proximal Policy Optimization (ACPPO). ACPPO computes each agent's local state-action advantage with its own advantage network and estimates the joint state-action advantage via the multi-agent advantage decomposition lemma. A per-agent coefficient, determined by the consistency between the estimated joint-action advantage and the local advantage, then constrains the joint-action advantage used in the policy update. Unlike previous policy gradient MARL algorithms, ACPPO requires neither an additional sampled baseline to reduce variance nor a sequential update scheme to improve accuracy. The proposed method is evaluated on a continuous matrix game, the StarCraft Multi-Agent Challenge, and Multi-Agent MuJoCo tasks, where ACPPO outperforms baselines such as MAPPO, MADDPG, and HATRPO.
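The abstract only outlines the mechanism, so the following is a minimal NumPy sketch of one possible reading of it, not the paper's implementation. It assumes the joint advantage is estimated as the sum of the per-agent local advantages (one simple instantiation of the multi-agent advantage decomposition lemma) and that the per-agent consistency coefficient is a sign-agreement mask between local and joint advantages; the function name `acppo_surrogate` and the exact coefficient rule are illustrative assumptions.

```python
import numpy as np

def acppo_surrogate(local_adv, ratios, clip_eps=0.2):
    """Hedged sketch of an advantage-constrained PPO surrogate.

    local_adv : (n_agents, batch) per-agent local advantages A_i(s, a_i),
                assumed to come from each agent's advantage network.
    ratios    : (n_agents, batch) importance ratios
                pi_new(a_i | s) / pi_old(a_i | s).
    Returns   : (n_agents,) per-agent surrogate objectives (to maximize).
    """
    # Joint advantage estimated via the advantage decomposition lemma;
    # a plain sum of local advantages is an assumption made here, not
    # necessarily the paper's estimator.
    joint_adv = local_adv.sum(axis=0)                      # (batch,)

    # Per-agent constraint coefficient: keep the joint advantage only
    # where the local advantage agrees with it in sign (hypothetical
    # consistency rule used for illustration).
    coef = (np.sign(local_adv) == np.sign(joint_adv)).astype(float)
    constrained_adv = coef * joint_adv                     # (n_agents, batch)

    # Standard PPO clipped surrogate per agent, with the constrained
    # joint advantage standing in for a sampled baseline.
    unclipped = ratios * constrained_adv
    clipped = np.clip(ratios, 1 - clip_eps, 1 + clip_eps) * constrained_adv
    return np.minimum(unclipped, clipped).mean(axis=1)
```

Under these assumptions, an agent whose local advantage disagrees in sign with the estimated joint advantage contributes no gradient for that sample, which is one way a coefficient could "constrain" the joint-action advantage without a sequential update scheme.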
Pages: 8