Advantage Constrained Proximal Policy Optimization in Multi-Agent Reinforcement Learning

Cited by: 0

Authors:

Li, Weifan [1]

Zhu, Yuanheng [1,2]

Zhao, Dongbin [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China

Source:

2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023

Funding:

National Natural Science Foundation of China;

Keywords:

multi-agent; reinforcement learning; policy gradient;

DOI:

10.1109/IJCNN54540.2023.10191652

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We investigate the integration of value-based and policy gradient methods in multi-agent reinforcement learning (MARL). The Individual-Global-Max (IGM) principle plays an important role in value-based MARL, as it ensures consistency between joint and local action values. IGM is difficult to guarantee in multi-agent policy gradient methods because of stochastic exploration and conflicting gradient directions. In this paper, we propose a novel multi-agent policy gradient algorithm called Advantage Constrained Proximal Policy Optimization (ACPPO). ACPPO calculates each agent's current local state-action advantage from that agent's advantage network and estimates the joint state-action advantage using the multi-agent advantage decomposition lemma. A coefficient for each agent, determined by the consistency between the estimated joint-action advantage and the local advantage, then constrains the joint-action advantage. Unlike previous policy gradient MARL algorithms, ACPPO requires neither an additional sampled baseline to reduce variance nor a sequential scheme to improve accuracy. The proposed method is evaluated on the continuous matrix game, the StarCraft Multi-Agent Challenge, and the Multi-Agent MuJoCo task. The results show that ACPPO outperforms baselines such as MAPPO, MADDPG, and HATRPO.
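
For orientation, the two notions the abstract relies on can be written out as they are commonly stated in the MARL literature. This is a sketch in assumed notation, not the paper's own formulation: the symbols $Q_{tot}$, $Q_i$, $\tau^i$, $i_{1:m}$, and $\pi$ do not appear in this record, and the form of ACPPO's per-agent coefficient is not given here, so it is not reproduced.

```latex
% IGM principle (assumed notation): the greedy joint action under the joint
% value Q_tot coincides with the tuple of per-agent greedy actions.
\arg\max_{\mathbf{a}} Q_{tot}(\boldsymbol{\tau}, \mathbf{a})
  = \Bigl( \arg\max_{a^1} Q_1(\tau^1, a^1), \; \dots, \; \arg\max_{a^n} Q_n(\tau^n, a^n) \Bigr)

% Multi-agent advantage decomposition lemma (assumed notation): for any state s
% and any ordered subset of agents i_{1:m}, the joint advantage is the sum of
% each agent's advantage conditioned on the actions of the agents before it.
A_{\pi}^{i_{1:m}}\bigl(s, \mathbf{a}^{i_{1:m}}\bigr)
  = \sum_{j=1}^{m} A_{\pi}^{i_j}\bigl(s, \mathbf{a}^{i_{1:j-1}}, a^{i_j}\bigr)
```

Read this way, the lemma expresses a joint advantage through per-agent terms, which is the kind of estimate the abstract says is compared with each agent's local advantage.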


Pages: 8

50 items in total

[1] Proximal Policy Optimization based Decentralized Networked Multi-Agent Reinforcement Learning
Liu, Jinyi
Li, Fangyu
Wang, Jingjing
Han, Honggui
2024 IEEE 18TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION, ICCA 2024, 2024, : 839 - 844
[2] Target localization using Multi-Agent Deep Reinforcement Learning with Proximal Policy Optimization
Alagha, Ahmed
Singh, Shakti
Mizouni, Rabeb
Bentahar, Jamal
Otrok, Hadi
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2022, 136 : 342 - 357
[3] DeCOM: Decomposed Policy for Constrained Cooperative Multi-Agent Reinforcement Learning
Yang, Zhaoxing
Jin, Haiming
Ding, Rong
You, Haoyi
Fan, Guiyun
Wang, Xinbing
Zhou, Chenghu
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 10861 - 10870
[4] Multi-Agent Reinforcement Learning with Information-sharing Constrained Policy Optimization for Global Cost Environment
Okawa, Yoshihiro
Dan, Hayato
Morita, Natsuki
Ogawa, Masatoshi
IFAC PAPERSONLINE, 2023, 56 (02): : 1558 - 1565
[5] Multi-Agent Reinforcement Learning with Common Policy for Antenna Tilt Optimization
Mendo, Adriano
Outes-Carnero, Jose
Ng-Molina, Yak
Ramiro-Moreno, Juan
IAENG International Journal of Computer Science, 2023, 50 (03)
[6] Online optimization of traffic policy through multi-agent reinforcement learning
Sasaki, Y
Flann, NS
PROCEEDINGS OF THE 7TH JOINT CONFERENCE ON INFORMATION SCIENCES, 2003, : 1211 - 1214
[7] Reinforcement learning for multi-agent patrol policy
Lab. of Complex Systems and Intelligence Sciences, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Proc. IEEE Int. Conf. Cognitive Informatics, ICCI, : 530 - 535
[8] TEAM POLICY LEARNING FOR MULTI-AGENT REINFORCEMENT LEARNING
Cassano, Lucas
Alghunaim, Sulaiman A.
Sayed, Ali H.
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3062 - 3066
[9] Multi-Agent Reinforcement Learning for Convex Optimization
Morcos, Amir
West, Aaron
Maguire, Brian
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS III, 2021, 11746
[10] Toward Policy Explanations for Multi-Agent Reinforcement Learning
Boggess, Kayla
Kraus, Sarit
Feng, Lu
PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022, 2022, : 109 - 115
