Advantage Constrained Proximal Policy Optimization in Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Li, Weifan [1]
Zhu, Yuanheng [1, 2]
Zhao, Dongbin [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 101408, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
multi-agent; reinforcement learning; policy gradient;
DOI
10.1109/IJCNN54540.2023.10191652
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
We investigate the integration of value-based and policy gradient methods in multi-agent reinforcement learning (MARL). The Individual-Global-Max (IGM) principle plays an important role in value-based MARL because it ensures consistency between joint and local action values. IGM is difficult to guarantee in multi-agent policy gradient methods because of stochastic exploration and conflicting gradient directions. In this paper, we propose a novel multi-agent policy gradient algorithm called Advantage Constrained Proximal Policy Optimization (ACPPO). ACPPO calculates each agent's current local state-action advantage from that agent's advantage network and estimates the joint state-action advantage using the multi-agent advantage decomposition lemma. A per-agent coefficient, determined by the consistency between the estimated joint-action advantage and the local advantage, constrains the joint-action advantage. Unlike previous policy gradient MARL algorithms, ACPPO does not require an additional sampled baseline to reduce variance or a sequential scheme to improve accuracy. The proposed method is evaluated on the continuous matrix game, the StarCraft Multi-Agent Challenge, and the Multi-Agent MuJoCo task. In these experiments, ACPPO outperforms baselines such as MAPPO, MADDPG, and HATRPO.
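As an illustration of the IGM consistency condition mentioned in the abstract, the following is a minimal sketch and not code from the paper. It assumes discrete actions and tabular values; the function name `satisfies_igm` and the example tables are hypothetical. The check asks whether the greedy joint action equals the tuple of each agent's greedy local action.

```python
import numpy as np

def satisfies_igm(joint_q, local_qs):
    """Return True if the greedy joint action matches the per-agent greedy actions.

    joint_q:  array of shape (n_actions_1, ..., n_actions_k), one axis per agent.
    local_qs: list of 1-D arrays, one local action-value vector per agent.
    (Hypothetical helper for illustration; ties are broken by first index.)
    """
    joint_greedy = np.unravel_index(np.argmax(joint_q), joint_q.shape)
    local_greedy = tuple(int(np.argmax(q)) for q in local_qs)
    return tuple(int(a) for a in joint_greedy) == local_greedy

# Hypothetical local action values for two agents.
q1 = np.array([1.0, 3.0, 2.0])
q2 = np.array([0.5, 0.1])

# An additive joint value (sum of local values) keeps the argmax consistent.
joint_additive = q1[:, None] + q2[None, :]

# Raising one joint entry makes the joint optimum disagree with local greedy choices.
joint_conflict = joint_additive.copy()
joint_conflict[0, 1] = 10.0

igm_additive = satisfies_igm(joint_additive, [q1, q2])
igm_conflict = satisfies_igm(joint_conflict, [q1, q2])
```

In the second table the best joint action is (0, 1) while each agent acting greedily on its own values picks (1, 0), which is the kind of inconsistency the IGM principle rules out.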
Pages: 8
Related papers
50 in total
  • [1] Proximal Policy Optimization based Decentralized Networked Multi-Agent Reinforcement Learning
    Liu, Jinyi
    Li, Fangyu
    Wang, Jingjing
    Han, Honggui
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION, ICCA 2024, 2024, : 839 - 844
  • [2] Target localization using Multi-Agent Deep Reinforcement Learning with Proximal Policy Optimization
    Alagha, Ahmed
    Singh, Shakti
    Mizouni, Rabeb
    Bentahar, Jamal
    Otrok, Hadi
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2022, 136 : 342 - 357
  • [3] DeCOM: Decomposed Policy for Constrained Cooperative Multi-Agent Reinforcement Learning
    Yang, Zhaoxing
    Jin, Haiming
    Ding, Rong
    You, Haoyi
    Fan, Guiyun
    Wang, Xinbing
    Zhou, Chenghu
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 9, 2023, : 10861 - 10870
  • [4] Multi-Agent Reinforcement Learning with Information-sharing Constrained Policy Optimization for Global Cost Environment
    Okawa, Yoshihiro
    Dan, Hayato
    Morita, Natsuki
    Ogawa, Masatoshi
    IFAC PAPERSONLINE, 2023, 56 (02): : 1558 - 1565
  • [5] Multi-Agent Reinforcement Learning with Common Policy for Antenna Tilt Optimization
    Mendo, Adriano
    Outes-Carnero, Jose
    Ng-Molina, Yak
    Ramiro-Moreno, Juan
    IAENG International Journal of Computer Science, 2023, 50 (03)
  • [6] Online optimization of traffic policy through multi-agent reinforcement learning
    Sasaki, Y
    Flann, NS
    PROCEEDINGS OF THE 7TH JOINT CONFERENCE ON INFORMATION SCIENCES, 2003, : 1211 - 1214
  • [7] Reinforcement learning for multi-agent patrol policy
    Lab. of Complex Systems and Intelligence Sciences, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
    Proc. IEEE Int. Conf. Cognitive Informatics, ICCI, (530-535):
  • [8] TEAM POLICY LEARNING FOR MULTI-AGENT REINFORCEMENT LEARNING
    Cassano, Lucas
    Alghunaim, Sulaiman A.
    Sayed, Ali H.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3062 - 3066
  • [9] Multi-Agent Reinforcement Learning for Convex Optimization
    Morcos, Amir
    West, Aaron
    Maguire, Brian
    ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS III, 2021, 11746
  • [10] Toward Policy Explanations for Multi-Agent Reinforcement Learning
    Boggess, Kayla
    Kraus, Sarit
    Feng, Lu
    PROCEEDINGS OF THE THIRTY-FIRST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2022, 2022, : 109 - 115