Multi-agent cooperation policy gradient method based on enhanced exploration for cooperative tasks

Cited by: 2
Authors
Zhao, Li-yang [1 ]
Chang, Tian-qing [1 ]
Zhang, Lei [1 ]
Zhang, Xin-lu [2 ]
Wang, Jiang-feng [1 ]
Affiliations
[1] Army Acad Armored Forces, Dept Weaponry & Control, Beijing 100072, Peoples R China
[2] China Huayin Ordnance Test Ctr, Huayin 714200, Peoples R China
Keywords
Multi-agent deep reinforcement learning; Exploration; Intrinsic reward; Self-imitation learning; Value function input; NEURAL-NETWORKS; GAME; GO;
DOI
10.1007/s13042-023-01976-6
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-agent cooperation and coordination are often essential for task fulfillment. Multi-agent deep reinforcement learning (MADRL) can effectively learn solutions to such problems, but its application remains limited by the exploration-exploitation trade-off. MADRL research therefore focuses on how to explore the environment effectively and collect informative, high-quality experience that strengthens cooperative behaviors and improves policy learning. To address this problem, we propose a novel multi-agent cooperation policy gradient method, multi-agent proximal policy optimization based on self-imitation learning and random network distillation (MAPPOSR). MAPPOSR adds two policy-gradient-based components: (1) a random network distillation (RND) exploration-bonus component that produces intrinsic rewards and encourages agents to visit new states and actions, helping them discover better trajectories and preventing the algorithm from converging prematurely or getting stuck in local optima; and (2) a self-imitation learning (SIL) policy-update component that stores and reuses high-return trajectory samples generated by the agents themselves, strengthening their cooperation and boosting learning efficiency. Experimental results show that, in addition to effectively solving hard-exploration problems, the proposed method significantly outperforms other state-of-the-art MADRL algorithms in learning efficiency and in escaping local optima. Moreover, the effect of different value-function inputs on algorithm performance is investigated under the centralized training and decentralized execution (CTDE) framework, and an individual-based joint-observation encoding method is developed on this basis. By encouraging each agent to focus on the local observations of the other agents relevant to it rather than on the global state provided by the environment, the encoding method removes the adverse effects of excessive value-function input dimensionality and redundant feature information on algorithm performance.
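To illustrate the two add-on components named in the abstract, the following minimal PyTorch sketch shows (1) an RND-style exploration bonus, where the intrinsic reward is the prediction error of a trained predictor against a fixed, randomly initialised target network, and (2) a self-imitation loss that only imitates stored actions whose return exceeded the current value estimate. Class and function names, layer sizes, and constants here are illustrative assumptions, not the authors' MAPPOSR implementation.

# Hedged sketch of the two components described in the abstract (assumed names/sizes).
import torch
import torch.nn as nn


class RNDBonus(nn.Module):
    """Random Network Distillation: the intrinsic reward for an observation is
    the prediction error of a trained predictor against a fixed random target,
    so rarely visited states receive larger exploration bonuses."""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                    nn.Linear(128, feat_dim))
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                       nn.Linear(128, feat_dim))
        for p in self.target.parameters():      # the target network stays frozen
            p.requires_grad_(False)

    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        # Large prediction error => novel state => larger bonus added to the
        # extrinsic reward before advantage estimation.
        with torch.no_grad():
            err = (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
        return err

    def predictor_loss(self, obs: torch.Tensor) -> torch.Tensor:
        # Only the predictor is trained, to distill the fixed random target.
        return (self.predictor(obs) - self.target(obs).detach()).pow(2).mean()


def sil_loss(values: torch.Tensor, returns: torch.Tensor,
             log_probs: torch.Tensor) -> torch.Tensor:
    """Self-imitation learning loss: only stored actions whose episode return
    exceeded the current value estimate (positive clipped advantage) are
    imitated, reusing the agents' own high-return trajectories."""
    adv = (returns - values).clamp(min=0.0)
    policy_loss = -(log_probs * adv.detach()).mean()
    value_loss = 0.5 * adv.pow(2).mean()
    return policy_loss + value_loss

In a MAPPO-style training loop, the RND bonus would typically be added to the environment reward before advantage estimation, and sil_loss would be applied to samples drawn from a replay buffer of high-return episodes, consistent with the roles the abstract assigns to the two components.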
Pages: 1431-1452 (22 pages)