Multi-agent cooperation policy gradient method based on enhanced exploration for cooperative tasks

Cited by: 2
Authors
Zhao, Li-yang [1 ]
Chang, Tian-qing [1 ]
Zhang, Lei [1 ]
Zhang, Xin-lu [2 ]
Wang, Jiang-feng [1 ]
Affiliations
[1] Army Acad Armored Forces, Dept Weaponry & Control, Beijing 100072, Peoples R China
[2] China Huayin Ordnance Test Ctr, Huayin 714200, Peoples R China
Keywords
Multi-agent deep reinforcement learning; Exploration; Intrinsic reward; Self-imitation learning; Value function input; NEURAL-NETWORKS; GAME; GO;
DOI
10.1007/s13042-023-01976-6
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-agent cooperation and coordination are often essential for task fulfillment. Multi-agent deep reinforcement learning (MADRL) can effectively learn solutions to such problems, but its application is still primarily restricted by the exploration-exploitation trade-off. MADRL research therefore focuses on how to explore the environment effectively and collect informative, high-quality experience to strengthen cooperative behaviors and optimize policy learning. To address this problem, we propose a novel multi-agent cooperation policy gradient method called multi-agent proximal policy optimization based on self-imitation learning and random network distillation (MAPPOSR). MAPPOSR adds two components to the policy-gradient backbone: (1) a random network distillation (RND) exploration bonus component that produces intrinsic rewards and encourages agents to visit new states and actions, helping them discover better trajectories and preventing the algorithm from converging prematurely or getting stuck in local optima; and (2) a self-imitation learning (SIL) policy update component that stores and reuses high-return trajectory samples generated by the agents themselves, strengthening their cooperation and boosting learning efficiency. The experimental results show that, in addition to effectively solving the hard-exploration problem, the proposed method significantly outperforms other state-of-the-art (SOTA) MADRL algorithms in learning efficiency and in escaping local optima. Moreover, the effect of different value function inputs on algorithm performance is investigated under the centralized training and decentralized execution (CTDE) framework, based on which an individual-based joint-observation encoding method is developed. By encouraging each agent to focus on the local observations of the other agents relevant to it and to discard the global state information provided by the environment, the encoding method removes the adverse effects of excessive value-function input dimensionality and redundant feature information on algorithm performance.
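The abstract describes the two add-on components only at a high level. As an illustration (not the authors' implementation), the minimal PyTorch sketch below shows how an RND-style intrinsic reward is commonly computed: a fixed, randomly initialized target network embeds each observation, a trainable predictor regresses onto that embedding, and the per-state prediction error serves as the novelty bonus added to the extrinsic reward. The class name RNDBonus, the network sizes, and the bonus scaling are illustrative assumptions.

import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    """Generic Random Network Distillation exploration bonus (sketch)."""

    def __init__(self, obs_dim: int, embed_dim: int = 64):
        super().__init__()
        # Fixed random target network: defines the embedding to be predicted.
        self.target = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
        # Trainable predictor network: learns to match the target embedding.
        self.predictor = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))
        # The target network is never updated.
        for p in self.target.parameters():
            p.requires_grad = False

    def intrinsic_reward(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-sample squared prediction error; large for rarely visited states.
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        return ((pred_feat - target_feat) ** 2).mean(dim=-1)

# Usage sketch: the (scaled, detached) bonus would be added to the extrinsic
# reward before the PPO/MAPPO advantage computation, while the predictor is
# trained by minimizing the same prediction error on visited observations.
rnd = RNDBonus(obs_dim=32)                       # hypothetical observation size
optimizer = torch.optim.Adam(rnd.predictor.parameters(), lr=1e-4)
obs_batch = torch.randn(8, 32)                   # hypothetical batch of observations
bonus = rnd.intrinsic_reward(obs_batch)          # shape: (8,)
loss = bonus.mean()                              # predictor training objective
optimizer.zero_grad()
loss.backward()
optimizer.step()

In a MAPPOSR-style pipeline the SIL component would additionally replay stored high-return trajectories to provide an imitation-style policy update; that part is omitted here for brevity.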
Pages: 1431-1452
Number of pages: 22
Related papers
50 records in total
  • [31] Dynamic subtask representation and assignment in cooperative multi-agent tasks
    You, Chenlong
    Wu, Yingbo
    Cai, Junpeng
    Luo, Qi
    Zhou, Yanbing
    NEUROCOMPUTING, 2025, 628
  • [32] A Projection-based Exploration Method for Multi-Agent Coordination
    Tang, Hainan
    Liu, Juntao
    Wang, Zhenjie
    Gao, Ziwen
    Li, You
    PROCEEDINGS OF THE 2024 3RD INTERNATIONAL SYMPOSIUM ON INTELLIGENT UNMANNED SYSTEMS AND ARTIFICIAL INTELLIGENCE, SIUSAI 2024, 2024, : 8 - 14
  • [33] Combining Policy Search with Planning in Multi-agent Cooperation
    Ma, Jie
    Cameron, Stephen
    ROBOCUP 2008: ROBOT SOCCER WORLD CUP XII, 2009, 5399 : 532 - 543
  • [34] Dynamic Exploration of Multi-agent Systems with Periodic Timed Tasks
    Arcile, Johan
    Devillers, Raymond
    Klaudel, Hanna
    FUNDAMENTA INFORMATICAE, 2020, 175 (1-4) : 59 - 95
  • [35] Dynamic Exploration of Multi-agent Systems with Periodic Timed Tasks
    Arcile, Johan
    Devillers, Raymond
    Klaudel, Hanna
    Fundamenta Informaticae, 2020, 175 (1-4): : 59 - 95
  • [36] Agent cooperation in multi-agent based network management
    Liu, B
    Li, W
    Luo, JZ
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, VOL 2, 2004, : 283 - 287
  • [37] A Multi-agent Cooperative Voltage Control Method
    Nagata, T.
    Nakachi, Y.
    Hatano, R.
    2008 IEEE POWER & ENERGY SOCIETY GENERAL MEETING, VOLS 1-11, 2008, : 3685 - +
  • [38] Multi-Agent Collaborative Target Search Based on the Multi-Agent Deep Deterministic Policy Gradient with Emotional Intrinsic Motivation
    Zhang, Xiaoping
    Zheng, Yuanpeng
    Wang, Li
    Abdulali, Arsen
    Iida, Fumiya
    APPLIED SCIENCES-BASEL, 2023, 13 (21):
  • [39] Action Prediction for Cooperative Exploration in Multi-agent Reinforcement Learning
    Zhang, Yanqiang
    Feng, Dawei
    Ding, Bo
    NEURAL INFORMATION PROCESSING, ICONIP 2023, PT II, 2024, 14448 : 358 - 372
  • [40] Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks
    Feng, Pu
    Liang, Junkang
    Wang, Size
    Yu, Xin
    Ji, Xin
    Chen, Yiting
    Zhang, Kui
    Shi, Rongye
    Wu, Wenjun
    2024 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS 2024, 2024, : 642 - 649