Multi-agent cooperation policy gradient method based on enhanced exploration for cooperative tasks

Cited by: 2
Authors
Zhao, Li-yang [1 ]
Chang, Tian-qing [1 ]
Zhang, Lei [1 ]
Zhang, Xin-lu [2 ]
Wang, Jiang-feng [1 ]
Affiliations
[1] Army Acad Armored Forces, Dept Weaponry & Control, Beijing 100072, Peoples R China
[2] China Huayin Ordnance Test Ctr, Huayin 714200, Peoples R China
Keywords
Multi-agent deep reinforcement learning; Exploration; Intrinsic reward; Self-imitation learning; Value function input; NEURAL-NETWORKS; GAME; GO
DOI
10.1007/s13042-023-01976-6
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Multi-agent cooperation and coordination are often essential for task fulfillment. Multi-agent deep reinforcement learning (MADRL) can effectively learn solutions to such problems, but its application remains constrained by the exploration-exploitation trade-off. MADRL research therefore focuses on how to explore the environment effectively and collect informative, high-quality experience that strengthens cooperative behaviors and improves policy learning. To address this problem, we propose a novel multi-agent cooperation policy gradient method, multi-agent proximal policy optimization based on self-imitation learning and random network distillation (MAPPOSR). MAPPOSR adds two policy-gradient-based components: (1) a random network distillation (RND) exploration-bonus component that produces intrinsic rewards and encourages agents to visit novel states and actions, helping them discover better trajectories and preventing premature convergence to local optima; and (2) a self-imitation learning (SIL) policy-update component that stores and reuses high-return trajectories generated by the agents themselves, strengthening their cooperation and boosting learning efficiency. Experimental results show that, in addition to effectively solving hard-exploration problems, the proposed method significantly outperforms other state-of-the-art MADRL algorithms in learning efficiency and in escaping local optima. Moreover, the effect of different value function inputs on algorithm performance is investigated under the centralized training and decentralized execution (CTDE) framework, on the basis of which an individual-based joint-observation encoding method is developed. By encouraging each agent to focus on the local observations of the other agents relevant to it rather than on the global state provided by the environment, this encoding method removes the adverse effects of excessive value function input dimensionality and redundant feature information on algorithm performance.
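For context on the two components the abstract names, the following is a minimal PyTorch sketch of an RND intrinsic-reward module and a self-imitation-learning loss. All names (`RNDBonus`, `sil_loss`), network sizes, and the bonus coefficient are illustrative assumptions; the paper's actual MAPPOSR implementation may differ.

```python
# Minimal sketch (not the paper's code): an RND exploration bonus and an
# SIL loss in PyTorch. Layer sizes and the 0.1 bonus coefficient below
# are assumed for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RNDBonus(nn.Module):
    """Random network distillation: the intrinsic reward is the prediction
    error of a trained predictor against a fixed, randomly initialized
    target network. Rarely visited states yield large errors, hence large
    exploration bonuses; training the predictor on visited states makes
    the bonus decay as states become familiar."""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        def mlp():
            return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, feat_dim))
        self.target, self.predictor = mlp(), mlp()
        for p in self.target.parameters():  # the target is never trained
            p.requires_grad_(False)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        err = F.mse_loss(self.predictor(obs), self.target(obs),
                         reduction="none")
        return err.mean(dim=-1)  # per-state intrinsic reward

def sil_loss(log_probs: torch.Tensor, values: torch.Tensor,
             returns: torch.Tensor) -> torch.Tensor:
    """Self-imitation learning: imitate only past actions whose realized
    return exceeded the current value estimate, i.e. weight the policy
    gradient by the clipped advantage (R - V)+."""
    adv = torch.clamp(returns - values, min=0.0)
    policy_loss = -(log_probs * adv.detach()).mean()  # imitate good actions
    value_loss = 0.5 * (adv ** 2).mean()              # pull V up toward R
    return policy_loss + value_loss

# Typical use inside a PPO-style update: augment the task reward with the
# scaled intrinsic bonus for exploration, e.g.
#   r_total = r_ext + 0.1 * rnd_bonus(obs)
# and add sil_loss on high-return replayed samples to the PPO objective.
```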
Pages: 1431-1452
Page count: 22
Related Papers (50 in total)
  • [21] Acquisition of Shared Symbols in Multi-Agent Cooperative Tasks
    Kayal, Siavash
    Aminaiee, Abdol Hossein
    Lucas, Caro
IEEE INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN ROBOTICS AND AUTOMATION, 2009, : 441+
  • [22] Exploration via Joint Policy Diversity for Sparse-Reward Multi-Agent Tasks
    Xu, Pei
    Zhang, Junge
    Huang, Kaiqi
    PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023, 2023, : 326 - 334
  • [23] Online Multi-Agent Based Cooperative Exploration and Coverage in Complex Environment
    Koval, Anton
    Mansouri, Sina Sharif
    Nikolakopoulos, George
    2019 18TH EUROPEAN CONTROL CONFERENCE (ECC), 2019, : 3964 - 3969
  • [24] MAPPG: Multi-agent Phasic Policy Gradient
    Zhang, Qi
    Zhang, Xuetao
    Liu, Yisha
    Zhang, Xuebo
    Zhuang, Yan
    2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC, 2023, : 2366 - 2371
  • [25] Learning Efficient Multi-agent Cooperative Visual Exploration
    Yu, Chao
    Yang, Xinyi
    Gao, Jiaxuan
    Yang, Huazhong
    Wang, Yu
    Wu, Yi
    COMPUTER VISION, ECCV 2022, PT XXXIX, 2022, 13699 : 497 - 515
  • [26] Cooperative Exploration for Multi-Agent Deep Reinforcement Learning
    Liu, Iou-Jen
    Jain, Unnat
    Yeh, Raymond A.
    Schwing, Alexander G.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [27] Deterministic Policy Gradient Based Formation Control for Multi-Agent Systems
    Hong, Zhiying
    Wang, Qingling
    2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019, : 4349 - 4354
  • [28] Learning Communication with Limited Range in Multi-agent Cooperative Tasks
    Ning, Chengyu
    Lu, Guoming
    ARTIFICIAL INTELLIGENCE AND ROBOTICS, ISAIR 2023, 2024, 1998 : 433 - 442
  • [29] An Efficient Centralized Multi-Agent Reinforcement Learner for Cooperative Tasks
    Liao, Dengyu
    Zhang, Zhen
    Song, Tingting
    Liu, Mingyang
    IEEE ACCESS, 2023, 11 : 139284 - 139294
  • [30] Knowledge Reuse of Multi-Agent Reinforcement Learning in Cooperative Tasks
    Shi, Daming
    Tong, Junbo
    Liu, Yi
    Fan, Wenhui
    ENTROPY, 2022, 24 (04)