Multi-Agent Collaborative Target Search Based on the Multi-Agent Deep Deterministic Policy Gradient with Emotional Intrinsic Motivation

Cited by: 2
Authors
Zhang, Xiaoping [1 ,2 ]
Zheng, Yuanpeng [1 ]
Wang, Li [1 ]
Abdulali, Arsen [2 ]
Iida, Fumiya [2 ]
Affiliations
[1] North China Univ Technol, Sch Elect & Control Engn, Beijing 100144, Peoples R China
[2] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 21
Funding
Beijing Natural Science Foundation
Keywords
multi-agent collaboration; intrinsic motivation; MADDPG; emotion; deep reinforcement learning; TRACKING; ROBOTS;
DOI
10.3390/app132111951
CLC number (Chinese Library Classification)
O6 [Chemistry]
Subject classification code
0703
Abstract
Multi-agent collaborative target search is one of the main challenges in the multi-agent field, and deep reinforcement learning (DRL) is a promising approach for learning such a task. However, DRL often suffers from sparse rewards, which to some extent reduces its efficiency in task learning. Introducing intrinsic motivation has proved to be a useful way to mitigate the sparse-reward problem in DRL. Therefore, building on the multi-agent deep deterministic policy gradient (MADDPG) framework, this paper proposes a new MADDPG algorithm with emotional intrinsic motivation, named MADDPG-E, for multi-agent collaborative target search. In MADDPG-E, a new emotional intrinsic motivation module with three emotions (joy, sadness, and fear) is designed. The three emotions are defined by mapping the corresponding psychological knowledge onto the embodied situations of the agents in the environment. An emotional steady-state variable function H is then designed to help judge whether an emotional state is good or bad. Based on H, an emotion-based intrinsic reward function is finally proposed. With the designed emotional intrinsic motivation module, the multi-agent system always tries to keep itself in a state of joy, which means it continually learns to search for the target. To show the effectiveness of the proposed MADDPG-E algorithm, two kinds of simulation experiments, one with a fixed initial position and one with a random initial position, are carried out, and MADDPG-E is compared with MADDPG and with MADDPG-ICM (MADDPG with an intrinsic curiosity module). The results show that, with the designed emotional intrinsic motivation module, MADDPG-E learns faster and more stably, and the advantage is more pronounced in complex situations.
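The abstract describes the emotional intrinsic motivation module only at a high level, so the following is a minimal sketch of the general pattern (emotion appraisal, then a steady-state score H, then an intrinsic bonus added to a sparse extrinsic reward), not the authors' formulation. Everything concrete here is a hypothetical assumption: the appraisal rules (joy from moving toward the target, sadness from moving away, fear from entering a `safe_radius` around an obstacle), the linear form of `steady_state_h` and its weights, and the scaling coefficient `beta`. The helper names are illustrative only.

```python
from dataclasses import dataclass
from typing import List


def clamp(x: float, lo: float = 0.0, hi: float = 1.0) -> float:
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


@dataclass
class EmotionState:
    """Hypothetical per-agent emotion intensities, each in [0, 1]."""
    joy: float
    sadness: float
    fear: float


def appraise(prev_dist: float, dist: float, obstacle_dist: float,
             safe_radius: float = 1.0) -> EmotionState:
    """Hypothetical appraisal of one agent's situation (not the paper's rules).

    joy: grows when the agent moves closer to the target.
    sadness: grows when the agent moves away from the target.
    fear: grows as the agent enters an assumed safety radius around an obstacle.
    """
    progress = prev_dist - dist  # > 0 means the agent got closer to the target
    return EmotionState(
        joy=clamp(progress),
        sadness=clamp(-progress),
        fear=clamp(1.0 - obstacle_dist / safe_radius),
    )


def steady_state_h(e: EmotionState, w_joy: float = 1.0,
                   w_sad: float = 0.5, w_fear: float = 1.0) -> float:
    """Hypothetical stand-in for the emotional steady-state variable H.

    Positive values mark a 'good' emotional state, negative values a 'bad' one.
    The linear form and the weights are assumptions; the result is clipped to [-1, 1].
    """
    return clamp(w_joy * e.joy - w_sad * e.sadness - w_fear * e.fear, -1.0, 1.0)


def shaped_reward(r_ext: float, h: float, beta: float = 0.1) -> float:
    """Add an emotion-based intrinsic bonus beta * H to the (sparse) extrinsic reward."""
    return r_ext + beta * h


def shaped_team_rewards(r_ext: List[float], emotions: List[EmotionState],
                        beta: float = 0.1) -> List[float]:
    """Per-agent shaped rewards, e.g. to store in a MADDPG-style replay buffer."""
    return [shaped_reward(r, steady_state_h(e), beta) for r, e in zip(r_ext, emotions)]


if __name__ == "__main__":
    # Agent 0 approaches the target in open space; agent 1 retreats next to an obstacle.
    emotions = [
        appraise(prev_dist=5.0, dist=4.5, obstacle_dist=3.0),
        appraise(prev_dist=4.0, dist=4.5, obstacle_dist=0.5),
    ]
    # Extrinsic reward is zero for both agents: the target has not been found yet.
    print(shaped_team_rewards([0.0, 0.0], emotions))
```

Even while the sparse extrinsic reward is still zero, the agent that moves toward the target receives a small positive signal and the agent that retreats toward an obstacle receives a negative one. Bounding H and scaling it by a small `beta` is one common way to keep an intrinsic bonus from overwhelming the task reward; the abstract does not say how MADDPG-E balances the two terms.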
Pages: 17