Independent Deep Deterministic Policy Gradient Reinforcement Learning in Cooperative Multiagent Pursuit Games

Cited by: 1

Authors:

Zhou, Shiyang [1,2]

Ren, Weiya [1,2]

Ren, Xiaoguang [1,2]

Wang, Yanzhen [1,2]

Yi, Xiaodong [1,2]

Affiliations:

[1] Def Innovat Inst, Artificial Intelligence Res Ctr, Beijing 100072, Peoples R China

[2] Tianjin Artificial Intelligence Innovat Ctr, Tianjin 300457, Peoples R China

Source:

ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT IV | 2021 / Vol. 12894

Keywords:

Reinforcement learning; Actor-critic; Potential field; Planning and learning; Predator-prey;

DOI:

10.1007/978-3-030-86380-7_51

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we study a fully decentralized multi-agent pursuit problem in an environment without communication. A fully decentralized approach (decentralized training and decentralized execution) is more robust and scalable than centralized training with decentralized execution (CTDE), the currently popular multi-agent reinforcement learning paradigm. Both centralized training and communication mechanisms require a large amount of information exchange between agents, a strong assumption that is difficult to meet in practice. However, traditional fully decentralized multi-agent reinforcement learning methods (e.g., IQL) struggle to converge stably because the other agents' strategies keep changing. We therefore extend the actor-critic framework to an actor-critic-N framework and, on this basis, propose the Potential-Field-Guided Deep Deterministic Policy Gradient (PGDDPG) method. Each agent uses a unified artificial potential field to guide its strategy updates, which reduces the uncertainty of multi-agent decision making in a complex, dynamically changing environment. As a result, the proposed PGDDPG converges quickly and stably. Finally, pursuit experiments in MPE and CARLA show that our method achieves a higher success rate and more stable performance than DDPG and MADDPG.
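
The abstract does not give the form of the unified artificial potential field, nor how PGDDPG feeds it into the policy update. Purely as background, the sketch below shows a classical textbook artificial potential field: a quadratic attractive potential toward a target (for example, the prey) and a repulsive potential that acts only within an influence radius. The function names, the gains `k_att` and `k_rep`, and the radius `d0` are illustrative assumptions and are not taken from the paper.

```python
import math


def attractive_force(pos, goal, k_att=1.0):
    """Negative gradient of U_att = 0.5 * k_att * ||pos - goal||^2."""
    return tuple(-k_att * (p - g) for p, g in zip(pos, goal))


def repulsive_force(pos, obstacle, k_rep=1.0, d0=1.0):
    """Negative gradient of U_rep = 0.5 * k_rep * (1/d - 1/d0)^2 for d < d0.

    Returns a zero vector outside the influence radius d0 (hypothetical
    parameter) and at d == 0, where the direction is undefined.
    """
    d = math.dist(pos, obstacle)
    if d >= d0 or d == 0.0:
        return tuple(0.0 for _ in pos)
    # Magnitude k_rep * (1/d - 1/d0) / d^2 along the unit vector (pos - obstacle) / d.
    scale = k_rep * (1.0 / d - 1.0 / d0) / d**3
    return tuple(scale * (p - o) for p, o in zip(pos, obstacle))


def total_force(pos, goal, obstacles, k_att=1.0, k_rep=1.0, d0=1.0):
    """Sum of the attractive force and all repulsive forces at pos."""
    force = list(attractive_force(pos, goal, k_att))
    for obstacle in obstacles:
        rep = repulsive_force(pos, obstacle, k_rep, d0)
        force = [f + r for f, r in zip(force, rep)]
    return tuple(force)


if __name__ == "__main__":
    # A pursuer at (0.5, 0) chasing a target at (2, 0) with one obstacle at the origin.
    print(total_force((0.5, 0.0), (2.0, 0.0), [(0.0, 0.0)]))
```

The resulting force vector gives a preferred direction of motion at each position. In a learning setting it could serve, for instance, as a guidance signal alongside the learned actor, but that coupling is an assumption here and not a description of PGDDPG.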

Pages: 625-637

Page count: 13

50 items in total

[1] Semicentralized Deep Deterministic Policy Gradient in Cooperative StarCraft Games
Xie, Dong
Zhong, Xiangnan
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (04) : 1584 - 1593
[2] A Deep Reinforcement Learning Method based on Deterministic Policy Gradient for Multi-Agent Cooperative Competition
Zuo, Xuan
Xue, Hui-Feng
Wang, Xiao-Yin
Du, Wan-Ru
Tian, Tao
Gao, Shan
Zhang, Pu
[J]. CONTROL ENGINEERING AND APPLIED INFORMATICS, 2021, 23 (03) : 88 - 98
[3] Deep Ensemble Reinforcement Learning with Multiple Deep Deterministic Policy Gradient Algorithm
Wu, Junta
Li, Huiyun
[J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
[4] Peer Incentive Reinforcement Learning for Cooperative Multiagent Games
Zhang, Tianle
Liu, Zhen
Pu, Zhiqiang
Yi, Jianqiang
[J]. IEEE TRANSACTIONS ON GAMES, 2023, 15 (04) : 623 - 636
[5] Multiagent Cooperative Learning Strategies for Pursuit-Evasion Games
Kuo, Jong Yih
Yu, Hsiang-Fu
Liu, Kevin Fong-Rey
Lee, Fang-Wen
[J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
[6] Generative Adversarial Inverse Reinforcement Learning With Deep Deterministic Policy Gradient
Zhan, Ming
Fan, Jingjing
Guo, Jianying
[J]. IEEE ACCESS, 2023, 11 : 87732 - 87746
[7] Cooperative Multiagent Deep Deterministic Policy Gradient (CoMADDPG) for Intelligent Connected Transportation with Unsignalized Intersection
Wu, Tianhao
Jiang, Mingzhi
Zhang, Lin
[J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2020, 2020
[8] A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning
Kim, Dong-Ki
Liu, Miao
Riemer, Matthew
Sun, Chuangchuang
Abdulhai, Marwa
Habibi, Golnaz
Lopez-Cot, Sebastian
Tesauro, Gerald
How, Jonathan P.
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[9] Strategy Generation Based on Reinforcement Learning with Deep Deterministic Policy Gradient for UCAV
Ma, Yunhong
Bai, Shuyao
Zhao, Yifei
Song, Chao
Yang, Jie
[J]. 16TH IEEE INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV 2020), 2020 : 789 - 794
[10] Reinforcement Learning for Mobile Robot Obstacle Avoidance with Deep Deterministic Policy Gradient
Chen, Miao
Li, Wenna
Fei, Shihan
Wei, Yufei
Tu, Mingyang
Li, Jiangbo
[J]. INTELLIGENT ROBOTICS AND APPLICATIONS (ICIRA 2022), PT III, 2022, 13457 : 197 - 204
