Hybrid Policy Learning for Multi-Agent Pathfinding

Times cited: 9

Authors:

Skrynnik, Alexey [1]

Yakovleva, Alexandra [2]

Davydov, Vasilii [2]

Yakovlev, Konstantin [1, 2]

Panov, Aleksandr I. [1, 2]

Affiliations:

[1] Russian Acad Sci, Fed Res Ctr Comp Sci & Control, Moscow 119333, Russia

[2] Moscow Inst Phys & Technol, Dolgoprudnyi 141700, Moscow Region, Russia

Source:

IEEE ACCESS | 2021 / Vol. 9

Keywords:

Reinforcement learning; Planning; Task analysis; Autonomous vehicles; Navigation; Costs; Monte Carlo methods; Multiagent systems; path planning; machine learning; intelligent transportation systems; reinforcement learning; Monte-Carlo Tree Search; GO; NETWORKS;

DOI:

10.1109/ACCESS.2021.3111321

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

In this work, we study the behavior of groups of autonomous vehicles that are part of Internet of Vehicles systems. One of the challenging modes of operation of such systems arises when the observability of each vehicle is limited and global/local communication is unstable, e.g., in crowded parking lots. In such scenarios, vehicles have to rely on local observations and exhibit cooperative behavior to ensure safe and efficient trips. This type of problem can be abstracted as so-called multi-agent pathfinding, in which a group of agents confined to a graph must find collision-free paths to their goals (ideally minimizing an objective function, e.g., travel time). Widely used algorithms for solving this problem assume that a central controller exists that knows the full state of the environment (i.e., the agents' current positions, their targets, the configuration of static obstacles, etc.), and they cannot be straightforwardly adapted to partially observable setups. To this end, we suggest a novel approach based on decomposing the problem into two sub-tasks: reaching the goal and avoiding collisions. To accomplish each of these tasks, we use reinforcement learning methods such as Deep Monte Carlo Tree Search, Q-mixing networks, and policy gradient methods to design policies that map the agents' observations to actions. Next, we introduce a policy-mixing mechanism that yields a single hybrid policy, allowing each agent to exhibit both types of behavior: the individual one (reaching the goal) and the cooperative one (avoiding collisions with other agents). We conduct an extensive empirical evaluation showing that the suggested hybrid policy outperforms standalone state-of-the-art reinforcement learning methods for this kind of problem by a notable margin.
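The abstract defines multi-agent pathfinding as finding collision-free paths for agents confined to a graph. The sketch below illustrates only that collision-free requirement, not the authors' learned hybrid policy. It assumes a grid graph with (row, col) cells, discrete timesteps, agents that stay at their goal after arriving, and the two conflict types commonly used in the MAPF literature (vertex and swap conflicts); these conventions are assumptions, not details taken from the paper. The helper names pos_at, find_conflicts and is_collision_free are hypothetical.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # hypothetical grid vertex encoding: (row, col)
Conflict = Tuple[str, str, int, str]  # (agent_a, agent_b, timestep, kind)


def pos_at(path: List[Cell], t: int) -> Cell:
    """Position of an agent at timestep t; assumes it waits at its goal after arriving."""
    return path[t] if t < len(path) else path[-1]


def find_conflicts(paths: Dict[str, List[Cell]]) -> List[Conflict]:
    """Report every vertex conflict (same cell at the same timestep) and every
    swap conflict (two agents exchange cells between t and t + 1)."""
    conflicts: List[Conflict] = []
    agents = sorted(paths)
    horizon = max(len(p) for p in paths.values())
    for i, a in enumerate(agents):
        for b in agents[i + 1:]:
            for t in range(horizon):
                a_now, b_now = pos_at(paths[a], t), pos_at(paths[b], t)
                if a_now == b_now:
                    conflicts.append((a, b, t, "vertex"))
                if t + 1 < horizon:
                    a_next, b_next = pos_at(paths[a], t + 1), pos_at(paths[b], t + 1)
                    # A swap requires actual movement: a goes a_now -> b_now while b goes b_now -> a_now.
                    if a_now != a_next and a_now == b_next and a_next == b_now:
                        conflicts.append((a, b, t, "swap"))
    return conflicts


def is_collision_free(paths: Dict[str, List[Cell]]) -> bool:
    """True when no pair of agents has a vertex or swap conflict."""
    return not find_conflicts(paths)
```

For example, two agents trading places in a corridor, find_conflicts({"a1": [(0, 0), (0, 1)], "a2": [(0, 1), (0, 0)]}), returns [("a1", "a2", 0, "swap")]. A centralized planner can run such a check over all agents' full paths; in the partially observable setting the abstract targets, each agent would instead have to avoid these conflicts using only local observations.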


Pages: 126034–126047

Number of pages: 14
