Hybrid Policy Learning for Multi-Agent Pathfinding

Cited by: 9
Authors
Skrynnik, Alexey [1]
Yakovleva, Alexandra [2]
Davydov, Vasilii [2]
Yakovlev, Konstantin [1, 2]
Panov, Aleksandr I. [1, 2]
Affiliations
[1] Russian Acad Sci, Fed Res Ctr Comp Sci & Control, Moscow 119333, Russia
[2] Moscow Inst Phys & Technol, Dolgoprudnyi 141700, Moscow Region, Russia
Source
IEEE ACCESS | 2021 / Vol. 9
Keywords
Reinforcement learning; Planning; Task analysis; Autonomous vehicles; Navigation; Costs; Monte Carlo methods; Multiagent systems; path planning; machine learning; intelligent transportation systems; reinforcement learning; Monte-Carlo Tree Search; GO; NETWORKS;
DOI
10.1109/ACCESS.2021.3111321
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In this work we study the behavior of groups of autonomous vehicles that are part of Internet of Vehicles systems. One of the challenging modes of operation of such systems is the case when the observability of each vehicle is limited and global/local communication is unstable, e.g., in crowded parking lots. In such scenarios the vehicles have to rely on local observations and exhibit cooperative behavior to ensure safe and efficient trips. This type of problem can be abstracted to so-called multi-agent pathfinding, in which a group of agents, confined to a graph, have to find collision-free paths to their goals (ideally, minimizing an objective function, e.g., travel time). Widely used algorithms for solving this problem rely on the assumption that a central controller exists for which the full state of the environment (i.e., the agents' current positions, their targets, the configuration of the static obstacles, etc.) is known, and they cannot be straightforwardly adapted to partially observable setups. To this end, we suggest a novel approach based on decomposing the problem into two sub-tasks: reaching the goal and avoiding collisions. To accomplish each of these tasks we utilize reinforcement learning methods such as Deep Monte Carlo Tree Search, Q-mixing networks, and policy gradient methods to design policies that map the agents' observations to actions. Next, we introduce a policy-mixing mechanism to end up with a single hybrid policy that allows each agent to exhibit both types of behavior - the individual one (reaching the goal) and the cooperative one (avoiding collisions with other agents). We conduct an extensive empirical evaluation showing that the suggested hybrid policy outperforms standalone state-of-the-art reinforcement learning methods for this kind of problem by a notable margin.
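The abstract does not say how the policy-mixing mechanism combines the two policies, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each policy returns a probability distribution over five grid actions, and it mixes them with a convex combination whose weight depends on whether other agents are visible. The names `mix_policies`, `hybrid_action`, `ACTIONS`, and the weight `0.8` are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical action set for a grid-based pathfinding agent.
ACTIONS = ("stay", "up", "down", "left", "right")


def mix_policies(goal_probs: Sequence[float],
                 coop_probs: Sequence[float],
                 weight: float) -> List[float]:
    """Convex combination of two action distributions.

    weight = 0 uses only the goal-reaching policy,
    weight = 1 uses only the collision-avoiding policy.
    This mixing rule is an assumption made for illustration.
    """
    if len(goal_probs) != len(coop_probs):
        raise ValueError("both policies must cover the same actions")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    mixed = [(1.0 - weight) * g + weight * c
             for g, c in zip(goal_probs, coop_probs)]
    total = sum(mixed)
    return [m / total for m in mixed]  # renormalize against rounding drift


def hybrid_action(goal_probs: Sequence[float],
                  coop_probs: Sequence[float],
                  others_visible: bool) -> str:
    """Pick the most probable action under the mixed distribution.

    Assumption: the cooperative policy is weighted in only when other
    agents appear in the agent's local observation.
    """
    weight = 0.8 if others_visible else 0.0
    mixed = mix_policies(goal_probs, coop_probs, weight)
    best = max(range(len(mixed)), key=mixed.__getitem__)
    return ACTIONS[best]
```

With a goal-reaching distribution that favors `right` and a cooperative distribution that favors `stay`, the sketch follows the goal-reaching policy when no other agent is visible and defers to the cooperative one otherwise.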
Pages: 126034-126047
Page count: 14