Multi-agent reinforcement learning as a rehearsal for decentralized planning

Cited by: 182
Authors
Kraemer, Landon [1]
Banerjee, Bikramjit [1]
Affiliations
[1] Univ So Mississippi, Sch Comp, Hattiesburg, MS 39406 USA
Funding
National Science Foundation (USA)
Keywords
Multi-agent reinforcement learning; Decentralized planning;
DOI
10.1016/j.neucom.2016.01.031
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Decentralized partially observable Markov decision processes (Dec-POMDPs) are a powerful tool for modeling multi-agent planning and decision-making under uncertainty. Prevalent Dec-POMDP solution techniques require centralized computation given full knowledge of the underlying model. Multi-agent reinforcement learning (MARL) based approaches have recently been proposed for distributed solution of Dec-POMDPs without full prior knowledge of the model, but these methods assume that conditions during learning and policy execution are identical. In some practical scenarios this may not be the case. We propose a novel MARL approach in which agents are allowed to rehearse with information that will not be available during policy execution. The key is for the agents to learn policies that do not explicitly rely on these rehearsal features. We also establish a weak convergence result for our algorithm, RLaR, demonstrating that RLaR converges in probability when certain conditions are met. We show experimentally that incorporating rehearsal features can enhance the learning rate compared to non-rehearsal based learners, and demonstrate fast, (near) optimal performance on many existing benchmark Dec-POMDP problems. We also compare RLaR against an existing approximate Dec-POMDP solver which, like RLaR, does not assume a priori knowledge of the model. While RLaR's policy representation is not as scalable, we show that RLaR produces higher quality policies for most problems and horizons studied. (C) 2016 Elsevier B.V. All rights reserved.
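The abstract's central idea, learning with rehearsal features but ending with a policy that does not depend on them, can be illustrated with a small sketch. This is not the RLaR algorithm from the paper: it assumes, purely for illustration, that the rehearsal feature is the hidden state, that a tabular value function over (state, history, action) has already been learned, and that state frequencies per observation history were counted during rehearsal. The names `marginalize_q`, `greedy_action`, `q_rehearsal`, and `state_counts` are hypothetical.

```python
from collections import defaultdict


def marginalize_q(q_rehearsal, state_counts):
    """Collapse Q(state, history, action) into Q(history, action).

    q_rehearsal: dict (state, history, action) -> value, learned while the
        hidden state was visible (assumed rehearsal feature).
    state_counts: dict (history, state) -> how often that state co-occurred
        with that history during rehearsal (assumed to be recorded).
    """
    # Total visits per history, used to normalize counts into P(state | history).
    totals = defaultdict(float)
    for (history, _state), n in state_counts.items():
        totals[history] += n

    # Expected value of each action given only the observation history.
    q_exec = defaultdict(float)
    for (state, history, action), value in q_rehearsal.items():
        if totals[history] > 0:
            weight = state_counts.get((history, state), 0) / totals[history]
            q_exec[(history, action)] += weight * value
    return dict(q_exec)


def greedy_action(q_exec, history, actions):
    """Pick the best action using only information available at execution."""
    return max(actions, key=lambda a: q_exec.get((history, a), float("-inf")))


# Toy data: two equally likely hidden states behind the same history "h0".
q_rehearsal = {
    ("left", "h0", "listen"): 1.0,
    ("right", "h0", "listen"): 1.0,
    ("left", "h0", "open_left"): -10.0,
    ("right", "h0", "open_left"): 5.0,
}
state_counts = {("h0", "left"): 1, ("h0", "right"): 1}

q_exec = marginalize_q(q_rehearsal, state_counts)
best = greedy_action(q_exec, "h0", ["listen", "open_left"])
```

The resulting table is indexed only by observation history and action, so the greedy policy can be executed without access to the hidden state.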
Pages: 82-94
Number of pages: 13
Related papers
50 in total
  • [41] Decentralized Policy Gradient Descent Ascent for Safe Multi-Agent Reinforcement Learning
    Lu, Songtao
    Zhang, Kaiqing
    Chen, Tianyi
    Basar, Tamer
    Horesh, Lior
    [J]. THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8767 - 8775
  • [42] Trajectory planning of space manipulator based on multi-agent reinforcement learning
    Zhao, Yu
    Guan, Gongshun
    Guo, Jifeng
    Yu, Xiaoqiang
    Yan, Peng
    [J]. Hangkong Xuebao/Acta Aeronautica et Astronautica Sinica, 2021, 42 (01):
  • [43] Attention-Cooperated Reinforcement Learning for Multi-agent Path Planning
    Ma, Jinchao
    Lian, Defu
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS. DASFAA 2022 INTERNATIONAL WORKSHOPS, 2022, 13248 : 272 - 290
  • [44] Safe multi-agent motion planning via filtered reinforcement learning
    Vinod, Abraham P.
    Safaoui, Sleiman
    Chakrabarty, Ankush
    Quirynen, Rien
    Yoshikawa, Nobuyuki
    Di Cairano, Stefano
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022, : 7270 - 7276
  • [45] Energy Constrained Multi-Agent Reinforcement Learning for Coverage Path Planning
    Zhao, Chenyang
    Liu, Juan
    Yoon, Suk-Un
    Li, Xinde
    Li, Heqing
    Zhang, Zhentong
    [J]. 2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2023, : 5590 - 5597
  • [46] Multi-agent Coverage Path Planning Based on Security Reinforcement Learning
    Li, Song
    Ma, Zhuangzhuang
    Zhang, Yunlin
    Shao, Jinliang
    [J]. Binggong Xuebao/Acta Armamentarii, 2023, 44 : 101 - 113
  • [47] Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
    Omidshafiei, Shayegan
    Pazis, Jason
    Amato, Christopher
    How, Jonathan P.
    Vian, John
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [48] Hysteretic Q-Learning : an algorithm for decentralized reinforcement learning in cooperative multi-agent teams
    Matignon, Laetitia
    Laurent, Guillaume J.
    Le Fort-Piat, Nadine
    [J]. 2007 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, VOLS 1-9, 2007, : 64 - 69
  • [49] Multi-Agent Cognition Difference Reinforcement Learning for Multi-Agent Cooperation
    Wang, Huimu
    Qiu, Tenghai
    Liu, Zhen
    Pu, Zhiqiang
    Yi, Jianqiang
    Yuan, Wanmai
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [50] Hierarchical multi-agent reinforcement learning
    Mohammad Ghavamzadeh
    Sridhar Mahadevan
    Rajbala Makar
    [J]. Autonomous Agents and Multi-Agent Systems, 2006, 13 : 197 - 229