Multi-agent reinforcement learning algorithm to solve a partially-observable multi-agent problem in disaster response

Cited: 20
Authors
Lee, Hyun-Rok [1 ]
Lee, Taesik [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Ind & Syst Engn, 291 Daehak Ro, Daejeon, South Korea
Funding
National Research Foundation of Singapore;
Keywords
OR in disaster relief; Artificial intelligence; Multi-agent reinforcement learning; Imitation learning; Selective patient admission; EMERGENCY-DEPARTMENT; PATIENT PRIORITIZATION; RESOURCE UTILIZATION; SCARCE RESOURCES; IMPATIENT JOBS; CASUALTY; TRIAGE; ALLOCATION; DECISION; DEMAND;
DOI
10.1016/j.ejor.2020.09.018
Chinese Library Classification
C93 [Management Science];
Discipline codes
12 ; 1201 ; 1202 ; 120202 ;
Abstract
Disaster response operations typically involve multiple decision-makers, each of whom must act on only incomplete information about the current situation. To capture these characteristics - multiple decision-makers acting under partial observation to achieve a shared objective - we formulate the decision problem as a decentralized partially observable Markov decision process (dec-POMDP). Because solving a dec-POMDP optimally is notoriously difficult, multi-agent reinforcement learning (MARL) has been used as a solution technique; however, typical MARL algorithms are not always effective on dec-POMDP models. Motivated by evidence from single-agent RL, we propose a MARL algorithm augmented by pretraining: specifically, we use behavioral cloning (BC) to pretrain a neural network. We verify the effectiveness of the proposed method by solving a dec-POMDP model of a decentralized selective patient admission problem. Experimental results on three disaster scenarios show that the proposed method is a viable approach to solving dec-POMDP problems, and that augmenting MARL with BC pretraining appears to offer advantages over plain MARL in both solution quality and computation time. (C) 2020 Elsevier B.V. All rights reserved.
Pages: 296-308
Page count: 13
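
The abstract's central idea - pretraining each agent's policy network with behavioral cloning on demonstrations before MARL training - can be illustrated with a minimal sketch. This is not the authors' implementation: the network architecture, the dimensions, the function and class names (PolicyNet, pretrain_with_bc), and the synthetic demonstration data are all hypothetical, and PyTorch is assumed.

```python
# Illustrative sketch only: supervised BC pretraining of a per-agent policy,
# whose weights would then initialize MARL fine-tuning (the augmentation
# described in the abstract). All names and shapes are hypothetical.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Per-agent policy mapping a partial observation to action logits."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

def pretrain_with_bc(policy, demo_obs, demo_actions, epochs=50, lr=1e-3):
    """Behavioral cloning: fit expert (observation -> action) pairs
    with a standard cross-entropy classification loss."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(policy(demo_obs), demo_actions)
        loss.backward()
        opt.step()
    return policy

# Toy usage with synthetic "expert" demonstrations (purely illustrative).
obs_dim, n_actions, n_demos = 8, 3, 256
demo_obs = torch.randn(n_demos, obs_dim)
demo_actions = torch.randint(0, n_actions, (n_demos,))
policy = pretrain_with_bc(PolicyNet(obs_dim, n_actions), demo_obs, demo_actions)
```

In a dec-POMDP setting one such network would be pretrained per agent (or shared across agents) from demonstrations of a heuristic or expert admission policy, and the resulting weights would serve as the starting point for the MARL phase rather than a random initialization.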