Multi-agent reinforcement learning algorithm to solve a partially-observable multi-agent problem in disaster response

Cited by: 20
Authors
Lee, Hyun-Rok [1 ]
Lee, Taesik [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Ind & Syst Engn, 291 Daehak Ro, Daejeon, South Korea
Funding
National Research Foundation, Singapore
Keywords
OR in disaster relief; Artificial intelligence; Multi-agent reinforcement learning; Imitation learning; Selective patient admission; EMERGENCY-DEPARTMENT; PATIENT PRIORITIZATION; RESOURCE UTILIZATION; SCARCE RESOURCES; IMPATIENT JOBS; CASUALTY; TRIAGE; ALLOCATION; DECISION; DEMAND;
DOI
10.1016/j.ejor.2020.09.018
CLC number
C93 [Management]
Subject classification codes
12; 1201; 1202; 120202
Abstract
Disaster response operations typically involve multiple decision-makers, each of whom must make decisions given only incomplete information about the current situation. To capture these characteristics (decision making by multiple decision-makers with partial observations to achieve a shared objective), we formulate the decision problem as a decentralized partially observable Markov decision process (dec-POMDP) model. Because optimally solving a dec-POMDP model is notoriously difficult, multi-agent reinforcement learning (MARL) has been used as a solution technique. However, typical MARL algorithms are not always effective at solving dec-POMDP models. Motivated by evidence from single-agent RL, we propose a MARL algorithm augmented by pretraining. Specifically, we use behavioral cloning (BC) to pretrain a neural network. We verify the effectiveness of the proposed method by solving a dec-POMDP model of a decentralized selective patient admission problem. Experimental results on three disaster scenarios show that the proposed method is a viable approach to solving dec-POMDP problems and that augmenting MARL with BC pretraining appears to offer advantages over plain MARL in terms of solution quality and computation time. (C) 2020 Elsevier B.V. All rights reserved.
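The core idea of the abstract, behavioral cloning as pretraining for a policy network, can be sketched in a few lines. The sketch below is an illustrative assumption, not the authors' implementation: the observation dimensions, the heuristic "expert" (admit a patient when a severity feature exceeds a threshold), and the linear softmax policy are all hypothetical stand-ins. BC reduces to supervised cross-entropy training on expert (observation, action) pairs; the resulting weights would then initialize each agent's policy network before MARL training.

```python
import numpy as np

rng = np.random.default_rng(0)
N_OBS, N_ACT = 4, 2  # observation dimension, number of discrete actions

def expert_policy(obs):
    # Hypothetical heuristic expert: admit (action 1) when the first
    # feature (say, patient severity) exceeds a threshold.
    return int(obs[0] > 0.5)

# Expert demonstrations: (observation, action) pairs.
X = rng.random((500, N_OBS))
y = np.array([expert_policy(o) for o in X])

# Linear softmax policy with a bias feature; behavioral cloning is
# plain supervised cross-entropy training on the demonstrations.
Xb = np.hstack([X, np.ones((len(X), 1))])
W = np.zeros((N_OBS + 1, N_ACT))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for _ in range(500):  # plain full-batch gradient descent
    probs = softmax(Xb @ W)
    grad = Xb.T @ (probs - np.eye(N_ACT)[y]) / len(Xb)
    W -= 2.0 * grad

# W would warm-start the MARL policy network instead of a random init.
bc_accuracy = float((softmax(Xb @ W).argmax(axis=1) == y).mean())
print(f"BC accuracy on demonstrations: {bc_accuracy:.2f}")
```

The pretrained policy only needs to roughly reproduce the expert; MARL fine-tuning is then expected to improve on it, which is the advantage over starting plain MARL from scratch.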
Pages: 296-308
Page count: 13