Multi-agent reinforcement learning algorithm to solve a partially-observable multi-agent problem in disaster response

Cited by: 20
Authors
Lee, Hyun-Rok [1 ]
Lee, Taesik [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Dept Ind & Syst Engn, 291 Daehak Ro, Daejeon, South Korea
Funding
National Research Foundation, Singapore
Keywords
OR in disaster relief; Artificial intelligence; Multi-agent reinforcement learning; Imitation learning; Selective patient admission; EMERGENCY-DEPARTMENT; PATIENT PRIORITIZATION; RESOURCE UTILIZATION; SCARCE RESOURCES; IMPATIENT JOBS; CASUALTY; TRIAGE; ALLOCATION; DECISION; DEMAND;
DOI
10.1016/j.ejor.2020.09.018
CLC number
C93 [Management]
Subject classification codes
12; 1201; 1202; 120202
Abstract
Disaster response operations typically involve multiple decision-makers, each of whom must make decisions given only incomplete information about the current situation. To capture these characteristics (decision making by multiple decision-makers with partial observations to achieve a shared objective), we formulate the decision problem as a decentralized partially observable Markov decision process (dec-POMDP) model. Because optimally solving a dec-POMDP model is notoriously difficult, multi-agent reinforcement learning (MARL) has been used as a solution technique. However, typical MARL algorithms are not always effective at solving dec-POMDP models. Motivated by evidence from single-agent RL, we propose a MARL algorithm augmented by pretraining. Specifically, we use behavioral cloning (BC) to pretrain a neural network. We verify the effectiveness of the proposed method by solving a dec-POMDP model of a decentralized selective patient admission problem. Experimental results on three disaster scenarios show that the proposed method is a viable approach to solving dec-POMDP problems and that augmenting MARL with BC pretraining appears to offer advantages over plain MARL in terms of solution quality and computation time. (C) 2020 Elsevier B.V. All rights reserved.
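The core idea of the abstract, behavioral cloning as pretraining for a policy network, can be sketched in a few lines. The sketch below is an illustrative assumption, not the authors' implementation: the observation dimensions, the heuristic "expert" (admit a patient when a severity feature exceeds a threshold), and the linear softmax policy are all hypothetical stand-ins. BC reduces to supervised cross-entropy training on expert (observation, action) pairs; the resulting weights would then initialize each agent's policy network before MARL training.

```python
import numpy as np

rng = np.random.default_rng(0)
N_OBS, N_ACT = 4, 2  # observation dimension, number of discrete actions

def expert_policy(obs):
    # Hypothetical heuristic expert: admit (action 1) when the first
    # feature (say, patient severity) exceeds a threshold.
    return int(obs[0] > 0.5)

# Expert demonstrations: (observation, action) pairs.
X = rng.random((500, N_OBS))
y = np.array([expert_policy(o) for o in X])

# Linear softmax policy with a bias feature; behavioral cloning is
# plain supervised cross-entropy training on the demonstrations.
Xb = np.hstack([X, np.ones((len(X), 1))])
W = np.zeros((N_OBS + 1, N_ACT))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

for _ in range(500):  # plain full-batch gradient descent
    probs = softmax(Xb @ W)
    grad = Xb.T @ (probs - np.eye(N_ACT)[y]) / len(Xb)
    W -= 2.0 * grad

# W would warm-start the MARL policy network instead of a random init.
bc_accuracy = float((softmax(Xb @ W).argmax(axis=1) == y).mean())
print(f"BC accuracy on demonstrations: {bc_accuracy:.2f}")
```

The pretrained policy only needs to roughly reproduce the expert; MARL fine-tuning is then expected to improve on it, which is the advantage over starting plain MARL from scratch.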
Pages: 296-308
Page count: 13