Monte-Carlo Tree Search for Constrained POMDPs

Cited by: 0

Authors:

Lee, Jongmin [1]

Kim, Geon-Hyeong [1]

Poupart, Pascal [2,3]

Kim, Kee-Eung [1,4]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea

[2] Univ Waterloo, Waterloo AI Inst, Waterloo, ON, Canada

[3] Vector Inst, Toronto, ON, Canada

[4] PROWLER Io, Cambridge, England

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | 2018 / Vol. 31

Keywords:

MARKOV DECISION-PROCESSES;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. However, many real-world problems inherently have multiple goals, where multi-objective formulations are more natural. The constrained POMDP (CPOMDP) is such a model that maximizes the reward while constraining the cost, extending the standard POMDP model. To date, solution methods for CPOMDPs assume an explicit model of the environment, and thus are hardly applicable to large-scale real-world problems. In this paper, we present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and only requires a black-box simulator of the environment. In the experiments, we demonstrate that CC-POMCP converges to the optimal stochastic action selection in CPOMDP and pushes the state-of-the-art by being able to scale to very large problems.

Pages: 10

50 items in total

[31] Can Monte-Carlo Tree Search learn to sacrifice?
Companez, Nathan
Aleti, Aldeida
JOURNAL OF HEURISTICS, 2016, 22 : 783 - 813
[32] Monte-Carlo Tree Search for the Maximum Satisfiability Problem
Goffinet, Jack
Ramanujan, Raghuram
PRINCIPLES AND PRACTICE OF CONSTRAINT PROGRAMMING, CP 2016, 2016, 9892 : 251 - 267
[33] Parallel Monte-Carlo Tree Search for HPC Systems
Graf, Tobias
Lorenz, Ulf
Platzner, Marco
Schaefers, Lars
EURO-PAR 2011 PARALLEL PROCESSING, PT 2, 2011, 6853 : 365 - 376
[34] Can Monte-Carlo Tree Search learn to sacrifice?
Companez, Nathan
Aleti, Aldeida
JOURNAL OF HEURISTICS, 2016, 22 (06) : 783 - 813
[35] Bayesian Optimization for Backpropagation in Monte-Carlo Tree Search
Lim, Nengli
Li, Yueqin
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT II, 2021, 12892 : 209 - 221
[36] Monte-Carlo Tree Search for the Game of Scotland Yard
Nijssen, J. A. M.
Winands, Mark H. M.
2011 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND GAMES (CIG), 2011, : 158 - 165
[37] Monte-Carlo Tree Search by Best Arm Identification
Kaufmann, Emilie
Koolen, Wouter M.
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
[38] Monte-Carlo Tree Search for Scalable Coalition Formation
Wu, Feng
Ramchurn, Sarvapali D.
PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 407 - 413
[39] EXPERIMENTS WITH MONTE-CARLO TREE SEARCH IN THE GAME OF HAVANNAH
Lorentz, Richard J.
ICGA JOURNAL, 2011, 34 (03) : 140 - 149
[40] Monte-Carlo tree search as regularized policy optimization
Grill, Jean-Bastien
Altche, Florent
Tang, Yunhao
Hubert, Thomas
Valko, Michal
Antonoglou, Ioannis
Munos, Remi
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
