Monte-Carlo Tree Search for Constrained POMDPs

Cited by: 0
Authors
Lee, Jongmin [1 ]
Kim, Geon-Hyeong [1 ]
Poupart, Pascal [2 ,3 ]
Kim, Kee-Eung [1 ,4 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea
[2] Univ Waterloo, Waterloo AI Inst, Waterloo, ON, Canada
[3] Vector Inst, Toronto, ON, Canada
[4] PROWLER Io, Cambridge, England
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | 2018 / Vol. 31
Keywords
MARKOV DECISION-PROCESSES;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. However, many real-world problems inherently have multiple goals, for which multi-objective formulations are more natural. The constrained POMDP (CPOMDP) is one such model: it extends the standard POMDP by maximizing the reward while constraining the cost. To date, solution methods for CPOMDPs have assumed an explicit model of the environment and are therefore hardly applicable to large-scale real-world problems. In this paper, we present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and requires only a black-box simulator of the environment. In the experiments, we demonstrate that CC-POMCP converges to the optimal stochastic action selection in CPOMDPs and pushes the state of the art by scaling to very large problems.
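The abstract describes the core idea behind CC-POMCP: run a POMCP-style search against a black-box simulator while scalarizing reward and cost with an LP-induced dual parameter (a Lagrange-multiplier-like weight lambda) that is adjusted online so the estimated cost respects the budget. The sketch below only illustrates that general idea under assumptions of our own; the class and method names (BlackBoxSimulator, CCPOMCPSketch, step, search), the specific lambda update rule, and all hyperparameters are hypothetical and are not taken from the paper, which in addition selects a stochastic action at the root rather than the greedy choice shown here.

```python
# Illustrative sketch of a cost-constrained POMCP-style search.
# NOT the authors' CC-POMCP implementation: names, the lambda update,
# and the greedy root selection are simplifying assumptions.
import math
import random


class BlackBoxSimulator:
    """Generative model: step(state, action) -> (next_state, obs, reward, cost, done).
    Observations are assumed to be hashable."""
    def __init__(self, actions):
        self.actions = actions

    def step(self, state, action):
        raise NotImplementedError


class Node:
    def __init__(self):
        self.visits = 0
        self.q_reward = 0.0   # running mean of the discounted reward return
        self.q_cost = 0.0     # running mean of the discounted cost return
        self.children = {}    # keyed by action (belief node) or observation (action node)


class CCPOMCPSketch:
    def __init__(self, sim, cost_budget, gamma=0.95, c_ucb=1.0,
                 lam_init=1.0, lam_lr=0.05, lam_max=100.0, max_depth=20):
        self.sim, self.c_hat, self.gamma = sim, cost_budget, gamma
        self.c_ucb, self.lam, self.lam_lr = c_ucb, lam_init, lam_lr
        self.lam_max, self.max_depth = lam_max, max_depth

    def search(self, belief_particles, n_sims=1000):
        root = Node()
        for _ in range(n_sims):
            state = random.choice(belief_particles)  # sample a state from the belief
            self._simulate(state, root, depth=0)
            # Dual-style update: raise lambda when estimated cost exceeds the budget,
            # lower it otherwise, projected onto [0, lam_max].
            best = self._greedy_action(root)
            cost_gap = root.children[best].q_cost - self.c_hat
            self.lam = min(self.lam_max, max(0.0, self.lam + self.lam_lr * cost_gap))
        return self._greedy_action(root)

    def _greedy_action(self, node):
        # Greedy w.r.t. the scalarized value Q_R - lambda * Q_C.
        return max(node.children,
                   key=lambda a: node.children[a].q_reward - self.lam * node.children[a].q_cost)

    def _ucb_action(self, node):
        def ucb(a):
            child = node.children[a]
            if child.visits == 0:
                return float("inf")
            bonus = self.c_ucb * math.sqrt(math.log(node.visits + 1) / child.visits)
            return child.q_reward - self.lam * child.q_cost + bonus
        return max(node.children, key=ucb)

    def _simulate(self, state, node, depth):
        if depth >= self.max_depth:
            return 0.0, 0.0
        if not node.children:  # expand a leaf with one child per action
            node.children = {a: Node() for a in self.sim.actions}
        action = self._ucb_action(node)
        next_state, obs, reward, cost, done = self.sim.step(state, action)
        child = node.children[action]
        if done:
            r_ret, c_ret = reward, cost
        else:
            obs_node = child.children.setdefault(obs, Node())
            r_next, c_next = self._simulate(next_state, obs_node, depth + 1)
            r_ret = reward + self.gamma * r_next
            c_ret = cost + self.gamma * c_next
        # Incremental mean updates for both return estimates.
        node.visits += 1
        child.visits += 1
        child.q_reward += (r_ret - child.q_reward) / child.visits
        child.q_cost += (c_ret - child.q_cost) / child.visits
        return r_ret, c_ret
```

To try the sketch, one would subclass BlackBoxSimulator with a domain-specific step function and call search with a list of particles representing the current belief; a rollout policy for leaf evaluation, used by POMCP proper, is omitted here for brevity.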
Pages: 10
Related Papers
50 records in total
  • [41] Monte-Carlo tree search for Bayesian reinforcement learning
    Ngo Anh Vien
    Ertel, Wolfgang
    Viet-Hung Dang
    Chung, TaeChoong
    APPLIED INTELLIGENCE, 2013, 39 (02) : 345 - 353
  • [43] Using evaluation functions in Monte-Carlo Tree Search
    Lorentz, Richard
    THEORETICAL COMPUTER SCIENCE, 2016, 644 : 106 - 113
  • [44] Backpropagation Modification in Monte-Carlo Game Tree Search
    Xie, Fan
    Liu, Zhiqing
    2009 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL 2, PROCEEDINGS, 2009, : 125 - 128
  • [45] Monte-Carlo Tree Search in Dragline Operation Planning
    Liu, Haoquan
    Austin, Kevin
    Forbes, Michael
    Kearney, Michael
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2018, 3 (01) : 419 - 425
  • [46] Single-Player Monte-Carlo Tree Search
    Schadd, Maarten P. D.
    Winands, Mark H. M.
    van den Herik, H. Jaap
    Chaslot, Guillaume M. J. -B.
    Uiterwijk, Jos W. H. M.
    COMPUTERS AND GAMES, 2008, 5131 : 1 - +
  • [47] Combining Monte-Carlo Tree Search with Proof-Number Search
    Doe, Elliot
    Winands, Mark H. M.
    Soemers, Dennis J. N. J.
    Browne, Cameron
    2022 IEEE CONFERENCE ON GAMES, COG, 2022, : 206 - 212
  • [48] Online Planning for Interactive-POMDPs using Nested Monte Carlo Tree Search
    Schwartz, Jonathon
    Zhou, Ruijia
    Kurniawati, Hanna
    2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 8770 - 8777
  • [49] Thompson Sampling Based Monte-Carlo Planning in POMDPs
    Bai, Aijun
    Wu, Feng
    Zhang, Zongzhang
    Chen, Xiaoping
    TWENTY-FOURTH INTERNATIONAL CONFERENCE ON AUTOMATED PLANNING AND SCHEDULING, 2014, : 29 - 37
  • [50] Efficiency of Static Knowledge Bias in Monte-Carlo Tree Search
    Ikeda, Kokolo
    Viennot, Simon
    COMPUTERS AND GAMES, CG 2013, 2014, 8427 : 26 - 38