Monte-Carlo Tree Search for Constrained POMDPs

Cited by: 0
Authors
Lee, Jongmin [1]
Kim, Geon-Hyeong [1]
Poupart, Pascal [2, 3]
Kim, Kee-Eung [1, 4]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea
[2] Univ Waterloo, Waterloo AI Inst, Waterloo, ON, Canada
[3] Vector Inst, Toronto, ON, Canada
[4] PROWLER Io, Cambridge, England
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | 2018, Vol. 31
Keywords
MARKOV DECISION-PROCESSES;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Monte-Carlo Tree Search (MCTS) has been successfully applied to very large POMDPs, a standard model for stochastic sequential decision-making problems. However, many real-world problems inherently have multiple goals, where multi-objective formulations are more natural. The constrained POMDP (CPOMDP) is such a model that maximizes the reward while constraining the cost, extending the standard POMDP model. To date, solution methods for CPOMDPs assume an explicit model of the environment, and thus are hardly applicable to large-scale real-world problems. In this paper, we present CC-POMCP (Cost-Constrained POMCP), an online MCTS algorithm for large CPOMDPs that leverages the optimization of LP-induced parameters and only requires a black-box simulator of the environment. In the experiments, we demonstrate that CC-POMCP converges to the optimal stochastic action selection in CPOMDP and pushes the state-of-the-art by being able to scale to very large problems.
Pages: 10
Related papers
50 in total
  • [32] Monte-Carlo Tree Search for the Maximum Satisfiability Problem
    Goffinet, Jack
    Ramanujan, Raghuram
    PRINCIPLES AND PRACTICE OF CONSTRAINT PROGRAMMING, CP 2016, 2016, 9892 : 251 - 267
  • [33] Parallel Monte-Carlo Tree Search for HPC Systems
    Graf, Tobias
    Lorenz, Ulf
    Platzner, Marco
    Schaefers, Lars
    EURO-PAR 2011 PARALLEL PROCESSING, PT 2, 2011, 6853 : 365 - 376
  • [34] Can Monte-Carlo Tree Search learn to sacrifice?
    Companez, Nathan
    Aleti, Aldeida
    JOURNAL OF HEURISTICS, 2016, 22 (06) : 783 - 813
  • [35] Bayesian Optimization for Backpropagation in Monte-Carlo Tree Search
    Lim, Nengli
    Li, Yueqin
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT II, 2021, 12892 : 209 - 221
  • [36] Monte-Carlo Tree Search for the Game of Scotland Yard
    Nijssen, J. A. M.
    Winands, Mark H. M.
    2011 IEEE CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND GAMES (CIG), 2011, : 158 - 165
  • [37] Monte-Carlo Tree Search by Best Arm Identification
    Kaufmann, Emilie
    Koolen, Wouter M.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [38] Monte-Carlo Tree Search for Scalable Coalition Formation
    Wu, Feng
    Ramchurn, Sarvapali D.
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 407 - 413
  • [39] EXPERIMENTS WITH MONTE-CARLO TREE SEARCH IN THE GAME OF HAVANNAH
    Lorentz, Richard J.
    ICGA JOURNAL, 2011, 34 (03) : 140 - 149
  • [40] Monte-Carlo tree search as regularized policy optimization
    Grill, Jean-Bastien
    Altche, Florent
    Tang, Yunhao
    Hubert, Thomas
    Valko, Michal
    Antonoglou, Ioannis
    Munos, Remi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119