Generalized Mean Estimation in Monte-Carlo Tree Search

Cited by: 0
Authors
Dam, Tuan [1]
Klink, Pascal [1]
D'Eramo, Carlo [1]
Peters, Jan [1, 2]
Pajarinen, Joni [1, 3]
Affiliations
[1] Tech Univ Darmstadt, Dept Comp Sci, Darmstadt, Germany
[2] Max Planck Inst Intelligent Syst, Robot Learning Grp, Tubingen, Germany
[3] Tampere Univ, Comp Sci, Tampere, Finland
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory];
Subject classification number
081104; 0812; 0835; 1405
Abstract
We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further and propose a novel backup strategy that uses the power mean operator, which computes a value between the average and the maximum. We call our new approach Power-UCT and argue that the power mean operator helps to speed up learning in MCTS. We theoretically analyze our method, providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method on well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed with respect to state-of-the-art algorithms.
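To illustrate how a power mean sits between the average and the maximum, the sketch below computes a weighted power mean of child values. It is only a minimal illustration, not the authors' implementation: the function name `power_mean`, the use of visit counts as weights, and the sample numbers are all assumptions introduced here, and values are assumed to be non-negative (for example, rewards scaled to [0, 1]).

```python
import math


def power_mean(values, weights, p):
    """Weighted power mean of non-negative values (hypothetical helper).

    p = 1 gives the weighted average; p = math.inf gives the maximum
    over entries with positive weight.
    """
    if p < 1:
        raise ValueError("this sketch only covers p >= 1")
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    if math.isinf(p):
        # Limit case: the largest value that has any weight at all.
        return max(v for v, w in zip(values, weights) if w > 0)
    # Normalize the weights, raise each value to the power p, then take the p-th root.
    acc = sum((w / total) * (v ** p) for v, w in zip(values, weights))
    return acc ** (1.0 / p)


# Illustrative numbers only: three child value estimates and their visit counts.
child_values = [0.2, 0.5, 0.8]
visit_counts = [10, 30, 60]

for p in (1, 2, 8, math.inf):
    print(p, power_mean(child_values, visit_counts, p))
```

As `p` grows, the result moves from the weighted average toward the largest child value, which is the interpolation behavior the abstract describes.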
Pages: 2397-2404
Page count: 8