Generalized Mean Estimation in Monte-Carlo Tree Search

Authors
Dam, Tuan [1]
Klink, Pascal [1]
D'Eramo, Carlo [1]
Peters, Jan [1, 2]
Pajarinen, Joni [1, 3]
Affiliations
[1] Tech Univ Darmstadt, Dept Comp Sci, Darmstadt, Germany
[2] Max Planck Inst Intelligent Syst, Robot Learning Grp, Tubingen, Germany
[3] Tampere Univ, Comp Sci, Tampere, Finland
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further proposing a novel backup strategy which uses the power mean operator, which computes a value between the average and maximum value. We call our new approach Power-UCT, and argue how the use of the power mean operator helps to speed up the learning in MCTS. We theoretically analyze our method providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t. state of the art algorithms.
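The abstract describes a backup that lies between the average and the maximum of the child values. A minimal sketch of such a power mean backup follows. It assumes non-negative child value estimates, visit counts used as weights, and an exponent p >= 1; the function name `power_mean` and the sample numbers are hypothetical and are not taken from the paper.

```python
def power_mean(values, weights, p):
    """Weighted power mean of non-negative values with exponent p >= 1.

    p = 1 gives the weighted average; as p grows, the result approaches
    the largest value that has a positive weight.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Normalise the weights, raise each value to the power p, then take the p-th root.
    return sum((w / total) * v ** p for v, w in zip(values, weights)) ** (1.0 / p)


# Hypothetical child value estimates and visit counts for a single node.
child_values = [0.2, 0.5, 0.9]
visit_counts = [10, 30, 60]

for p in (1, 2, 8, 200):
    print(p, power_mean(child_values, visit_counts, p))
```

With p = 1 the function returns the visit-weighted average that the abstract attributes to UCT, and increasing p moves the estimate toward the maximum child value. The restriction to non-negative values is an assumption of this sketch that keeps the p-th root well defined.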
Pages: 2397-2404
Number of pages: 8