Generalized Mean Estimation in Monte-Carlo Tree Search

被引:0
|
作者
Dam, Tuan [1 ]
Klink, Pascal [1 ]
D'Eramo, Carlo [1 ]
Peters, Jan [1 ,2 ]
Pajarinen, Joni [1 ,3 ]
机构
[1] Tech Univ Darmstadt, Dept Comp Sci, Darmstadt, Germany
[2] Max Planck Inst Intelligent Syst, Robot Learning Grp, Tubingen, Germany
[3] Tampere Univ, Comp Sci, Tampere, Finland
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further proposing a novel backup strategy which uses the power mean operator, which computes a value between the average and maximum value. We call our new approach Power-UCT, and argue how the use of the power mean operator helps to speed up the learning in MCTS. We theoretically analyze our method providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t. state of the art algorithms.
引用
收藏
页码:2397 / 2404
页数:8
相关论文
共 50 条
  • [31] Monte-Carlo Tree Search for Scalable Coalition Formation
    Wu, Feng
    Ramchurn, Sarvapali D.
    [J]. PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 407 - 413
  • [32] Monte-Carlo Tree Search by Best Arm Identification
    Kaufmann, Emilie
    Koolen, Wouter M.
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [33] EXPERIMENTS WITH MONTE-CARLO TREE SEARCH IN THE GAME OF HAVANNAH
    Lorentz, Richard J.
    [J]. ICGA JOURNAL, 2011, 34 (03) : 140 - 149
  • [34] Monte-Carlo Tree Search Parallelisation for Computer Go
    van Niekerk, Francois
    Kroon, Steve
    van Rooyen, Gert-Jan
    Inggs, Cornelia P.
    [J]. PROCEEDINGS OF THE SOUTH AFRICAN INSTITUTE FOR COMPUTER SCIENTISTS AND INFORMATION TECHNOLOGISTS CONFERENCE, 2012, : 129 - 138
  • [35] CROSS-ENTROPY FOR MONTE-CARLO TREE SEARCH
    Chaslot, Guillaume M. J. B.
    Winands, Mark H. M.
    Szita, Istvan
    van den Herik, H. Jaap
    [J]. ICGA JOURNAL, 2008, 31 (03) : 145 - 156
  • [36] Can Monte-Carlo Tree Search learn to sacrifice?
    Nathan Companez
    Aldeida Aleti
    [J]. Journal of Heuristics, 2016, 22 : 783 - 813
  • [37] Backpropagation Modification in Monte-Carlo Game Tree Search
    Xie, Fan
    Liu, Zhiqing
    [J]. 2009 THIRD INTERNATIONAL SYMPOSIUM ON INTELLIGENT INFORMATION TECHNOLOGY APPLICATION, VOL 2, PROCEEDINGS, 2009, : 125 - 128
  • [38] Monte-Carlo tree search for Bayesian reinforcement learning
    Ngo Anh Vien
    Ertel, Wolfgang
    Viet-Hung Dang
    Chung, TaeChoong
    [J]. APPLIED INTELLIGENCE, 2013, 39 (02) : 345 - 353
  • [39] Monte-Carlo tree search as regularized policy optimization
    Grill, Jean-Bastien
    Altche, Florent
    Tang, Yunhao
    Hubert, Thomas
    Valko, Michal
    Antonoglou, Ioannis
    Munos, Remi
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [40] Using evaluation functions in Monte-Carlo Tree Search
    Lorentz, Richard
    [J]. THEORETICAL COMPUTER SCIENCE, 2016, 644 : 106 - 113