Generalized Mean Estimation in Monte-Carlo Tree Search

Cited by: 0

Authors:

Dam, Tuan [1]

Klink, Pascal [1]

D'Eramo, Carlo [1]

Peters, Jan [1,2]

Pajarinen, Joni [1,3]

Affiliations:

[1] Tech Univ Darmstadt, Dept Comp Sci, Darmstadt, Germany

[2] Max Planck Inst Intelligent Syst, Robot Learning Grp, Tubingen, Germany

[3] Tampere Univ, Comp Sci, Tampere, Finland

Source:

PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further proposing a novel backup strategy which uses the power mean operator, which computes a value between the average and maximum value. We call our new approach Power-UCT, and argue how the use of the power mean operator helps to speed up the learning in MCTS. We theoretically analyze our method providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t. state of the art algorithms.
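The abstract describes a backup based on the power mean operator, which lies between the average and the maximum of the child values. The following is a minimal sketch of a weighted power mean, not the authors' implementation. It assumes non-negative child values, an exponent p >= 1, and weights given by child visit counts; the function name `power_mean` and the sample numbers are hypothetical and chosen only for illustration.

```python
import math


def power_mean(values, weights, p):
    """Weighted power mean of non-negative values.

    Assumptions (not taken from the abstract): values are >= 0,
    weights are non-negative with a positive sum (e.g. visit counts),
    and p >= 1. p = 1 gives the weighted average; p = inf gives the
    maximum over entries with positive weight.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    if p < 1:
        raise ValueError("this sketch only covers p >= 1")
    if math.isinf(p):
        # Limit case: the largest value that carries any weight.
        return max(v for v, w in zip(values, weights) if w > 0)
    # Normalise the weights, average the p-th powers, take the p-th root.
    mean_of_powers = sum((w / total) * (v ** p) for v, w in zip(values, weights))
    return mean_of_powers ** (1.0 / p)


# Hypothetical child value estimates and visit counts for one node.
child_values = [0.2, 0.5, 0.9]
visit_counts = [5, 3, 2]

for p in (1, 2, 8, math.inf):
    print(p, power_mean(child_values, visit_counts, p))
```

With p = 1 the result equals the visit-weighted average used by an averaging backup, and as p grows the result moves toward the maximum child value, which is the interpolation the abstract refers to.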


Pages: 2397-2404

Page count: 8
