Generalized Mean Estimation in Monte-Carlo Tree Search

Cited by: 0

Authors:

Dam, Tuan [1]

Klink, Pascal [1]

D'Eramo, Carlo [1]

Peters, Jan [1,2]

Pajarinen, Joni [1,3]

Affiliations:

[1] Tech Univ Darmstadt, Dept Comp Sci, Darmstadt, Germany

[2] Max Planck Inst Intelligent Syst, Robot Learning Grp, Tubingen, Germany

[3] Tampere Univ, Comp Sci, Tampere, Finland

Source:

PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further proposing a novel backup strategy which uses the power mean operator, which computes a value between the average and maximum value. We call our new approach Power-UCT, and argue how the use of the power mean operator helps to speed up the learning in MCTS. We theoretically analyze our method providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t. state of the art algorithms.
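The abstract describes the power mean as an operator whose value lies between the average and the maximum. The sketch below illustrates that property with a weighted power mean, M_p = (sum_i w_i x_i^p / sum_i w_i)^(1/p). It is not the authors' implementation: the function name `power_mean`, the choice of weights (for example, child visit counts), and the restrictions to p >= 1 and non-negative values are assumptions made for this illustration.

```python
import math
from typing import Sequence


def power_mean(values: Sequence[float], weights: Sequence[float], p: float) -> float:
    """Weighted power mean of non-negative values, for p >= 1.

    Hypothetical helper for illustration only. With p = 1 it is the weighted
    average; as p grows it approaches the largest value with positive weight.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if p < 1:
        raise ValueError("this sketch assumes p >= 1")
    if any(v < 0 for v in values) or any(w < 0 for w in weights):
        raise ValueError("this sketch assumes non-negative values and weights")

    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")

    if math.isinf(p):
        # Limit case: the maximum over entries that carry weight.
        return max(v for v, w in zip(values, weights) if w > 0)

    top = max(values)
    if top == 0:
        return 0.0

    # Divide by the largest value before exponentiating to avoid overflow
    # for large p, then scale the result back.
    scaled = sum(w * (v / top) ** p for v, w in zip(values, weights)) / total_weight
    return top * scaled ** (1.0 / p)


if __name__ == "__main__":
    child_values = [0.2, 0.5, 0.9]   # assumed child node value estimates
    visit_counts = [1, 1, 1]         # assumed weights (e.g. visit counts)
    for p in (1, 2, 8, math.inf):
        print(p, power_mean(child_values, visit_counts, p))
```

With equal weights, p = 1 gives the plain average and increasing p moves the result monotonically toward the maximum, which is the interpolation the abstract refers to.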

Pages: 2397-2404

Page count: 8
