Generalized Mean Estimation in Monte-Carlo Tree Search

Cited by: 0

Authors:

Dam, Tuan [1]

Klink, Pascal [1]

D'Eramo, Carlo [1]

Peters, Jan [1,2]

Pajarinen, Joni [1,3]

Affiliations:

[1] Tech Univ Darmstadt, Dept Comp Sci, Darmstadt, Germany

[2] Max Planck Inst Intelligent Syst, Robot Learning Grp, Tubingen, Germany

[3] Tampere Univ, Comp Sci, Tampere, Finland

Source:

PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further proposing a novel backup strategy which uses the power mean operator, which computes a value between the average and maximum value. We call our new approach Power-UCT, and argue how the use of the power mean operator helps to speed up the learning in MCTS. We theoretically analyze our method providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvement in performance and convergence speed w.r.t. state of the art algorithms.
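To make the idea of a backup "between the average and maximum value" concrete, the following is a minimal sketch of a weighted power mean applied to the value estimates of a node's children. It is an illustration only, not the authors' implementation: the function name `power_mean`, the use of child visit counts as weights, and the restriction to non-negative values with p >= 1 are assumptions made for this example.

```python
import math


def power_mean(values, weights, p):
    """Weighted power mean of non-negative values (hypothetical helper).

    p = 1 gives the weighted average; p = math.inf gives the maximum.
    Assumption for this sketch: weights are child visit counts.
    """
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    if p < 1:
        raise ValueError("this sketch only covers p >= 1")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    if math.isinf(p):
        # Limit p -> infinity: the largest value among visited children.
        return max(v for v, w in zip(values, weights) if w > 0)
    weighted_sum = sum(w * v ** p for v, w in zip(values, weights))
    return (weighted_sum / total) ** (1.0 / p)


# Two children with value estimates 0.2 and 0.8, visited 3 times and 1 time.
child_values = [0.2, 0.8]
visit_counts = [3, 1]
average_backup = power_mean(child_values, visit_counts, 1)
power_backup = power_mean(child_values, visit_counts, 4)
max_backup = power_mean(child_values, visit_counts, math.inf)
```

With these inputs the p = 1 backup is the visit-weighted average (about 0.35), the p = infinity backup is the maximum (0.8), and an intermediate exponent such as p = 4 lies between the two. Increasing p moves the estimate from the average toward the maximum.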

Pages: 2397-2404

Number of pages: 8
