Non-Asymptotic Analysis of Monte Carlo Tree Search

Cited by: 3

Authors:

Shah D. [1]

Xie Q. [2]

Xu Z. [1]

Affiliations:

[1] LIDS, MIT, Cambridge, MA

[2] ORIE, Cornell University, Ithaca, NY

Source:

Performance Evaluation Review | 2020, Vol. 48, No. 1

DOI:

10.1145/3393691.3394202

Abstract:

In this work, we consider the popular tree-based search strategy within the framework of reinforcement learning, the Monte Carlo Tree Search (MCTS), in the context of infinite-horizon discounted cost Markov Decision Process (MDP) with deterministic transitions. While MCTS is believed to provide an approximate value function for a given state with enough simulations, cf. [5, 6], the claimed proof of this property is incomplete. This is due to the fact that the variant of MCTS, the Upper Confidence Bound for Trees (UCT), analyzed in prior works utilizes a "logarithmic" bonus term for balancing exploration and exploitation within the tree-based search, following the insights from stochastic multi-armed bandit (MAB) literature, cf. [1, 3]. In effect, such an approach assumes that the regret of the underlying recursively dependent non-stationary MABs concentrates around their mean exponentially in the number of steps, which is unlikely to hold as pointed out in [2], even for stationary MABs. © 2020 Copyright is held by the owner/author(s).
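
For readers unfamiliar with the "logarithmic" bonus mentioned in the abstract, the following is a minimal illustrative sketch of the standard UCB1-style selection rule from the bandit literature, on which UCT is commonly based. It is not taken from the paper: the function names, the constant `c`, and the reward-maximizing convention are assumptions made for illustration only.

```python
import math


def ucb_log_bonus(total_visits: int, child_visits: int, c: float = math.sqrt(2)) -> float:
    """Logarithmic exploration bonus c * sqrt(ln(N) / n).

    N is the parent's total visit count, n the child's visit count.
    The default constant c is an illustrative assumption.
    """
    return c * math.sqrt(math.log(total_visits) / child_visits)


def select_child(means, visits, c: float = math.sqrt(2)) -> int:
    """Return the index of the child with the highest mean-plus-bonus score.

    Unvisited children are tried first, so the bonus is never evaluated
    with a zero visit count.
    """
    for i, n in enumerate(visits):
        if n == 0:
            return i
    total = sum(visits)
    scores = [m + ucb_log_bonus(total, n, c) for m, n in zip(means, visits)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With equal empirical means, the rule prefers the less-visited child because its bonus is larger; this is the exploration effect whose concentration properties the abstract calls into question.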

Pages: 31-32

Page count: 1
