Non-Asymptotic Analysis of Monte Carlo Tree Search

Cited by: 3
Authors
Shah D. [1]
Xie Q. [2]
Xu Z. [1]
Affiliations
[1] LIDS, MIT, Cambridge, MA
[2] ORIE, Cornell University, Ithaca, NY
Source
Performance Evaluation Review | 2020 / Vol. 48 / No. 1
DOI
10.1145/3393691.3394202
Abstract
In this work, we consider the popular tree-based search strategy within the framework of reinforcement learning, the Monte Carlo Tree Search (MCTS), in the context of an infinite-horizon discounted cost Markov Decision Process (MDP) with deterministic transitions. While MCTS is believed to provide an approximate value function for a given state with enough simulations, cf. [5, 6], the claimed proof of this property is incomplete. This is because the variant of MCTS analyzed in prior works, the Upper Confidence Bound for Trees (UCT), utilizes a "logarithmic" bonus term for balancing exploration and exploitation within the tree-based search, following the insights from the stochastic multi-armed bandit (MAB) literature, cf. [1, 3]. In effect, such an approach assumes that the regret of the underlying recursively dependent non-stationary MABs concentrates around its mean exponentially in the number of steps, which is unlikely to hold, as pointed out in [2], even for stationary MABs. © 2020 Copyright is held by the owner/author(s).
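For readers unfamiliar with the "logarithmic" bonus mentioned in the abstract, the following is a minimal sketch of a UCB1-style selection rule as commonly used in UCT. It is not taken from the paper: the function names, the exploration constant `c`, and the reward-maximizing convention are assumptions made for illustration (the paper itself works with discounted costs).

```python
import math


def ucb_log_score(mean_value, parent_visits, child_visits, c=math.sqrt(2)):
    """UCB1-style score: empirical mean plus a logarithmic exploration bonus.

    Hypothetical helper; `c` is an assumed exploration constant.
    """
    if child_visits == 0:
        # Unvisited children are tried first.
        return math.inf
    bonus = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_value + bonus


def select_child(stats, c=math.sqrt(2)):
    """Return the index of the child with the highest score.

    `stats` is a list of (mean_value, visit_count) pairs, one per child.
    The parent visit count is taken as the sum of child visit counts.
    """
    parent_visits = sum(n for _, n in stats)
    scores = [ucb_log_score(m, parent_visits, n, c) for m, n in stats]
    return max(range(len(stats)), key=scores.__getitem__)
```

The bonus grows only as the square root of a logarithm of the parent's visit count, which is the term whose justification the abstract questions in the recursive, non-stationary setting of tree search.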
Pages: 31-32
Page count: 1