Monte-Carlo tree search as regularized policy optimization

Cited by: 0
Authors
Grill, Jean-Bastien [1]
Altche, Florent [1]
Tang, Yunhao [1, 2]
Hubert, Thomas [3]
Valko, Michal [1]
Antonoglou, Ioannis [3]
Munos, Remi [1]
Affiliations
[1] DeepMind, Paris, France
[2] Columbia Univ, New York, NY USA
[3] DeepMind, London, England
Keywords
GAME; GO
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.
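The abstract does not state the regularized policy optimization problem itself. As a minimal sketch, assume it takes the form: maximize sum(q * y) - lam * KL(prior, y) over probability vectors y, where q holds action-value estimates, prior is the network policy, and lam is a regularization weight. Under that assumption, the Lagrangian gives y[a] = lam * prior[a] / (alpha - q[a]), with alpha chosen so that y sums to 1. The function name, the arguments, and the use of bisection are illustrative choices, not taken from the source.

```python
def regularized_policy(q, prior, lam, iters=100):
    """Sketch: maximize sum(q*y) - lam * KL(prior, y) over the simplex.

    Assumes lam > 0 and a strictly positive prior (hypothetical setup,
    not confirmed by the abstract).
    """
    # The stationarity condition gives y[a] = lam * prior[a] / (alpha - q[a]).
    # The sum of y is decreasing in alpha, is >= 1 at `lo` and <= 1 at `hi`,
    # so bisection on alpha finds the normalizing value.
    lo = max(qa + lam * pa for qa, pa in zip(q, prior))
    hi = max(q) + lam
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = sum(lam * pa / (mid - qa) for qa, pa in zip(q, prior))
        if total > 1.0:
            lo = mid  # too much mass: alpha must grow
        else:
            hi = mid  # too little mass: alpha must shrink
    alpha = 0.5 * (lo + hi)
    y = [lam * pa / (alpha - qa) for qa, pa in zip(q, prior)]
    norm = sum(y)  # remove residual numerical error
    return [v / norm for v in y]


if __name__ == "__main__":
    # Made-up values for illustration only.
    print(regularized_policy([0.1, 0.5, 0.2], [0.5, 0.3, 0.2], lam=0.5))
```

In this sketch, a small lam shifts mass toward actions with high q, while a large lam keeps the result close to the prior.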
Pages: 10
Related papers
50 in total
  • [1] Monte-Carlo tree search as regularized policy optimization
    Grill, Jean-Bastien
    Altche, Florent
    Tang, Yunhao
    Hubert, Thomas
    Valko, Michal
    Antonoglou, Ioannis
    Munos, Remi
    [J]. 25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [2] Monte-Carlo Tree Search for Policy Optimization
    Ma, Xiaobai
    Driggs-Campbell, Katherine
    Zhang, Zongzhang
    Kochenderfer, Mykel J.
    [J]. PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 3116 - 3122
  • [3] Bayesian Optimization for Backpropagation in Monte-Carlo Tree Search
    Lim, Nengli
    Li, Yueqin
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT II, 2021, 12892 : 209 - 221
  • [4] Monte-Carlo Tree Search for Logistics
    Edelkamp, Stefan
    Gath, Max
    Greulich, Christoph
    Humann, Malte
    Herzog, Otthein
    Lawo, Michael
    [J]. COMMERCIAL TRANSPORT, 2016, : 427 - 440
  • [5] Parallel Monte-Carlo Tree Search
    Chaslot, Guillaume M. J. -B.
    Winands, Mark H. M.
    van den Herik, H. Jaap
    [J]. COMPUTERS AND GAMES, 2008, 5131 : 60 - +
  • [6] Monte-Carlo Tree Search Solver
    Winands, Mark H. M.
    Bjornsson, Yngvi
    Saito, Jahn-Takeshi
    [J]. COMPUTERS AND GAMES, 2008, 5131 : 25 - +
  • [7] Monte-Carlo Swarm Policy Search
    Fix, Jeremy
    Geist, Matthieu
    [J]. SWARM AND EVOLUTIONARY COMPUTATION, 2012, 7269 : 75 - 83
  • [8] Monte-Carlo Tree Search with Tree Shape Control
    Marchenko, Oleksandr I.
    Marchenko, Oleksii O.
    [J]. 2017 IEEE FIRST UKRAINE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (UKRCON), 2017, : 812 - 817
  • [9] Monte-Carlo Tree Search: To MC or to DP?
    Feldman, Zohar
    Domshlak, Carmel
    [J]. 21ST EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2014), 2014, 263 : 321 - 326
  • [10] Monte-Carlo Tree Search for Constrained POMDPs
    Lee, Jongmin
    Kim, Geon-Hyeong
    Poupart, Pascal
    Kim, Kee-Eung
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31