Monte-Carlo tree search as regularized policy optimization

Cited by: 0

Authors:

Grill, Jean-Bastien [1]

Altche, Florent [1]

Tang, Yunhao [1,2]

Hubert, Thomas [3]

Valko, Michal [1]

Antonoglou, Ioannis [3]

Munos, Remi [1]

Affiliations:

[1] DeepMind, Paris, France

[2] Columbia Univ, New York, NY USA

[3] DeepMind, London, England

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119 | 2020 / Vol. 119

Keywords:

GAME; GO;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.
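The abstract does not spell out the regularized policy optimization problem, so the following is only an illustrative sketch under stated assumptions. It assumes the objective is to maximize q·y − λ·KL(prior ‖ y) over probability vectors y, where q holds the search value estimates, prior is the network policy, and λ > 0 is a regularization weight. Under that assumption the maximizer has the form y(a) = λ·prior(a) / (α − q(a)), with α chosen so that y sums to one; the sketch finds α by bisection. The function name `regularized_policy` and all parameter names are hypothetical, and the prior is assumed to be strictly positive.

```python
import numpy as np

def regularized_policy(q, prior, lam, iters=100):
    """Sketch: maximize q.y - lam * KL(prior || y) over the simplex.

    Assumes lam > 0 and prior > 0 everywhere (hypothetical setup,
    not taken from the abstract).
    """
    q = np.asarray(q, dtype=float)
    prior = np.asarray(prior, dtype=float)

    # Bracket for the normalizer alpha: at `lo` the candidate policy
    # sums to at least 1, at `hi` it sums to at most 1.
    lo = np.max(q + lam * prior)
    hi = np.max(q) + lam

    # The sum of lam * prior / (alpha - q) decreases as alpha grows,
    # so plain bisection finds the alpha that makes it equal to 1.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        total = np.sum(lam * prior / (mid - q))
        if total > 1.0:
            lo = mid
        else:
            hi = mid

    alpha = 0.5 * (lo + hi)
    policy = lam * prior / (alpha - q)
    # Renormalize to absorb the remaining bisection error.
    return policy / policy.sum()

if __name__ == "__main__":
    print(regularized_policy([1.0, 0.0], [0.5, 0.5], lam=1.0))
```

In this sketch, when all values in q are equal the result reduces to the prior, and raising an action's value shifts probability toward it; a larger λ keeps the result closer to the prior.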

Pages: 10
