Monte-Carlo Tree Search for Policy Optimization

Cited by: 0

Authors:

Ma, Xiaobai [1]

Driggs-Campbell, Katherine [2]

Zhang, Zongzhang [3]

Kochenderfer, Mykel J. [1]

Affiliations:

[1] Stanford Univ, Dept Aeronaut & Astronaut, Stanford, CA 94305 USA

[2] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL USA

[3] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China

Source:

PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2019

Funding:

National Natural Science Foundation of China

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.


Pages: 3116-3122

Page count: 7
