Monte-Carlo Tree Search for Policy Optimization

Cited by: 0
Authors
Ma, Xiaobai [1 ]
Driggs-Campbell, Katherine [2 ]
Zhang, Zongzhang [3 ]
Kochenderfer, Mykel J. [1 ]
Affiliations
[1] Stanford Univ, Dept Aeronaut & Astronaut, Stanford, CA 94305 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL USA
[3] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.
Pages: 3116-3122
Number of pages: 7
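
The abstract's key mechanism is tree search over candidate policies guided by the upper confidence bound (UCB) heuristic. The sketch below is a minimal, hypothetical illustration of that idea, not the MCTSPO algorithm from the paper: it applies a UCB1 selection rule (mean return plus c * sqrt(ln(parent visits) / child visits)) to nodes that represent Gaussian perturbations of policy parameters, with a toy multimodal objective standing in for environment rollouts. The node structure, perturbation scheme, objective, and all names are assumptions made for illustration.

```python
# A minimal sketch (not the authors' MCTSPO implementation) of UCB1-guided
# search over policy parameter perturbations. The perturbation scheme and the
# toy objective below are illustrative assumptions.
import math
import random

import numpy as np


class Node:
    """One node in the search tree: a concrete policy parameter vector."""

    def __init__(self, params, parent=None):
        self.params = params          # policy parameters represented by this node
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_return = 0.0

    def ucb_score(self, c=1.4):
        """UCB1: mean return plus an exploration bonus that shrinks with visits."""
        if self.visits == 0:
            return float("inf")       # evaluate every new child at least once
        exploit = self.total_return / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def rollout_return(params):
    """Stand-in for estimating a policy's return from environment rollouts.

    Toy objective: a broad, suboptimal mode near ||params|| = 0, separated by a
    valley from a better mode near ||params|| = 3.
    """
    x = float(np.linalg.norm(params))
    return math.exp(-x * x) + 2.0 * math.exp(-((x - 3.0) ** 2))


def mcts_policy_search(dim=4, iterations=500, n_children=5, sigma=0.5):
    root = Node(np.zeros(dim))
    for _ in range(iterations):
        # Selection: descend by UCB1 score until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ch.ucb_score())
        # Expansion: add Gaussian perturbations of the leaf's parameters,
        # then descend into one of the new children at random.
        if node.visits > 0:
            for _ in range(n_children):
                child = Node(node.params + sigma * np.random.randn(dim), parent=node)
                node.children.append(child)
            node = random.choice(node.children)
        # Evaluation: estimate the return of the selected parameters.
        value = rollout_return(node.params)
        # Backpropagation: update visit counts and return sums up to the root.
        while node is not None:
            node.visits += 1
            node.total_return += value
            node = node.parent
    # Return the root child with the highest mean estimated return.
    best = max(root.children, key=lambda ch: ch.total_return / max(ch.visits, 1))
    return best.params


if __name__ == "__main__":
    print(mcts_policy_search())
```

Unvisited children score infinity, so each new perturbation is evaluated at least once before the bonus term begins trading exploration against the observed mean return.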