Monte-Carlo Tree Search for Policy Optimization

Authors
Ma, Xiaobai [1]
Driggs-Campbell, Katherine [2]
Zhang, Zongzhang [3]
Kochenderfer, Mykel J. [1]
Affiliations
[1] Stanford Univ, Dept Aeronaut & Astronaut, Stanford, CA 94305 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL USA
[3] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.
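The abstract names the upper confidence bound heuristic but does not give the exact form used in MCTSPO. As an illustration only, the sketch below shows the standard UCB1 rule commonly used in Monte-Carlo tree search: each child is scored by its average return plus an exploration bonus that shrinks as the child is visited more often. The function names, the `(mean_value, visits)` tuple layout, and the default constant `c = sqrt(2)` are assumptions made for this example, not details taken from the paper.

```python
import math


def ucb1_score(mean_value, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score: average return plus an exploration bonus.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    if visits == 0:
        # Unvisited children get an infinite score so they are tried first.
        return math.inf
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    children: list of (mean_value, visits) tuples (assumed layout).
    """
    parent_visits = sum(visits for _, visits in children)
    scores = [ucb1_score(m, v, parent_visits, c) for m, v in children]
    return max(range(len(children)), key=scores.__getitem__)


if __name__ == "__main__":
    # Two children with equal average return: the less-visited one is
    # preferred because its exploration bonus is larger.
    print(select_child([(0.5, 100), (0.5, 1)]))
```

Setting `c` to 0 reduces the rule to pure exploitation (always pick the highest average return), while larger values of `c` push the search toward rarely visited children.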
Pages: 3116-3122
Number of pages: 7