Monte-Carlo Tree Search for Policy Optimization

Cited by: 0
Authors
Ma, Xiaobai [1 ]
Driggs-Campbell, Katherine [2 ]
Zhang, Zongzhang [3 ]
Kochenderfer, Mykel J. [1 ]
Affiliations
[1] Stanford Univ, Dept Aeronaut & Astronaut, Stanford, CA 94305 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL USA
[3] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.
Pages: 3116-3122
Number of pages: 7
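
The abstract's key mechanism is tree search over candidate policies guided by the upper confidence bound (UCB) heuristic. The sketch below is a minimal, hypothetical illustration of that idea, not the MCTSPO algorithm from the paper: it applies a UCB1 selection rule (mean return plus c * sqrt(ln(parent visits) / child visits)) to nodes that represent Gaussian perturbations of policy parameters, with a toy multimodal objective standing in for environment rollouts. The node structure, perturbation scheme, objective, and all names are assumptions made for illustration.

```python
# A minimal sketch (not the authors' MCTSPO implementation) of UCB1-guided
# search over policy parameter perturbations. The perturbation scheme and the
# toy objective below are illustrative assumptions.
import math
import random

import numpy as np


class Node:
    """One node in the search tree: a concrete policy parameter vector."""

    def __init__(self, params, parent=None):
        self.params = params          # policy parameters represented by this node
        self.parent = parent
        self.children = []
        self.visits = 0
        self.total_return = 0.0

    def ucb_score(self, c=1.4):
        """UCB1: mean return plus an exploration bonus that shrinks with visits."""
        if self.visits == 0:
            return float("inf")       # evaluate every new child at least once
        exploit = self.total_return / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def rollout_return(params):
    """Stand-in for estimating a policy's return from environment rollouts.

    Toy objective: a broad, suboptimal mode near ||params|| = 0, separated by a
    valley from a better mode near ||params|| = 3.
    """
    x = float(np.linalg.norm(params))
    return math.exp(-x * x) + 2.0 * math.exp(-((x - 3.0) ** 2))


def mcts_policy_search(dim=4, iterations=500, n_children=5, sigma=0.5):
    root = Node(np.zeros(dim))
    for _ in range(iterations):
        # Selection: descend by UCB1 score until reaching a leaf.
        node = root
        while node.children:
            node = max(node.children, key=lambda ch: ch.ucb_score())
        # Expansion: add Gaussian perturbations of the leaf's parameters,
        # then descend into one of the new children at random.
        if node.visits > 0:
            for _ in range(n_children):
                child = Node(node.params + sigma * np.random.randn(dim), parent=node)
                node.children.append(child)
            node = random.choice(node.children)
        # Evaluation: estimate the return of the selected parameters.
        value = rollout_return(node.params)
        # Backpropagation: update visit counts and return sums up to the root.
        while node is not None:
            node.visits += 1
            node.total_return += value
            node = node.parent
    # Return the root child with the highest mean estimated return.
    best = max(root.children, key=lambda ch: ch.total_return / max(ch.visits, 1))
    return best.params


if __name__ == "__main__":
    print(mcts_policy_search())
```

Unvisited children score infinity, so each new perturbation is evaluated at least once before the bonus term begins trading exploration against the observed mean return.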