Monte-Carlo Tree Search for Policy Optimization

Cited by: 0
Authors
Ma, Xiaobai [1]
Driggs-Campbell, Katherine [2]
Zhang, Zongzhang [3]
Kochenderfer, Mykel J. [1]
Affiliations
[1] Stanford Univ, Dept Aeronaut & Astronaut, Stanford, CA 94305 USA
[2] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL USA
[3] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing, Peoples R China
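Funding
National Natural Science Foundation of China
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract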
Gradient-based methods are often used for policy optimization in deep reinforcement learning, despite being vulnerable to local optima and saddle points. Although gradient-free methods (e.g., genetic algorithms or evolution strategies) help mitigate these issues, poor initialization and local optima are still concerns in highly nonconvex spaces. This paper presents a method for policy optimization based on Monte-Carlo tree search and gradient-free optimization. Our method, called Monte-Carlo tree search for policy optimization (MCTSPO), provides a better exploration-exploitation trade-off through the use of the upper confidence bound heuristic. We demonstrate improved performance on reinforcement learning tasks with deceptive or sparse reward functions compared to popular gradient-based and deep genetic algorithm baselines.
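The abstract credits the upper confidence bound heuristic for the exploration-exploitation trade-off. As a rough illustration only, the following is a minimal sketch of the standard UCB1 selection rule commonly used in Monte-Carlo tree search. It is not the MCTSPO implementation; the function names, the `(total_return, visits)` layout, and the default exploration constant are all assumptions made for this example.

```python
import math


def ucb1_score(mean_value, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score: mean return plus an exploration bonus.

    All names and the default constant c are illustrative assumptions.
    """
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    return mean_value + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children, c=math.sqrt(2)):
    """Return the index of the child with the highest UCB1 score.

    `children` is a list of (total_return, visits) pairs. This layout is
    a hypothetical choice for the sketch, not taken from the paper.
    """
    parent_visits = sum(visits for _, visits in children)
    scores = []
    for total_return, visits in children:
        mean = total_return / visits if visits > 0 else 0.0
        scores.append(ucb1_score(mean, visits, parent_visits, c))
    return max(range(len(children)), key=scores.__getitem__)
```

With `c=0` the rule reduces to greedy selection on the mean return; larger values of `c` favor rarely visited children.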
Pages: 3116-3122
Number of pages: 7
Related papers
50 in total
  • [1] Monte-Carlo tree search as regularized policy optimization
    Grill, Jean-Bastien
    Altche, Florent
    Tang, Yunhao
    Hubert, Thomas
    Valko, Michal
    Antonoglou, Ioannis
    Munos, Remi
    [J]. 25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [2] Monte-Carlo tree search as regularized policy optimization
    Grill, Jean-Bastien
    Altche, Florent
    Tang, Yunhao
    Hubert, Thomas
    Valko, Michal
    Antonoglou, Ioannis
    Munos, Remi
    [J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [3] Bayesian Optimization for Backpropagation in Monte-Carlo Tree Search
    Lim, Nengli
    Li, Yueqin
    [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2021, PT II, 2021, 12892 : 209 - 221
  • [4] Monte-Carlo Tree Search for Logistics
    Edelkamp, Stefan
    Gath, Max
    Greulich, Christoph
    Humann, Malte
    Herzog, Otthein
    Lawo, Michael
    [J]. COMMERCIAL TRANSPORT, 2016, : 427 - 440
  • [5] Parallel Monte-Carlo Tree Search
    Chaslot, Guillaume M. J. -B.
    Winands, Mark H. M.
    van den Herik, H. Jaap
    [J]. COMPUTERS AND GAMES, 2008, 5131 : 60 - +
  • [6] Monte-Carlo Tree Search Solver
    Winands, Mark H. M.
    Bjornsson, Yngvi
    Saito, Jahn-Takeshi
    [J]. COMPUTERS AND GAMES, 2008, 5131 : 25 - +
  • [7] Monte-Carlo Swarm Policy Search
    Fix, Jeremy
    Geist, Matthieu
    [J]. SWARM AND EVOLUTIONARY COMPUTATION, 2012, 7269 : 75 - 83
  • [8] Monte-Carlo Tree Search with Tree Shape Control
    Marchenko, Oleksandr I.
    Marchenko, Oleksii O.
    [J]. 2017 IEEE FIRST UKRAINE CONFERENCE ON ELECTRICAL AND COMPUTER ENGINEERING (UKRCON), 2017, : 812 - 817
  • [9] Monte-Carlo Tree Search for Constrained POMDPs
    Lee, Jongmin
    Kim, Geon-Hyeong
    Poupart, Pascal
    Kim, Kee-Eung
    [J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [10] Monte-Carlo Tree Search in Settlers of Catan
    Szita, Istvan
    Chaslot, Guillaume
    Spronck, Pieter
    [J]. ADVANCES IN COMPUTER GAMES, 2010, 6048 : 21 - +