Guided Policy Search via Approximate Mirror Descent

Cited by: 0
Authors
Montgomery, William [1 ]
Levine, Sergey [1 ]
Affiliation
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016) | 2016 / Vol. 29
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a "teacher" algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy search methods provide asymptotic local convergence guarantees by construction, but it is not clear how much the policy improves within a small, finite number of iterations. We show that guided policy search algorithms can be interpreted as an approximate variant of mirror descent, where the projection onto the constraint manifold is not exact. We derive a new guided policy search algorithm that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings, and show that in the more general nonlinear setting, the error in the projection step can be bounded. We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters.
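The abstract describes guided policy search as alternating a trajectory-centric "teacher" improvement step with a supervised-learning step that acts as an approximate projection onto the policy class. The toy sketch below only illustrates that alternating structure on a scalar linear system; the dynamics, the finite-difference controller update (a crude stand-in for the LQR/trajectory-optimization step), and all function names are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of the alternation described in the abstract (assumed structure,
# not the authors' code): a per-condition "teacher" controller improvement
# step followed by a supervised "projection" step onto a single linear policy.
import numpy as np

A, B = np.array([[1.0]]), np.array([[1.0]])        # toy scalar linear dynamics
x0s = [np.array([1.0]), np.array([-2.0])]          # two initial conditions

def rollout(K, x0, T=20):
    """Run the linear feedback controller u = K x; return states and actions."""
    xs, us, x = [], [], x0.copy()
    for _ in range(T):
        u = K @ x
        xs.append(x)
        us.append(u)
        x = A @ x + B @ u
    return np.array(xs), np.array(us)

def cost(K, x0):
    xs, us = rollout(K, x0)
    return float(np.sum(xs ** 2) + 0.1 * np.sum(us ** 2))

K_locals = [np.zeros((1, 1)) for _ in x0s]         # per-condition "teacher" controllers
K_policy = np.zeros((1, 1))                        # single "global" policy

for _ in range(20):
    # C-step: improve each local controller against its cost.  A signed
    # finite-difference step stands in for the trajectory-optimization or
    # trajectory-centric RL update used in guided policy search.
    for i, x0 in enumerate(x0s):
        eps = 1e-4
        g = (cost(K_locals[i] + eps, x0) - cost(K_locals[i] - eps, x0)) / (2 * eps)
        K_locals[i] = K_locals[i] - 0.05 * np.sign(g)

    # S-step (approximate projection): supervised regression of the global
    # policy onto state-action samples drawn from the improved controllers.
    X = np.vstack([rollout(K, x0)[0] for K, x0 in zip(K_locals, x0s)])
    U = np.vstack([rollout(K, x0)[1] for K, x0 in zip(K_locals, x0s)])
    K_policy = np.linalg.lstsq(X, U, rcond=None)[0].T

print("fitted global policy gain:", K_policy)
```

This toy captures only the alternation; the KL-constrained local updates, the mirror-descent interpretation of the projection, and the convergence analysis are developed in the paper itself.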
Pages: 9
Related papers
(50 in total; records 31-40 shown)
  • [31] Multiuser detection in impulsive noise via slowest descent search
    Spasojevic, Predrag
    Wang, Xiaodong
    IEEE Signal Processing Workshop on Statistical Signal and Array Processing, SSAP, 2000: 146-150
  • [32] Approximate Newton methods for policy search in markov decision processes
    Furmston, Thomas
    Lever, Guy
    Barber, David
    Journal of Machine Learning Research, 2016, 17 : 1 - 51
  • [33] Approximate Bayes Optimal Policy Search using Neural Networks
    Castronovo, Michael
    Francois-Lavet, Vincent
    Fonteneau, Raphael
    Ernst, Damien
    Couetoux, Adrien
    ICAART: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 2, 2017, : 142 - 153
  • [35] Distributed Algorithms for Multicommodity Flow Problems via Approximate Steepest Descent Framework
    Awerbuch, Baruch
    Khandekar, Rohit
    Rao, Satish
    PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2007: 949+
  • [36] Distributed Algorithms for Multicommodity Flow Problems via Approximate Steepest Descent Framework
    Awerbuch, Baruch
    Khandekar, Rohit
    Rao, Satish
    ACM TRANSACTIONS ON ALGORITHMS, 2012, 9 (01)
  • [37] Joint Online Learning and Decision-making via Dual Mirror Descent
    Lobos, Alfonso
    Grigas, Paul
    Wen, Zheng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [38] Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes
    Johnson, Emmeran
    Pike-Burke, Ciara
    Rebeschini, Patrick
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [39] The Information Geometry of Mirror Descent
    Raskutti, Garvesh
    Mukherjee, Sayan
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2015, 61 (03) : 1451 - 1457
  • [40] Hessian Informed Mirror Descent
    Wang, Li
    Yan, Ming
    JOURNAL OF SCIENTIFIC COMPUTING, 2022, 92 (03)