Guided Policy Search via Approximate Mirror Descent

Cited by: 0
Authors
Montgomery, William [1]
Levine, Sergey [1]
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016) | 2016 / Vol. 29
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a "teacher" algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy search methods provide asymptotic local convergence guarantees by construction, but it is not clear how much the policy improves within a small, finite number of iterations. We show that guided policy search algorithms can be interpreted as an approximate variant of mirror descent, where the projection onto the constraint manifold is not exact. We derive a new guided policy search algorithm that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings, and show that in the more general nonlinear setting, the error in the projection step can be bounded. We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters.
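The mirror descent interpretation described in the abstract builds on the classical algorithm: each iterate takes a gradient step in a dual space induced by a mirror map and is then projected back onto the constraint set, and the paper's observation is that guided policy search performs this projection only approximately (via supervised learning). As background, here is a minimal sketch of the exact-projection case, using the entropy mirror map over the probability simplex, where the Bregman projection is a closed-form normalization; the function names and parameters are illustrative, not from the paper:

```python
import numpy as np

def mirror_descent_simplex(grad, x0, steps=500, eta=0.1):
    """Mirror descent with the entropy mirror map (exponentiated
    gradient). The projection back onto the simplex is exact and in
    closed form here; the paper's point is that guided policy search
    replaces this step with an approximate, supervised projection."""
    x = x0.copy()
    for _ in range(steps):
        # Gradient step taken multiplicatively, i.e. additively in the
        # dual (log) coordinates of the entropy mirror map ...
        x = x * np.exp(-eta * grad(x))
        # ... followed by the exact Bregman projection onto the simplex,
        # which for the entropy mirror map is just renormalization.
        x = x / x.sum()
    return x

# Minimize a linear cost <c, x> over the probability simplex; the
# minimizer concentrates all mass on the smallest coordinate of c.
c = np.array([3.0, 1.0, 2.0])
x_star = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
```

With a fixed linear gradient the iterates follow an exponential-weights update, so `x_star` concentrates on coordinate 1, the argmin of `c`.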
Pages: 9
Related papers (50 in total; items [21]-[30] shown)
  • [21] Xu, Zhenghao; Ji, Xiang; Chen, Minshuo; Wang, Mengdi; Zhao, Tuo. Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds. Journal of Machine Learning Research, 2024, 25.
  • [22] Duchi, John C.; Agarwal, Alekh; Johansson, Mikael; Jordan, Michael I. Ergodic Mirror Descent. SIAM Journal on Optimization, 2012, 22(4): 1549-1578.
  • [23] Bubeck, Sebastien; Cohen, Michael B.; Lee, James R.; Lee, Yin Tat. Metrical Task Systems on Trees via Mirror Descent and Unfair Gluing. SIAM Journal on Computing, 2021, 50(3): 909-923.
  • [24] Zhan, Wenhao; Cen, Shicong; Huang, Baihe; Chen, Yuxin; Lee, Jason D.; Chi, Yuejie. Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence. SIAM Journal on Optimization, 2023, 33(2): 1061-1091.
  • [25] Yuan, Ya-Xiang; Zhang, Yi. Analyze Accelerated Mirror Descent via High-Resolution ODEs. Journal of the Operations Research Society of China, 2024.
  • [26] Orseau, Laurent; Lelis, Levi H. S. Policy-Guided Heuristic Search with Guarantees. Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence, 2021, 35: 12382-12390.
  • [27] Xiong, Fangzhou; Sun, Biao; Yang, Xu; Qiao, Hong; Huang, Kaizhu; Hussain, Amir; Liu, Zhiyong. Guided Policy Search for Sequential Multitask Learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019, 49(1): 216-226.
  • [28] Sun, Biao; Xiong, Fangzhou; Liu, Zhiyong; Yang, Xu; Qiao, Hong. A Linear Online Guided Policy Search Algorithm. Neural Information Processing, ICONIP 2017, Pt V, 2017, 10638: 434-441.
  • [29] Dai, Longquan; Yuan, Mengke; Tang, Liang; Xie, Yuan; Zhang, Xiaopeng; Tang, Jinhui. Interpreting and Extending the Guided Filter via Cyclic Coordinate Descent. IEEE Transactions on Image Processing, 2019, 28(2): 767-778.
  • [30] Spasojevic, P.; Wang, X. D. Multiuser Detection in Impulsive Noise via Slowest Descent Search. Proceedings of the Tenth IEEE Workshop on Statistical Signal and Array Processing, 2000: 146-150.