Guided Policy Search via Approximate Mirror Descent

Cited by: 0
Authors
Montgomery, William [1]
Levine, Sergey [1]
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016) | 2016 / Vol. 29
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a "teacher" algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy search methods provide asymptotic local convergence guarantees by construction, but it is not clear how much the policy improves within a small, finite number of iterations. We show that guided policy search algorithms can be interpreted as an approximate variant of mirror descent, where the projection onto the constraint manifold is not exact. We derive a new guided policy search algorithm that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings, and show that in the more general nonlinear setting, the error in the projection step can be bounded. We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters.
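The mirror descent interpretation described in the abstract builds on the classical algorithm: each iterate takes a gradient step in a dual space induced by a mirror map and is then projected back onto the constraint set, and the paper's observation is that guided policy search performs this projection only approximately (via supervised learning). As background, here is a minimal sketch of the exact-projection case, using the entropy mirror map over the probability simplex, where the Bregman projection is a closed-form normalization; the function names and parameters are illustrative, not from the paper:

```python
import numpy as np

def mirror_descent_simplex(grad, x0, steps=500, eta=0.1):
    """Mirror descent with the entropy mirror map (exponentiated
    gradient). The projection back onto the simplex is exact and in
    closed form here; the paper's point is that guided policy search
    replaces this step with an approximate, supervised projection."""
    x = x0.copy()
    for _ in range(steps):
        # Gradient step taken multiplicatively, i.e. additively in the
        # dual (log) coordinates of the entropy mirror map ...
        x = x * np.exp(-eta * grad(x))
        # ... followed by the exact Bregman projection onto the simplex,
        # which for the entropy mirror map is just renormalization.
        x = x / x.sum()
    return x

# Minimize a linear cost <c, x> over the probability simplex; the
# minimizer concentrates all mass on the smallest coordinate of c.
c = np.array([3.0, 1.0, 2.0])
x_star = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
```

With a fixed linear gradient the iterates follow an exponential-weights update, so `x_star` concentrates on coordinate 1, the argmin of `c`.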
Pages: 9
Related papers (50 in total; items [21]-[30] shown)
  • [21] Xu, Zhenghao; Ji, Xiang; Chen, Minshuo; Wang, Mengdi; Zhao, Tuo. Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds. Journal of Machine Learning Research, 2024, 25.
  • [22] Duchi, John C.; Agarwal, Alekh; Johansson, Mikael; Jordan, Michael I. Ergodic Mirror Descent. SIAM Journal on Optimization, 2012, 22(4): 1549-1578.
  • [23] Bubeck, Sebastien; Cohen, Michael B.; Lee, James R.; Lee, Yin Tat. Metrical Task Systems on Trees via Mirror Descent and Unfair Gluing. SIAM Journal on Computing, 2021, 50(3): 909-923.
  • [24] Zhan, Wenhao; Cen, Shicong; Huang, Baihe; Chen, Yuxin; Lee, Jason D.; Chi, Yuejie. Policy Mirror Descent for Regularized Reinforcement Learning: A Generalized Framework with Linear Convergence. SIAM Journal on Optimization, 2023, 33(2): 1061-1091.
  • [25] Yuan, Ya-Xiang; Zhang, Yi. Analyze Accelerated Mirror Descent via High-Resolution ODEs. Journal of the Operations Research Society of China, 2024.
  • [26] Orseau, Laurent; Lelis, Levi H. S. Policy-Guided Heuristic Search with Guarantees. Thirty-Fifth AAAI Conference on Artificial Intelligence, Thirty-Third Conference on Innovative Applications of Artificial Intelligence and the Eleventh Symposium on Educational Advances in Artificial Intelligence, 2021, 35: 12382-12390.
  • [27] Xiong, Fangzhou; Sun, Biao; Yang, Xu; Qiao, Hong; Huang, Kaizhu; Hussain, Amir; Liu, Zhiyong. Guided Policy Search for Sequential Multitask Learning. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2019, 49(1): 216-226.
  • [28] Sun, Biao; Xiong, Fangzhou; Liu, Zhiyong; Yang, Xu; Qiao, Hong. A Linear Online Guided Policy Search Algorithm. Neural Information Processing, ICONIP 2017, Pt V, 2017, 10638: 434-441.
  • [29] Dai, Longquan; Yuan, Mengke; Tang, Liang; Xie, Yuan; Zhang, Xiaopeng; Tang, Jinhui. Interpreting and Extending the Guided Filter via Cyclic Coordinate Descent. IEEE Transactions on Image Processing, 2019, 28(2): 767-778.
  • [30] Spasojevic, P.; Wang, X. D. Multiuser Detection in Impulsive Noise via Slowest Descent Search. Proceedings of the Tenth IEEE Workshop on Statistical Signal and Array Processing, 2000: 146-150.