Guided Policy Search via Approximate Mirror Descent

Cited by: 0
Authors
Montgomery, William [1 ]
Levine, Sergey [1 ]
Affiliation
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016) | 2016 / Vol. 29
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a "teacher" algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy search methods provide asymptotic local convergence guarantees by construction, but it is not clear how much the policy improves within a small, finite number of iterations. We show that guided policy search algorithms can be interpreted as an approximate variant of mirror descent, where the projection onto the constraint manifold is not exact. We derive a new guided policy search algorithm that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings, and show that in the more general nonlinear setting, the error in the projection step can be bounded. We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters.
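The abstract describes guided policy search as alternating a trajectory-centric "teacher" improvement step with a supervised-learning step that acts as an approximate projection onto the policy class. The toy sketch below only illustrates that alternating structure on a scalar linear system; the dynamics, the finite-difference controller update (a crude stand-in for the LQR/trajectory-optimization step), and all function names are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of the alternation described in the abstract (assumed structure,
# not the authors' code): a per-condition "teacher" controller improvement
# step followed by a supervised "projection" step onto a single linear policy.
import numpy as np

A, B = np.array([[1.0]]), np.array([[1.0]])        # toy scalar linear dynamics
x0s = [np.array([1.0]), np.array([-2.0])]          # two initial conditions

def rollout(K, x0, T=20):
    """Run the linear feedback controller u = K x; return states and actions."""
    xs, us, x = [], [], x0.copy()
    for _ in range(T):
        u = K @ x
        xs.append(x)
        us.append(u)
        x = A @ x + B @ u
    return np.array(xs), np.array(us)

def cost(K, x0):
    xs, us = rollout(K, x0)
    return float(np.sum(xs ** 2) + 0.1 * np.sum(us ** 2))

K_locals = [np.zeros((1, 1)) for _ in x0s]         # per-condition "teacher" controllers
K_policy = np.zeros((1, 1))                        # single "global" policy

for _ in range(20):
    # C-step: improve each local controller against its cost.  A signed
    # finite-difference step stands in for the trajectory-optimization or
    # trajectory-centric RL update used in guided policy search.
    for i, x0 in enumerate(x0s):
        eps = 1e-4
        g = (cost(K_locals[i] + eps, x0) - cost(K_locals[i] - eps, x0)) / (2 * eps)
        K_locals[i] = K_locals[i] - 0.05 * np.sign(g)

    # S-step (approximate projection): supervised regression of the global
    # policy onto state-action samples drawn from the improved controllers.
    X = np.vstack([rollout(K, x0)[0] for K, x0 in zip(K_locals, x0s)])
    U = np.vstack([rollout(K, x0)[1] for K, x0 in zip(K_locals, x0s)])
    K_policy = np.linalg.lstsq(X, U, rcond=None)[0].T

print("fitted global policy gain:", K_policy)
```

This toy captures only the alternation; the KL-constrained local updates, the mirror-descent interpretation of the projection, and the convergence analysis are developed in the paper itself.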
Pages: 9
Related papers
(50 in total; records 31-40 shown)
  • [31] Multiuser detection in impulsive noise via slowest descent search
    Spasojevic, Predrag
    Wang, Xiaodong
    IEEE Signal Processing Workshop on Statistical Signal and Array Processing, SSAP, 2000: 146-150
  • [32] Approximate Newton methods for policy search in markov decision processes
    Furmston, Thomas
    Lever, Guy
    Barber, David
    Journal of Machine Learning Research, 2016, 17 : 1 - 51
  • [33] Approximate Bayes Optimal Policy Search using Neural Networks
    Castronovo, Michael
    Francois-Lavet, Vincent
    Fonteneau, Raphael
    Ernst, Damien
    Couetoux, Adrien
    ICAART: PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 2, 2017, : 142 - 153
  • [35] Distributed Algorithms for Multicommodity Flow Problems via Approximate Steepest Descent Framework
    Awerbuch, Baruch
    Khandekar, Rohit
    Rao, Satish
    PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2007: 949+
  • [36] Distributed Algorithms for Multicommodity Flow Problems via Approximate Steepest Descent Framework
    Awerbuch, Baruch
    Khandekar, Rohit
    Rao, Satish
    ACM TRANSACTIONS ON ALGORITHMS, 2012, 9 (01)
  • [37] Joint Online Learning and Decision-making via Dual Mirror Descent
    Lobos, Alfonso
    Grigas, Paul
    Wen, Zheng
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [38] Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes
    Johnson, Emmeran
    Pike-Burke, Ciara
    Rebeschini, Patrick
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [39] The Information Geometry of Mirror Descent
    Raskutti, Garvesh
    Mukherjee, Sayan
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2015, 61 (03) : 1451 - 1457
  • [40] Hessian Informed Mirror Descent
    Wang, Li
    Yan, Ming
    JOURNAL OF SCIENTIFIC COMPUTING, 2022, 92 (03)