Guided Policy Search via Approximate Mirror Descent

Cited by: 0
Authors
Montgomery, William [1 ]
Levine, Sergey [1 ]
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a "teacher" algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy search methods provide asymptotic local convergence guarantees by construction, but it is not clear how much the policy improves within a small, finite number of iterations. We show that guided policy search algorithms can be interpreted as an approximate variant of mirror descent, where the projection onto the constraint manifold is not exact. We derive a new guided policy search algorithm that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings, and show that in the more general nonlinear setting, the error in the projection step can be bounded. We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters.
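As a worked sketch of the mirror descent interpretation described in the abstract (the notation below is the standard trust-region form for this setting, assumed here rather than quoted from the paper): each iteration first improves a trajectory distribution p within a bounded KL-divergence step of the current policy \pi_\theta, then projects back onto the policy class by supervised learning:

\[
p^{(k)} = \arg\min_{p} \; \mathbb{E}_{p}\Big[\sum_{t} \ell(x_t, u_t)\Big] \quad \text{s.t.} \quad D_{\mathrm{KL}}\big(p \,\|\, \pi_{\theta^{(k)}}\big) \le \epsilon,
\]
\[
\theta^{(k+1)} = \arg\min_{\theta} \; D_{\mathrm{KL}}\big(p^{(k)} \,\|\, \pi_{\theta}\big),
\]

where \ell(x_t, u_t) is the cost incurred at state x_t under action u_t and \epsilon bounds the step size. Because the second step, the projection onto the policy class, is carried out by supervised regression rather than solved exactly, the overall scheme is mirror descent with an inexact projection; this is where the bounded projection error discussed in the abstract enters.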
Pages: 9