Guided Policy Search via Approximate Mirror Descent

Cited by: 0
Authors
Montgomery, William [1 ]
Levine, Sergey [1 ]
Affiliations
[1] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a "teacher" algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy search methods provide asymptotic local convergence guarantees by construction, but it is not clear how much the policy improves within a small, finite number of iterations. We show that guided policy search algorithms can be interpreted as an approximate variant of mirror descent, where the projection onto the constraint manifold is not exact. We derive a new guided policy search algorithm that is simpler and provides appealing improvement and convergence guarantees in simplified convex and linear settings, and show that in the more general nonlinear setting, the error in the projection step can be bounded. We provide empirical results on several simulated robotic navigation and manipulation tasks that show that our method is stable and achieves similar or better performance when compared to prior guided policy search methods, with a simpler formulation and fewer hyperparameters.
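As a worked sketch of the mirror descent interpretation described in the abstract (the notation below is the standard trust-region form for this setting, assumed here rather than quoted from the paper): each iteration first improves a trajectory distribution p within a bounded KL-divergence step of the current policy \pi_\theta, then projects back onto the policy class by supervised learning:

\[
p^{(k)} = \arg\min_{p} \; \mathbb{E}_{p}\Big[\sum_{t} \ell(x_t, u_t)\Big] \quad \text{s.t.} \quad D_{\mathrm{KL}}\big(p \,\|\, \pi_{\theta^{(k)}}\big) \le \epsilon,
\]
\[
\theta^{(k+1)} = \arg\min_{\theta} \; D_{\mathrm{KL}}\big(p^{(k)} \,\|\, \pi_{\theta}\big),
\]

where \ell(x_t, u_t) is the cost incurred at state x_t under action u_t and \epsilon bounds the step size. Because the second step, the projection onto the policy class, is carried out by supervised regression rather than solved exactly, the overall scheme is mirror descent with an inexact projection; this is where the bounded projection error discussed in the abstract enters.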
Pages: 9