Model-Free Trajectory-based Policy Optimization with Monotonic Improvement

Times Cited: 0
Authors
Akrour, Riad [1 ]
Abdolmaleki, Abbas [2 ]
Abdulsamad, Hany [1 ]
Peters, Jan [1 ,3 ]
Neumann, Gerhard [1 ,4 ]
Affiliations
[1] Tech Univ Darmstadt, CLAS IAS, Hsch Str 10, D-64289 Darmstadt, Germany
[2] DeepMind, London N1C 4AG, England
[3] Max Planck Inst Intelligent Syst, Max Planck Ring 4, Tübingen, Germany
[4] Univ Lincoln, L CAS, Lincoln LN6 7TS, England
Funding
European Union Horizon 2020;
Keywords
Reinforcement Learning; Policy Optimization; Trajectory Optimization; Robotics;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation and Computer Technology];
Discipline Code
0812;
Abstract
Many recent trajectory optimization algorithms alternate between linearly approximating the system dynamics around the mean trajectory and performing a conservative policy update. One way of constraining the policy change is to bound the Kullback-Leibler (KL) divergence between successive policies. These approaches have already demonstrated great experimental success on challenging problems such as end-to-end control of physical systems. However, the linear approximation of the system dynamics can introduce a bias into the policy update and prevent convergence to the optimal policy. In this article, we propose a new model-free trajectory-based policy optimization algorithm with guaranteed monotonic improvement. Instead of a model of the system dynamics, the algorithm backpropagates a local, quadratic, time-dependent Q-function learned from trajectory data. Our policy update ensures exact satisfaction of the KL constraint without simplifying assumptions on the system dynamics. On highly non-linear control tasks, we experimentally demonstrate that our algorithm improves on approaches that linearize the system dynamics. To establish the monotonic improvement of our algorithm, we additionally conduct a theoretical analysis of our policy update scheme and derive a lower bound on the change in policy return between successive iterations.
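
To make the kind of update described in the abstract concrete, below is a minimal NumPy sketch, not the paper's reference implementation: a time-dependent linear-Gaussian policy pi(a|s) = N(K s + k, Sigma) is re-weighted against a local quadratic Q-model via pi_new(a|s) ∝ pi_old(a|s) exp(Q(s,a)/eta), and the temperature eta is binary-searched so that the average KL to the old policy stays below a bound eps. The re-weighting and temperature search are standard ingredients of KL-constrained updates; all variable names (Qaa, Qsa, qa, eta, eps) are illustrative assumptions rather than the paper's notation.

# Sketch of one time step of a KL-constrained update of a linear-Gaussian
# policy against a quadratic Q-model
#     Q(s, a) = 0.5 a^T Qaa a + a^T Qsa s + a^T qa + const.
# Assumes Qaa is negative definite (Q concave in the action).
import numpy as np

def reweighted_policy(K, k, Sigma, Qaa, Qsa, qa, eta):
    """Gaussian policy proportional to pi_old(a|s) * exp(Q(s, a) / eta).

    Completing the square in the action: the new precision is
    Sigma^-1 - Qaa / eta, and the new mean stays linear in the state.
    """
    Sig_inv = np.linalg.inv(Sigma)
    Sigma_new = np.linalg.inv(Sig_inv - Qaa / eta)
    K_new = Sigma_new @ (Sig_inv @ K + Qsa / eta)
    k_new = Sigma_new @ (Sig_inv @ k + qa / eta)
    return K_new, k_new, Sigma_new

def expected_kl(states, Kn, kn, Sn, Ko, ko, So):
    """Average KL(pi_new || pi_old) over a batch of sampled states."""
    d = So.shape[0]
    So_inv = np.linalg.inv(So)
    _, ld_o = np.linalg.slogdet(So)
    _, ld_n = np.linalg.slogdet(Sn)
    kls = []
    for s in states:
        dm = (Kn @ s + kn) - (Ko @ s + ko)
        kls.append(0.5 * (np.trace(So_inv @ Sn) + dm @ So_inv @ dm
                          - d + ld_o - ld_n))
    return float(np.mean(kls))

def kl_constrained_update(states, K, k, Sigma, Qaa, Qsa, qa, eps=0.1):
    """Geometric binary search on eta: larger eta means a smaller step."""
    lo, hi = 1e-4, 1e6
    for _ in range(60):
        eta = np.sqrt(lo * hi)
        Kn, kn, Sn = reweighted_policy(K, k, Sigma, Qaa, Qsa, qa, eta)
        if expected_kl(states, Kn, kn, Sn, K, k, Sigma) > eps:
            lo = eta            # step too aggressive, raise the temperature
        else:
            hi = eta            # feasible, try a more aggressive step
    return reweighted_policy(K, k, Sigma, Qaa, Qsa, qa, hi)

# Toy usage on random (hypothetical) quantities for a single time step.
rng = np.random.default_rng(0)
ds, da = 3, 2
K, k, Sigma = 0.1 * rng.normal(size=(da, ds)), np.zeros(da), np.eye(da)
A = rng.normal(size=(da, da))
Qaa = -(A @ A.T + np.eye(da))   # negative definite by construction
Qsa, qa = rng.normal(size=(da, ds)), rng.normal(size=da)
states = rng.normal(size=(20, ds))
Kn, kn, Sn = kl_constrained_update(states, K, k, Sigma, Qaa, Qsa, qa)
print("expected KL after update:", expected_kl(states, Kn, kn, Sn, K, k, Sigma))

Because the Q-model is quadratic in the action and the policy is Gaussian, the re-weighted policy stays Gaussian in closed form; no linearization of the system dynamics is needed, which is the point the abstract makes.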
Pages: 25