Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control

Cited by: 3
Authors
Lin, Ziyu [1 ]
Duan, Jingliang [2 ]
Li, Shengbo Eben [1 ]
Ma, Haitong [1 ]
Li, Jie [1 ]
Chen, Jianyu [3 ]
Cheng, Bo [1 ]
Ma, Jun [4 ,5 ]
Affiliations
[1] Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China
[2] Univ Sci & Technol Beijing, Sch Mech Engn, Beijing 100083, Peoples R China
[3] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing 100084, Peoples R China
[4] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst Thrust, Guangzhou, Peoples R China
[5] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
Keywords
Mathematical models; Heuristic algorithms; Optimal control; Artificial neural networks; Approximation algorithms; Dynamic programming; Nonlinear dynamical systems; Actor critic; approximate dynamic programming (ADP); finite-horizon (FH); Hamilton-Jacobi-Bellman (HJB) equation; optimal control; policy iteration (PI); SYSTEMS
DOI
10.1109/TNNLS.2022.3225090
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Hamilton-Jacobi-Bellman (HJB) equation serves as the necessary and sufficient condition for the optimal solution to the continuous-time (CT) optimal control problem (OCP). Compared with the infinite-horizon HJB equation, solving the finite-horizon (FH) HJB equation has been a long-standing challenge, because the partial time derivative of the value function appears as an additional unknown term. To address this problem, this study is the first to establish the link between the partial time derivative and the terminal-time utility function, which makes it possible to apply the policy iteration (PI) technique to CT FH OCPs. Building on this key finding, an FH approximate dynamic programming (ADP) algorithm is proposed within an actor-critic framework. The algorithm is shown to exhibit important convergence and optimality properties. Notably, by employing multilayer neural networks (NNs) in the actor-critic architecture, the algorithm is applicable to CT FH OCPs for more general nonlinear and complex systems. Finally, the effectiveness of the proposed algorithm is demonstrated through a series of simulations on both a linear quadratic regulator (LQR) problem and a nonlinear vehicle tracking problem.
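To give a concrete flavor of policy iteration over a finite horizon, here is a minimal sketch on a time-discretized LQR problem (the paper's own LQR simulation setting). This is not the paper's CT actor-critic algorithm: it replaces the NN critic with an exact quadratic cost-to-go and works in discrete time, and all matrices, horizon length, and function names are illustrative assumptions. The key FH feature survives, though: the policy and cost-to-go are time-varying, and each PI cycle alternates a backward policy-evaluation pass with a greedy policy-improvement step.

```python
# Hedged sketch: finite-horizon policy iteration on a discretized LQR.
# x_{k+1} = A x_k + B u_k,  cost = sum x'Qx + u'Ru + terminal x'Qf x,
# policy u_k = -K_k x_k with a time-varying gain sequence K_0..K_{N-1}.
import numpy as np

def evaluate_policy(K, A, B, Q, R, Qf, N):
    """Backward pass: cost-to-go matrices P_k for a fixed gain sequence K."""
    P = [None] * (N + 1)
    P[N] = Qf                                   # terminal condition plays the
    for k in range(N - 1, -1, -1):              # role of the terminal utility
        Acl = A - B @ K[k]                      # closed-loop dynamics
        P[k] = Q + K[k].T @ R @ K[k] + Acl.T @ P[k + 1] @ Acl
    return P

def improve_policy(P, A, B, R, N):
    """Greedy one-step improvement against the evaluated cost-to-go."""
    return [np.linalg.solve(R + B.T @ P[k + 1] @ B, B.T @ P[k + 1] @ A)
            for k in range(N)]

def finite_horizon_pi(A, B, Q, R, Qf, N, iters=50):
    """Alternate evaluation and improvement; converges to the DP solution."""
    n, m = B.shape
    K = [np.zeros((m, n)) for _ in range(N)]    # initial (zero) policy
    for _ in range(iters):
        P = evaluate_policy(K, A, B, Q, R, Qf, N)
        K = improve_policy(P, A, B, R, N)
    return K, evaluate_policy(K, A, B, Q, R, Qf, N)
```

Because the terminal cost-to-go P_N = Qf is known exactly, each PI cycle pins down one more optimal gain from the end of the horizon backward, so after N cycles the gains coincide with the exact backward Riccati (dynamic programming) recursion.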
Pages: 5255-5267
Page count: 13