Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control

Cited by: 3
Authors
Lin, Ziyu [1]
Duan, Jingliang [2]
Li, Shengbo Eben [1]
Ma, Haitong [1]
Li, Jie [1]
Chen, Jianyu [3]
Cheng, Bo [1]
Ma, Jun [4, 5]
Affiliations
[1] Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China
[2] Univ Sci & Technol Beijing, Sch Mech Engn, Beijing 100083, Peoples R China
[3] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing 100084, Peoples R China
[4] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst Thrust, Guangzhou, Peoples R China
[5] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
Keywords
Mathematical models; Heuristic algorithms; Optimal control; Artificial neural networks; Approximation algorithms; Dynamic programming; Nonlinear dynamical systems; Actor critic; approximate dynamic programming (ADP); finite-horizon (FH); Hamilton-Jacobi-Bellman (HJB) equation; optimal control; policy iteration (PI); SYSTEMS;
DOI
10.1109/TNNLS.2022.3225090
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Hamilton-Jacobi-Bellman (HJB) equation serves as the necessary and sufficient condition for the optimal solution to the continuous-time (CT) optimal control problem (OCP). Compared with the infinite-horizon HJB equation, solving the finite-horizon (FH) HJB equation has been a long-standing challenge, because the partial time derivative of the value function appears as an additional unknown term. To address this problem, this study establishes, for the first time, a link between the partial time derivative and the terminal-time utility function, which makes it possible to use the policy iteration (PI) technique to solve CT FH OCPs. Based on this key finding, an FH approximate dynamic programming (ADP) algorithm is proposed that leverages an actor-critic framework. The algorithm is shown to have important convergence and optimality properties. Importantly, because multilayer neural networks (NNs) are used in the actor-critic architecture, the algorithm is suitable for CT FH OCPs involving more general nonlinear and complex systems. Finally, the effectiveness of the proposed algorithm is demonstrated through a series of simulations on both a linear quadratic regulator (LQR) problem and a nonlinear vehicle tracking problem.
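To illustrate why the time derivative of the value function matters in the finite-horizon case, the following is a minimal sketch on a scalar LQR problem. It is not the paper's algorithm; it uses the classical Riccati differential equation instead of policy iteration or neural networks. The dynamics dx/dt = a*x + b*u, the cost weights q, r, s, and all function names are assumptions chosen for illustration. With V(x, t) = p(t)*x^2, the FH HJB equation reduces to -dp/dt = 2*a*p - (b^2/r)*p^2 + q with terminal condition p(T) = s, so p varies with time rather than being a constant as in the infinite-horizon case.

```python
import math


def riccati_rhs(p, a, b, q, r):
    """dp/dt for the scalar finite-horizon LQR value V(x, t) = p(t) * x**2."""
    return -(2.0 * a * p - (b ** 2 / r) * p ** 2 + q)


def solve_fh_lqr(a, b, q, r, s, horizon, steps=2000):
    """Integrate the Riccati ODE backward from p(T) = s using classical RK4.

    Returns (t, p(t)) pairs ordered from t = T down to t = 0.
    All parameters are hypothetical illustration values, not from the paper.
    """
    h = -horizon / steps  # negative step size: march backward in time
    t, p = horizon, s
    trajectory = [(t, p)]
    for _ in range(steps):
        k1 = riccati_rhs(p, a, b, q, r)
        k2 = riccati_rhs(p + 0.5 * h * k1, a, b, q, r)
        k3 = riccati_rhs(p + 0.5 * h * k2, a, b, q, r)
        k4 = riccati_rhs(p + h * k3, a, b, q, r)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += h
        trajectory.append((t, p))
    return trajectory


# Integrator plant (a = 0, b = 1), unit weights, zero terminal cost.
# For this case the closed-form solution is p(t) = tanh(T - t).
trajectory = solve_fh_lqr(a=0.0, b=1.0, q=1.0, r=1.0, s=0.0, horizon=2.0)
t0, p0 = trajectory[-1]
print("p(0) numeric:", p0, "closed form:", math.tanh(2.0))
```

In this sketch the optimal feedback is u = -(b*p(t)/r)*x, so the gain changes over the horizon; as the horizon grows, p(0) approaches the constant infinite-horizon solution (here 1).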
Pages: 5255-5267
Number of pages: 13
Related papers
50 in total
  • [1] Approximate Finite-horizon Optimal Control with Policy Iteration
    Zhao Zhengen
    Yang Ying
    Li Hao
    Liu Dan
    [J]. 2014 33RD CHINESE CONTROL CONFERENCE (CCC), 2014, : 8889 - 8894
  • [2] Solving Finite-Horizon HJB for Optimal Control of Continuous-Time Systems
    Lin, Ziyu
    Duan, Jingliang
    Li, Shengbo Eben
    Li, Jie
    Ma, Haitong
    Sun, Qi
    Chen, Jianyu
    Cheng, Bo
    [J]. 2021 INTERNATIONAL CONFERENCE ON COMPUTER, CONTROL AND ROBOTICS (ICCCR 2021), 2021, : 116 - 122
  • [3] Policy-Iteration-Based Adaptive Optimal Control for Uncertain Continuous-Time Linear Systems with Excitation Signals
    Lee, Jae Young
    Park, Jin Bae
    Choi, Yoon Ho
    [J]. INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS 2010), 2010, : 646 - 651
  • [4] Finite-horizon optimal control for continuous-time uncertain nonlinear systems using reinforcement learning
    Zhao, Jingang
    Gan, Minggang
    [J]. INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2020, 51 (13) : 2429 - 2440
  • [5] Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design
    Luo, Biao
    Wu, Huai-Ning
    Huang, Tingwen
    Liu, Derong
    [J]. AUTOMATICA, 2014, 50 (12) : 3281 - 3290
  • [6] Finite-Horizon Optimal Tracking Guidance for Aircraft Based on Approximate Dynamic Programming
    Wan, Shizheng
    Chang, Xiaofei
    Li, Quancheng
    Yan, Jie
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2019, 2019
  • [7] Finite-Horizon Neural Network-based Optimal Control Design for Affine Nonlinear Continuous-time Systems
    Zhao, Qiming
    Xu, Hao
    Dierks, Travis
    Jagannathan, S.
    [J]. 2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
  • [8] Approximate Dynamic Programming with Gaussian Processes for Optimal Control of Continuous-Time Nonlinear Systems
    Beppu, Hirofumi
    Maruta, Ichiro
    Fujimoto, Kenji
    [J]. IFAC PAPERSONLINE, 2020, 53 (02) : 6715 - 6722
  • [9] Neural Network-based Finite-Horizon Approximately Optimal Control of Uncertain Affine Nonlinear Continuous-time Systems
    Xu, Hao
    Zhao, Qiming
    Dierks, Travis
    Jagannathan, S.
    [J]. 2014 AMERICAN CONTROL CONFERENCE (ACC), 2014, : 1243 - 1248
  • [10] Adaptive Dynamic Programming for Finite-Horizon Optimal Tracking Control of a Class of Nonlinear Systems
    Wang Ding
    Liu Derong
    Wei Qinglai
    [J]. 2011 30TH CHINESE CONTROL CONFERENCE (CCC), 2011, : 2450 - 2455