Policy Iteration Approximate Dynamic Programming Using Volterra Series Based Actor

Citations: 0
|
Authors
Guo, Wentao [1 ]
Si, Jennie [2 ]
Liu, Feng [1 ]
Mei, Shengwei [1 ]
Affiliations
[1] Tsinghua Univ, Dept Elect Engn, State Key Lab Power Syst, Beijing 100084, Peoples R China
[2] Arizona State Univ, Dept Elect Engn, Tempe, AZ 85287 USA
Keywords
TIME NONLINEAR-SYSTEMS; ADAPTIVE CRITIC DESIGNS; ONLINE LEARNING CONTROL; FEEDBACK-CONTROL; NEURAL-NETWORKS; REINFORCEMENT; IDENTIFICATION; ALGORITHM;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
There is an extensive literature on value function approximation for approximate dynamic programming (ADP). Multilayer perceptrons (MLPs) and radial basis functions (RBFs), among others, are typical approximators for value functions in ADP. Similar approaches have been taken for policy approximation. In this paper, we propose a new Volterra series based structure for actor approximation in ADP. The Volterra approximator is linear in its parameters, so the global optimum is attainable. Given the proposed approximator structure, we further develop a policy iteration framework together with a gradient descent training algorithm for obtaining the optimal Volterra kernels. For this ADP design, we provide a sufficient condition, based on the actor approximation error, that guarantees convergence of the value function iterations, and we give a finite bound on the final convergent value function. Finally, a simulation example illustrates the effectiveness of the proposed Volterra actor for optimal control of a nonlinear system.
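The key structural idea in the abstract, an actor that is a truncated Volterra series and therefore linear in its trainable parameters, can be sketched as follows. This is an illustrative reconstruction, not the paper's exact algorithm: the class name `VolterraActor`, the second-order truncation, and the supervised target in `gradient_step` (a stand-in for the policy-improvement target produced during policy iteration) are all assumptions made for the example.

```python
import numpy as np

def volterra_features(x):
    """Truncated (second-order) Volterra expansion of a state vector x:
    a constant term, the linear terms, and all pairwise products."""
    x = np.asarray(x, dtype=float)
    # Upper-triangular products x_i * x_j (i <= j) cover the symmetric kernel.
    second = np.outer(x, x)[np.triu_indices(len(x))]
    return np.concatenate(([1.0], x, second))

class VolterraActor:
    """Actor linear in its parameters: u = w^T phi(x).

    Because the output is linear in w, the squared-error training
    objective is convex and the global optimum is attainable."""

    def __init__(self, state_dim):
        n = 1 + state_dim + state_dim * (state_dim + 1) // 2
        self.w = np.zeros(n)

    def act(self, x):
        return float(self.w @ volterra_features(x))

    def gradient_step(self, x, target_u, lr=0.1):
        """One gradient-descent step on 0.5 * (u - target_u)^2; target_u
        stands in for the improved-policy action in policy iteration."""
        phi = volterra_features(x)
        err = self.w @ phi - target_u
        self.w -= lr * err * phi
        return 0.5 * err ** 2
```

For instance, the control law u = x1 * x2 lies exactly in the span of the second-order features, so gradient descent on sampled states recovers it; richer policies would require higher-order kernels.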
Pages: 249 - 255
Number of pages: 7
Related Papers
50 records
  • [1] Empirical Policy Iteration for Approximate Dynamic Programming
    Haskell, William B.
    Jain, Rahul
    Kalathil, Dileep
    [J]. 2014 IEEE 53RD ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2014, : 6573 - 6578
  • [2] Policy Iteration Based Approximate Dynamic Programming Toward Autonomous Driving in Constrained Dynamic Environment
    Lin, Ziyu
    Ma, Jun
    Duan, Jingliang
    Li, Shengbo Eben
    Ma, Haitong
    Cheng, Bo
    Lee, Tong Heng
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (05) : 5003 - 5013
  • [3] Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems
    Guo, Wentao
    Si, Jennie
    Liu, Feng
    Mei, Shengwei
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (07) : 2794 - 2807
  • [4] Empirical Value Iteration for Approximate Dynamic Programming
    Haskell, William B.
    Jain, Rahul
    Kalathil, Dileep
    [J]. 2014 AMERICAN CONTROL CONFERENCE (ACC), 2014, : 495 - 500
  • [5] REDUCED COMPLEXITY DYNAMIC-PROGRAMMING BASED ON POLICY ITERATION
    BAYARD, DS
    [J]. JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 1992, 170 (01) : 75 - 103
  • [6] Error Bound Analysis of Policy Iteration Based Approximate Dynamic Programming for Deterministic Discrete-time Nonlinear Systems
    Guo, Wentao
    Liu, Feng
    Si, Jennie
    Mei, Shengwei
    Li, Rui
    [J]. 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015
  • [7] Policy-Iteration-Based Finite-Horizon Approximate Dynamic Programming for Continuous-Time Nonlinear Optimal Control
    Lin, Ziyu
    Duan, Jingliang
    Li, Shengbo Eben
    Ma, Haitong
    Li, Jie
    Chen, Jianyu
    Cheng, Bo
    Ma, Jun
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (09) : 5255 - 5267
  • [8] Vision-Based Reinforcement Learning using Approximate Policy Iteration
    Shaker, Marwan R.
    Yue, Shigang
    Duckett, Tom
    [J]. ICAR: 2009 14TH INTERNATIONAL CONFERENCE ON ADVANCED ROBOTICS, VOLS 1 AND 2, 2009, : 594 - 599
  • [9] Classification-Based Approximate Policy Iteration
    Farahmand, Amir-massoud
    Precup, Doina
    Barreto, Andre M. S.
    Ghavamzadeh, Mohammad
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2015, 60 (11) : 2989 - 2993
  • [10] AN EFFICIENT POLICY ITERATION ALGORITHM FOR DYNAMIC PROGRAMMING EQUATIONS
    Alla, Alessandro
    Falcone, Maurizio
    Kalise, Dante
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (01) : A181 - A200