Empirical Policy Iteration for Approximate Dynamic Programming

Cited by: 0
Authors
Haskell, William B. [1]
Jain, Rahul [1]
Kalathil, Dileep [1]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Los Angeles, CA 90089 USA
Keywords
MARKOV DECISION-PROCESSES; CONVERGENCE;
DOI
None available
CLC classification
TP [automation and computer technology];
Discipline code
0812
Abstract
We propose a simulation-based algorithm, the Empirical Policy Iteration (EPI) algorithm, for finding the optimal policy of an MDP with an infinite-horizon discounted-cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms that use stochastic approximation techniques and give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in the form of sample complexity results: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for EPI to yield, with probability at least 1 − δ, an approximate value function that is within ε of the optimal value function.
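The scheme the abstract describes — evaluate the current policy with sample-average (empirical) Bellman backups instead of exact expectations, then improve greedily — can be sketched as follows. This is an illustrative toy implementation, not the authors' code: the 3-state MDP (`P`, `c`), the sample count `n`, the outer iteration count `k`, and the number of evaluation sweeps are all made-up placeholders standing in for the paper's n(ε, δ) and k(ε, δ); the true kernel `P` is used only as a black-box simulator, matching the unknown-kernel setting.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.9                      # hypothetical small MDP
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
c = rng.random((S, A))                       # per-step cost in [0, 1]

def simulate(s, a, n):
    """Draw n next-state samples; simulator access only, kernel 'unknown'."""
    return rng.choice(S, size=n, p=P[s, a])

def empirical_backup(V, n):
    """Empirical Bellman backup: the expectation over next states is
    replaced by a sample average over n simulated transitions."""
    Q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            Q[s, a] = c[s, a] + gamma * V[simulate(s, a, n)].mean()
    return Q

def epi(n=200, k=20, eval_sweeps=30):
    """Empirical Policy Iteration sketch: greedy improvement from sampled
    Q-values, then approximate evaluation of the improved policy by
    repeated empirical backups along that policy's actions."""
    V = np.zeros(S)
    policy = np.zeros(S, dtype=int)
    for _ in range(k):
        Q = empirical_backup(V, n)
        policy = Q.argmin(axis=1)            # greedy step (costs: minimize)
        for _ in range(eval_sweeps):         # empirical policy evaluation
            Qe = empirical_backup(V, n)
            V = Qe[np.arange(S), policy]
    return V, policy
```

With enough samples per backup, the returned value function lands close to the optimal one; the paper's contribution is quantifying exactly how large n and k must be for a (ε, δ) guarantee, which this sketch does not attempt.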
Pages: 6573-6578 (6 pages)
Related papers (showing 10 of 50)
  • [1] Empirical Value Iteration for Approximate Dynamic Programming
    Haskell, William B.
    Jain, Rahul
    Kalathil, Dileep
    [J]. 2014 AMERICAN CONTROL CONFERENCE (ACC), 2014, : 495 - 500
  • [2] Policy Iteration Approximate Dynamic Programming Using Volterra Series Based Actor
    Guo, Wentao
    Si, Jennie
    Liu, Feng
    Mei, Shengwei
    [J]. PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 249 - 255
  • [3] Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems
    Guo, Wentao
    Si, Jennie
    Liu, Feng
    Mei, Shengwei
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (07) : 2794 - 2807
  • [4] Policy Iteration Based Approximate Dynamic Programming Toward Autonomous Driving in Constrained Dynamic Environment
    Lin, Ziyu
    Ma, Jun
    Duan, Jingliang
    Li, Shengbo Eben
    Ma, Haitong
    Cheng, Bo
    Lee, Tong Heng
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (05) : 5003 - 5013
  • [5] AN EFFICIENT POLICY ITERATION ALGORITHM FOR DYNAMIC PROGRAMMING EQUATIONS
    Alla, Alessandro
    Falcone, Maurizio
    Kalise, Dante
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (01): : A181 - A200
  • [6] CONVERGENCE OF POLICY ITERATION IN CONTRACTING DYNAMIC-PROGRAMMING
    PUTERMAN, ML
    [J]. ADVANCES IN APPLIED PROBABILITY, 1978, 10 (02) : 312 - 312
  • [7] Approximate dynamic programming for an inventory problem: Empirical comparison
    Katanyukul, Tatpong
    Duff, William S.
    Chong, Edwin K. P.
    [J]. COMPUTERS & INDUSTRIAL ENGINEERING, 2011, 60 (04) : 719 - 743
  • [8] Adaptive Approximate Policy Iteration
    Hao, Botao
    Lazic, Nevena
    Abbasi-Yadkori, Yasin
    Joulani, Pooria
    Szepesvari, Csaba
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 523 - 531
  • [9] REDUCED COMPLEXITY DYNAMIC-PROGRAMMING BASED ON POLICY ITERATION
    BAYARD, DS
    [J]. JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 1992, 170 (01) : 75 - 103
  • [10] Error Bound Analysis of Policy Iteration Based Approximate Dynamic Programming for Deterministic Discrete-time Nonlinear Systems
    Guo, Wentao
    Liu, Feng
    Si, Jennie
    Mei, Shengwei
    Li, Rui
    [J]. 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,