Empirical Policy Iteration for Approximate Dynamic Programming

Cited by: 0
Authors
Haskell, William B. [1]
Jain, Rahul [1]
Kalathil, Dileep [1]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Los Angeles, CA 90089 USA
Keywords
MARKOV DECISION-PROCESSES; CONVERGENCE;
DOI
None available
CLC classification
TP [automation and computer technology];
Discipline code
0812
Abstract
We propose a simulation-based algorithm, the Empirical Policy Iteration (EPI) algorithm, for finding the optimal policy of an MDP with an infinite-horizon discounted-cost criterion when the transition kernels are unknown. Unlike simulation-based algorithms that use stochastic approximation techniques and give only asymptotic convergence results, we give provable, non-asymptotic performance guarantees in the form of sample complexity results: given ε > 0 and δ > 0, we specify the minimum number of simulation samples n(ε, δ) needed in each iteration and the minimum number of iterations k(ε, δ) that are sufficient for EPI to yield, with probability at least 1 − δ, an approximate value function that is within ε of the optimal value function.
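The scheme the abstract describes — evaluate the current policy with sample-average (empirical) Bellman backups instead of exact expectations, then improve greedily — can be sketched as follows. This is an illustrative toy implementation, not the authors' code: the 3-state MDP (`P`, `c`), the sample count `n`, the outer iteration count `k`, and the number of evaluation sweeps are all made-up placeholders standing in for the paper's n(ε, δ) and k(ε, δ); the true kernel `P` is used only as a black-box simulator, matching the unknown-kernel setting.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.9                      # hypothetical small MDP
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a] = next-state distribution
c = rng.random((S, A))                       # per-step cost in [0, 1]

def simulate(s, a, n):
    """Draw n next-state samples; simulator access only, kernel 'unknown'."""
    return rng.choice(S, size=n, p=P[s, a])

def empirical_backup(V, n):
    """Empirical Bellman backup: the expectation over next states is
    replaced by a sample average over n simulated transitions."""
    Q = np.empty((S, A))
    for s in range(S):
        for a in range(A):
            Q[s, a] = c[s, a] + gamma * V[simulate(s, a, n)].mean()
    return Q

def epi(n=200, k=20, eval_sweeps=30):
    """Empirical Policy Iteration sketch: greedy improvement from sampled
    Q-values, then approximate evaluation of the improved policy by
    repeated empirical backups along that policy's actions."""
    V = np.zeros(S)
    policy = np.zeros(S, dtype=int)
    for _ in range(k):
        Q = empirical_backup(V, n)
        policy = Q.argmin(axis=1)            # greedy step (costs: minimize)
        for _ in range(eval_sweeps):         # empirical policy evaluation
            Qe = empirical_backup(V, n)
            V = Qe[np.arange(S), policy]
    return V, policy
```

With enough samples per backup, the returned value function lands close to the optimal one; the paper's contribution is quantifying exactly how large n and k must be for a (ε, δ) guarantee, which this sketch does not attempt.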
Pages: 6573-6578 (6 pages)
Related papers (showing 10 of 50)
  • [1] Empirical Value Iteration for Approximate Dynamic Programming
    Haskell, William B.
    Jain, Rahul
    Kalathil, Dileep
    [J]. 2014 AMERICAN CONTROL CONFERENCE (ACC), 2014, : 495 - 500
  • [2] Policy Iteration Approximate Dynamic Programming Using Volterra Series Based Actor
    Guo, Wentao
    Si, Jennie
    Liu, Feng
    Mei, Shengwei
    [J]. PROCEEDINGS OF THE 2014 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2014, : 249 - 255
  • [3] Policy Approximation in Policy Iteration Approximate Dynamic Programming for Discrete-Time Nonlinear Systems
    Guo, Wentao
    Si, Jennie
    Liu, Feng
    Mei, Shengwei
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (07) : 2794 - 2807
  • [4] Policy Iteration Based Approximate Dynamic Programming Toward Autonomous Driving in Constrained Dynamic Environment
    Lin, Ziyu
    Ma, Jun
    Duan, Jingliang
    Li, Shengbo Eben
    Ma, Haitong
    Cheng, Bo
    Lee, Tong Heng
    [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (05) : 5003 - 5013
  • [5] AN EFFICIENT POLICY ITERATION ALGORITHM FOR DYNAMIC PROGRAMMING EQUATIONS
    Alla, Alessandro
    Falcone, Maurizio
    Kalise, Dante
    [J]. SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2015, 37 (01): : A181 - A200
  • [6] CONVERGENCE OF POLICY ITERATION IN CONTRACTING DYNAMIC-PROGRAMMING
    PUTERMAN, ML
    [J]. ADVANCES IN APPLIED PROBABILITY, 1978, 10 (02) : 312 - 312
  • [7] Approximate dynamic programming for an inventory problem: Empirical comparison
    Katanyukul, Tatpong
    Duff, William S.
    Chong, Edwin K. P.
    [J]. COMPUTERS & INDUSTRIAL ENGINEERING, 2011, 60 (04) : 719 - 743
  • [8] Adaptive Approximate Policy Iteration
    Hao, Botao
    Lazic, Nevena
    Abbasi-Yadkori, Yasin
    Joulani, Pooria
    Szepesvari, Csaba
    [J]. 24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130 : 523 - 531
  • [9] REDUCED COMPLEXITY DYNAMIC-PROGRAMMING BASED ON POLICY ITERATION
    BAYARD, DS
    [J]. JOURNAL OF MATHEMATICAL ANALYSIS AND APPLICATIONS, 1992, 170 (01) : 75 - 103
  • [10] Error Bound Analysis of Policy Iteration Based Approximate Dynamic Programming for Deterministic Discrete-time Nonlinear Systems
    Guo, Wentao
    Liu, Feng
    Si, Jennie
    Mei, Shengwei
    Li, Rui
    [J]. 2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,