An Empirical Relative Value Learning Algorithm for Non-parametric MDPs with Continuous State Space

Cited by: 0

Authors:

Sharma, Hiteshi [1]

Jain, Rahul [1]

Gupta, Abhishek [2]

Affiliations:

[1] Univ Southern Calif, Dept Elect Engn, Los Angeles, CA 90007 USA

[2] Ohio State Univ, Dept Elect Engn, Columbus, OH 43210 USA

Source:

2019 18TH EUROPEAN CONTROL CONFERENCE (ECC) | 2019

DOI:

10.23919/ecc.2019.8795982

Chinese Library Classification number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We propose an empirical relative value learning (ERVL) algorithm for non-parametric MDPs with continuous state space, finite actions, and the average reward criterion. The ERVL algorithm relies on nearest-neighbor function approximation and on minibatch samples for the value function update. It is universal (it works for any MDP), computationally simple, and yet provides an arbitrarily good approximation with high probability in finite time. As far as we know, this is the first algorithm for non-parametric, continuous state space MDPs under the average reward criterion with these provable properties. Numerical evaluation on a benchmark optimal replacement problem suggests good performance.
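The two ingredients named in the abstract, nearest-neighbor function approximation and minibatch sampling of next states, can be illustrated with a rough Python sketch. This is not the authors' implementation: the toy replacement model (`simulate`), the reward (`reward`), the normalization by the minimum value, and all function names and parameter values are assumptions chosen only to make the idea concrete.

```python
import numpy as np

def nn_predict(anchors, values, queries):
    """1-nearest-neighbor lookup of the value function on a 1-D state space."""
    idx = np.abs(queries[:, None] - anchors[None, :]).argmin(axis=1)
    return values[idx]

def simulate(state, action, rng, size):
    """Hypothetical replacement model: wear grows randomly; action 1 resets it."""
    base = 0.0 if action == 1 else state
    return np.minimum(base + rng.exponential(0.1, size=size), 1.0)

def reward(state, action):
    """Hypothetical reward: running cost grows with wear; replacing costs 0.5."""
    return -0.5 if action == 1 else -state

def ervl_sketch(n_states=50, n_next=20, n_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    anchors = rng.uniform(0.0, 1.0, size=n_states)  # sampled states
    values = np.zeros(n_states)
    for _ in range(n_iters):
        new_values = np.empty(n_states)
        for i, s in enumerate(anchors):
            q = []
            for a in (0, 1):
                # Minibatch of simulated next states replaces the expectation.
                nxt = simulate(s, a, rng, n_next)
                q.append(reward(s, a) + nn_predict(anchors, values, nxt).mean())
            new_values[i] = max(q)
        # Relative value step (assumed form): subtract the minimum so the
        # iterates stay bounded under the average reward criterion.
        values = new_values - new_values.min()
    return anchors, values

anchors, values = ervl_sketch()
```

The returned `values` are relative values at the sampled states; a greedy policy could be read off by repeating the inner maximization over actions at any query state.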

Pages: 1368 - 1373

Number of pages: 6

