An Empirical Relative Value Learning Algorithm for Non-parametric MDPs with Continuous State Space

Cited by: 0
Authors
Sharma, Hiteshi [1 ]
Jain, Rahul [1 ]
Gupta, Abhishek [2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect Engn, Los Angeles, CA 90007 USA
[2] Ohio State Univ, Dept Elect Engn, Columbus, OH 43210 USA
DOI
10.23919/ecc.2019.8795982
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
We propose an empirical relative value learning (ERVL) algorithm for non-parametric MDPs with continuous state space, finite actions, and the average reward criterion. The ERVL algorithm relies on function approximation via nearest neighbors and on minibatch samples for the value function update. It is universal (it works for any MDP), computationally simple, and yet provides an arbitrarily good approximation with high probability in finite time. As far as we know, this is the first algorithm for non-parametric, continuous state space MDPs under the average reward criterion with these provable properties. Numerical evaluation on a benchmark optimal replacement problem suggests good performance.
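The ingredients named in the abstract (nearest-neighbor function approximation, minibatch samples of next states, and a relative value update) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the one-dimensional state space [0, 1], the choice to subtract the minimum value as the "relative" normalization, and the toy replacement-style MDP are all assumptions made for illustration only.

```python
import numpy as np

def knn_predict(anchors, values, queries, k):
    """Average the values of the k anchors nearest to each query (1-D states)."""
    dist = np.abs(queries[:, None] - anchors[None, :])
    nearest = np.argsort(dist, axis=1)[:, :k]
    return values[nearest].mean(axis=1)

def ervl_sketch(step, reward, n_actions, n_states=50, n_next=20,
                k=3, n_iters=30, seed=0):
    """Hypothetical empirical relative value iteration with k-NN regression."""
    rng = np.random.default_rng(seed)
    anchors = rng.uniform(0.0, 1.0, size=n_states)  # sampled states
    values = np.zeros(n_states)                     # value estimates at anchors
    for _ in range(n_iters):
        q = np.empty((n_states, n_actions))
        for a in range(n_actions):
            for i, s in enumerate(anchors):
                nxt = step(s, a, n_next, rng)       # minibatch of next states
                q[i, a] = reward(s, a) + knn_predict(anchors, values, nxt, k).mean()
        backed_up = q.max(axis=1)
        # Assumed normalization: subtract the minimum so values stay bounded.
        values = backed_up - backed_up.min()
    return anchors, values

# Toy MDP (an assumption, not the paper's benchmark): the state is a wear level.
def toy_step(s, a, n, rng):
    if a == 1:                                      # replace: wear resets near 0
        return rng.uniform(0.0, 0.1, size=n)
    return np.minimum(1.0, s + rng.uniform(0.0, 0.2, size=n))  # keep: wear grows

def toy_reward(s, a):
    return -0.5 if a == 1 else -s                   # replacement cost vs. wear cost

anchors, values = ervl_sketch(toy_step, toy_reward, n_actions=2)
```

The returned `values` are relative value estimates at the sampled `anchors`; values at other states would be read off with `knn_predict`.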
Pages: 1368-1373
Page count: 6
Related papers
50 in total
  • [1] Approximate Relative Value Learning for Average-reward Continuous State MDPs
    Sharma, Hiteshi
    Jafarnia-Jahromi, Mehdi
    Jain, Rahul
    35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115 : 956 - 964
  • [2] An algorithm for non-parametric estimation in state-space models
    Thi Tuyet Trang Chau
    Ailliot, Pierre
    Monbet, Valerie
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2021, 153
  • [3] An Approximately Optimal Relative Value Learning Algorithm for Averaged MDPs with Continuous States and Actions
    Sharma, Hiteshi
    Jain, Rahul
    2019 57TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2019, : 734 - 740
  • [4] An Empirical Algorithm for Relative Value Iteration for Average-cost MDPs
    Gupta, Abhishek
    Jain, Rahul
    Glynn, Peter W.
    2015 54TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2015, : 5079 - 5084
  • [5] A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs
    Haskell, William B.
    Jain, Rahul
    Sharma, Hiteshi
    Yu, Pengqian
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2020, 65 (01) : 115 - 129
  • [6] Unsupervised non-parametric kernel learning algorithm
    Liu, Bing
    Xia, Shi-Xiong
    Zhou, Yong
    KNOWLEDGE-BASED SYSTEMS, 2013, 44 : 1 - 9
  • [7] NON-PARAMETRIC EMPIRICAL BAYES PROCEDURES
    JOHNS, MV
    ANNALS OF MATHEMATICAL STATISTICS, 1957, 28 (03): : 649 - 669
  • [8] Non-parametric empirical Bayes procedure
    Sarhan, A
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2003, 80 (02) : 115 - 122
  • [9] A non-parametric learning algorithm for small manufacturing data sets
    Li, Der-Chang
    Yeh, Chun-Wu
    EXPERT SYSTEMS WITH APPLICATIONS, 2008, 34 (01) : 391 - 398
  • [10] Non-parametric manifold learning
    Asta, Dena Marie
    ELECTRONIC JOURNAL OF STATISTICS, 2024, 18 (02): : 3903 - 3930