An Empirical Relative Value Learning Algorithm for Non-parametric MDPs with Continuous State Space

Cited by: 0
Authors
Sharma, Hiteshi [1 ]
Jain, Rahul [1 ]
Gupta, Abhishek [2 ]
Affiliations
[1] Univ Southern Calif, Dept Elect Engn, Los Angeles, CA 90007 USA
[2] Ohio State Univ, Dept Elect Engn, Columbus, OH 43210 USA
DOI
10.23919/ecc.2019.8795982
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We propose an empirical relative value learning (ERVL) algorithm for non-parametric MDPs with a continuous state space, finite actions, and the average reward criterion. The ERVL algorithm relies on function approximation via nearest neighbors and on minibatch samples for the value function update. It is universal (it will work for any MDP), computationally quite simple, and yet provides an arbitrarily good approximation with high probability in finite time. As far as we know, this is the first algorithm for non-parametric, continuous state space MDPs with the average reward criterion that has these provable properties. Numerical evaluation on a benchmark problem of optimal replacement suggests good performance.
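The abstract names three ingredients: a relative value update for the average reward criterion, nearest-neighbor function approximation, and minibatch sampling of next states. The following is only a minimal sketch of how these pieces could fit together, not the authors' algorithm. The function name `ervl_sketch`, the toy replacement model (wear in [0, 1], exponential wear increments with mean 0.2, keep cost equal to the wear, replacement cost 0.5), the use of 1-nearest-neighbor lookup, and the choice of the least-worn sampled state as the reference state are all assumptions made for illustration.

```python
import numpy as np

def ervl_sketch(n_anchors=50, n_samples=20, n_iters=100, seed=0):
    """Illustrative relative value iteration with sampled next states and
    1-nearest-neighbor value lookup (hypothetical toy model, see lead-in)."""
    rng = np.random.default_rng(seed)
    # Sampled states at which the value function is stored (sorted for readability).
    anchors = np.sort(rng.uniform(0.0, 1.0, size=n_anchors))
    v = np.zeros(n_anchors)
    q = np.zeros((n_anchors, 2))
    gain = 0.0
    for _ in range(n_iters):
        for a in (0, 1):  # 0 = keep the machine, 1 = replace it
            # Assumed rewards: keeping costs the current wear, replacing costs 0.5.
            reward = -anchors if a == 0 else np.full(n_anchors, -0.5)
            # Assumed dynamics: wear grows by an exponential increment, capped at 1;
            # replacement resets the wear to 0 before the increment.
            start = anchors if a == 0 else np.zeros(n_anchors)
            noise = rng.exponential(0.2, size=(n_anchors, n_samples))
            nxt = np.minimum(start[:, None] + noise, 1.0)
            # 1-nearest-neighbor lookup: index of the closest anchor per sample.
            idx = np.abs(nxt[..., None] - anchors).argmin(axis=-1)
            # Empirical Bellman backup: reward plus minibatch mean of next values.
            q[:, a] = reward + v[idx].mean(axis=1)
        tv = q.max(axis=1)
        # Relative value step: subtract the backup at the reference state,
        # which also serves as a running estimate of the average reward.
        gain = tv[0]
        v = tv - tv[0]
    return anchors, v, q.argmax(axis=1), gain

if __name__ == "__main__":
    anchors, v, policy, gain = ervl_sketch()
    print("estimated average reward:", gain)
    print("replace at wear levels:", anchors[policy == 1])
```

Subtracting the backup at a fixed reference state keeps the iterates bounded under the average reward criterion, where plain value iteration would drift. Other normalizations are possible, and the one used in the paper may differ.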
Pages: 1368-1373
Page count: 6