On-Line Policy Gradient Estimation with Multi-Step Sampling

Cited by: 2
Authors
Li, Yan-Jie [1]
Cao, Fang [1]
Cao, Xi-Ren [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Markov reward processes; Policy gradient; On-line estimation; Performance potentials; Sensitivity analysis; Infinite horizon; Potentials
DOI
10.1007/s10626-009-0078-3
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this note, we discuss the problem of the sample-path-based (on-line) performance gradient estimation for Markov systems. The existing on-line performance gradient estimation algorithms generally require a standard importance sampling assumption. When the assumption does not hold, these algorithms may lead to poor estimates for the gradients. We show that this assumption can be relaxed and propose algorithms with multi-step sampling for performance gradient estimates; these algorithms do not require the standard assumption. Simulation examples are given to illustrate the accuracy of the estimates.
Pages: 3-17
Number of pages: 15