On-Line Policy Gradient Estimation with Multi-Step Sampling

Cited by: 2
Authors
Li, Yan-Jie [1 ]
Cao, Fang [1 ]
Cao, Xi-Ren [1 ]
Affiliation
[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Kowloon, Hong Kong, Peoples R China
Keywords
Markov reward processes; Policy gradient; On-line estimation; Performance potentials; SENSITIVITY-ANALYSIS; INFINITE-HORIZON; POTENTIALS;
DOI
10.1007/s10626-009-0078-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
In this note, we discuss the problem of sample-path-based (on-line) performance gradient estimation for Markov systems. Existing on-line performance gradient estimation algorithms generally require a standard importance sampling assumption. When the assumption does not hold, these algorithms may produce poor gradient estimates. We show that this assumption can be relaxed and propose algorithms with multi-step sampling for performance gradient estimation; these algorithms do not require the standard assumption. Simulation examples are given to illustrate the accuracy of the estimates.
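As a rough illustration of the setting the abstract describes (not the authors' multi-step algorithm, whose details are not given in this record), the sketch below uses the performance-derivative formula from the performance-potential framework, dη/dθ = π ΔP g, for a chain perturbed as P + θΔP, where π is the stationary distribution, g the performance potentials, and each row of ΔP sums to zero. It then applies a generic one-step likelihood-ratio estimator that depends on the importance sampling condition ΔP(i,j) ≠ 0 only where P(i,j) > 0. The 3-state chain, rewards, perturbation, and every function name (stationary, potentials, exact_gradient, satisfies_is_assumption, one_step_estimate) are hypothetical, and the chain is assumed ergodic and aperiodic.

```python
import random

# Hypothetical 3-state example (not from the paper): transition matrix P,
# per-state reward f, and perturbation direction dP (each row sums to zero),
# so the perturbed chain is P + theta * dP.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
f = [1.0, 2.0, 4.0]
dP = [[-0.2, 0.0, 0.2],
      [0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0]]


def stationary(P, iters=1000):
    """Stationary distribution by power iteration (assumes ergodic, aperiodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def average_reward(P, f):
    """Long-run average reward eta = pi f."""
    return sum(p * r for p, r in zip(stationary(P), f))


def potentials(P, f, terms=500):
    """Potentials g = sum_k (P^k f - eta), normalized so that pi g = 0."""
    n = len(P)
    eta = average_reward(P, f)
    g = [0.0] * n
    v = list(f)  # v holds P^k f
    for _ in range(terms):
        g = [g[i] + v[i] - eta for i in range(n)]
        v = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
    return g


def exact_gradient(P, dP, f):
    """Performance derivative d eta / d theta = pi * dP * g."""
    pi, g = stationary(P), potentials(P, f)
    n = len(P)
    return sum(pi[i] * dP[i][j] * g[j] for i in range(n) for j in range(n))


def satisfies_is_assumption(P, dP):
    """Importance sampling condition: dP[i][j] != 0 only where P[i][j] > 0."""
    n = len(P)
    return all(P[i][j] > 0 or dP[i][j] == 0 for i in range(n) for j in range(n))


def one_step_estimate(P, dP, f, n_steps, L, seed=0):
    """Sample-path estimate of d eta / d theta with one-step likelihood ratios.

    Each observed transition i -> j contributes dP[i][j] / P[i][j] times an
    L-step truncated potential estimate at j. Transitions with P[i][j] == 0 are
    never observed, so their dP terms are silently missed when the importance
    sampling condition fails.
    """
    rng = random.Random(seed)
    states = range(len(P))
    path = [0]
    for _ in range(n_steps + L):
        path.append(rng.choices(states, weights=P[path[-1]])[0])
    eta_hat = sum(f[s] for s in path) / len(path)
    total = 0.0
    for t in range(n_steps):
        i, j = path[t], path[t + 1]
        if dP[i][j] != 0.0:  # observed, so P[i][j] > 0 here
            g_hat = sum(f[path[t + 1 + l]] - eta_hat for l in range(L))
            total += dP[i][j] / P[i][j] * g_hat
    return total / n_steps
```

For this example chain, exact_gradient(P, dP, f) is about 0.269 (46/171), and one_step_estimate(P, dP, f, n_steps=300_000, L=10) should land near that value. If P is replaced by a chain with P[0][2] == 0, satisfies_is_assumption returns False and the one-step estimator drops the 0 -> 2 term entirely; that is the situation the paper's multi-step sampling is meant to handle, which this sketch does not attempt.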
Pages: 3-17
Number of pages: 15
Related papers (50 in total)
  • [1] On-Line Policy Gradient Estimation with Multi-Step Sampling
    Yan-Jie Li
    Fang Cao
    Xi-Ren Cao
    [J]. Discrete Event Dynamic Systems, 2010, 20 : 3 - 17
  • [2] On-line MS detection for a multi-step combinatorial synthesis system
    Sakai, R
    Takahashi, Y
    Sakamoto, K
    Yoshida, Y
    Kitamori, T
    [J]. Micro Total Analysis Systems 2004, Vol 1, 2005, (296): : 96 - 98
  • [3] An Attack Graph-based On-line Multi-step Attack Detector
    Angelini, Marco
    Bonomi, Silvia
    Borzi, Emanuele
    Del Pozzo, Antonella
    Lenti, Simone
    Santucci, Giuseppe
    [J]. ICDCN'18: PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING AND NETWORKING, 2018,
  • [4] Multi-step estimation for forecasting
    Clements, MP
    Hendry, DF
    [J]. OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 1996, 58 (04) : 657 - +
  • [5] On multi-step estimation of delay for SDE
    Kutoyants, Yury A.
    [J]. BERNOULLI, 2021, 27 (03) : 2069 - 2090
  • [6] Multi-Step Gradient Methods for Networked Optimization
    Ghadimi, Euhanna
    Shames, Iman
    Johansson, Mikael
    [J]. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2013, 61 (21) : 5417 - 5429
  • [7] Direct multi-step estimation and forecasting
    Chevillon, Guillaume
    [J]. JOURNAL OF ECONOMIC SURVEYS, 2007, 21 (04) : 746 - 785
  • [8] Multi-step Knowledge-Aided Iterative Conjugate Gradient Algorithms for DOA Estimation
    Pinto, Silvio F. B.
    de Lamare, Rodrigo C.
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2019, 38 (08) : 3841 - 3859
  • [9] Guaranteed Estimation Problem for Multi-Step Systems
    Ananyev, Boris I.
    Yurovskikh, Polina A.
    [J]. BULLETIN OF IRKUTSK STATE UNIVERSITY-SERIES MATHEMATICS, 2023, 45 : 37 - 53