On-Line Policy Gradient Estimation with Multi-Step Sampling

Cited by: 2

Authors:

Li, Yan-Jie [1]

Cao, Fang [1]

Cao, Xi-Ren [1]

Affiliations:

[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Kowloon, Hong Kong, Peoples R China

Source:

DISCRETE EVENT DYNAMIC SYSTEMS-THEORY AND APPLICATIONS | 2010 / Vol. 20 / Issue 01

Keywords:

Markov reward processes; Policy gradient; On-line estimation; Performance potentials; SENSITIVITY-ANALYSIS; INFINITE-HORIZON; POTENTIALS;

DOI:

10.1007/s10626-009-0078-3

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In this note, we discuss the problem of the sample-path-based (on-line) performance gradient estimation for Markov systems. The existing on-line performance gradient estimation algorithms generally require a standard importance sampling assumption. When the assumption does not hold, these algorithms may lead to poor estimates for the gradients. We show that this assumption can be relaxed and propose algorithms with multi-step sampling for performance gradient estimates; these algorithms do not require the standard assumption. Simulation examples are given to illustrate the accuracy of the estimates.

Citation

Pages: 3-17

Page count: 15

