Gradient-based policy iteration: An example

Cited: 0
Authors
Cao, XR [1]
Fang, HT [1]
Institution
[1] Hong Kong Univ Sci & Technol, Dept Elect & Elect Engn, Kowloon, Hong Kong, Peoples R China
Keywords
discrete event dynamic systems; potentials; Poisson equations; W-factors; Q-learning; perturbation analysis; Markov decision processes;
DOI
Not available
CLC Classification Number
TP [Automation and Computer Technology]
Discipline Classification Code
0812
Abstract
Recent research indicates that perturbation analysis (PA), Markov decision processes (MDP), and reinforcement learning (RL) are three closely related areas in discrete event dynamic system optimization. In particular, it was shown that policy iteration in fact chooses, for the next iteration, the policy with the steepest performance gradient (provided by PA). This sensitivity point of view of MDP leads to some new research topics. In this note, we propose to implement policy iteration based on performance gradients. This approach is particularly useful when the actions at different states are correlated and hence standard policy iteration cannot be applied. We illustrate the main ideas with an M/G/1/N queue example and identify some topics for further research.
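To make the steepest-gradient reading of policy iteration concrete, here is a minimal NumPy sketch, not the authors' algorithm (the note develops the idea for the M/G/1/N queue). It assumes a finite ergodic chain, computes potentials g from the Poisson equation (I - P + e pi^T) g = f, and, over a hypothetical finite set of feasible policies (which may encode correlated actions across states), moves to the one with the largest performance derivative pi [(P' - P) g + (f' - f)]. The function names and the candidate-set representation are illustrative assumptions.

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi of an ergodic transition matrix P
    (solves pi P = pi with sum(pi) = 1 as a least-squares system)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(P, f):
    """Performance potentials g from the Poisson equation
    (I - P + e pi^T) g = f; with this normalization, pi @ g equals
    the average reward eta = pi @ f."""
    n = P.shape[0]
    pi = stationary(P)
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return pi, g

def gradient_policy_iteration(candidates, P, f, max_iters=20):
    """Gradient-based policy iteration over a finite feasible set.
    `candidates` is a hypothetical list of (P', f') pairs, one per
    feasible policy; actions at different states may be correlated,
    so each candidate is a whole policy rather than a per-state
    choice.  Each step moves to the candidate with the steepest
    performance derivative pi @ ((P' - P) @ g + (f' - f))."""
    for _ in range(max_iters):
        pi, g = potentials(P, f)
        derivs = [pi @ ((Pc - P) @ g + (fc - f)) for Pc, fc in candidates]
        best = int(np.argmax(derivs))
        if derivs[best] <= 1e-12:   # no ascent direction: stop
            break
        P, f = candidates[best]
    pi = stationary(P)
    return P, f, pi @ f             # final policy and its average reward
```

When actions at different states are independent, maximizing this derivative decomposes state by state and the update reduces to the standard policy-improvement step, which is the sensitivity interpretation of policy iteration described in the abstract.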
Pages: 3367-3371
Page count: 5
Related Papers
50 records in total
  • [1] An analysis of gradient-based policy iteration
    Dankert, J
    Yang, L
    Si, J
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), VOLS 1-5, 2005, : 2977 - 2982
  • [2] A performance gradient perspective on gradient-based policy iteration and a modified value iteration
    Yang, Lei
    Dankert, James
    Si, Jennie
    INTERNATIONAL JOURNAL OF INTELLIGENT COMPUTING AND CYBERNETICS, 2008, 1 (04) : 509 - 520
  • [3] Gradient-based iteration for a class of matrix equations
    Zhang, Huamin
    26TH CHINESE CONTROL AND DECISION CONFERENCE (2014 CCDC), 2014, : 1201 - 1205
  • [4] Sparse Gradient-Based Direct Policy Search
    Sokolovska, Nataliya
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT IV, 2012, 7666 : 212 - 221
  • [5] Adapting Static and Contextual Representations for Policy Gradient-Based Summarization
    Lin, Ching-Sheng
    Jwo, Jung-Sing
    Lee, Cheng-Hsiung
    SENSORS, 2023, 23 (09)
  • [6] Traffic Light Control with Policy Gradient-Based Reinforcement Learning
    Tas, Mehmet Bilge Han
    Ozkan, Kemal
    Saricicek, Inci
    Yazici, Ahmet
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024
  • [7] Selective multiple power iteration: from tensor PCA to gradient-based exploration of landscapes
    Ouerfelli, Mohamed
    Tamaazousti, Mohamed
    Rivasseau, Vincent
    EUROPEAN PHYSICAL JOURNAL-SPECIAL TOPICS, 2023, 232 (23-24): 3645 - 3660
  • [8] Two accelerated gradient-based iteration methods for solving the Sylvester matrix equation AX
    Wang, Huiling
    Wu, Nian-Ci
    Nie, Yufeng
    AIMS MATHEMATICS, 2024, 9 (12): 34734 - 34752
  • [9] Optimal preconditioning and iteration complexity bounds for gradient-based optimization in model predictive control
    Giselsson, Pontus
    2013 AMERICAN CONTROL CONFERENCE (ACC), 2013, : 358 - 364