Gradient-based policy iteration: An example

Cited: 0
Authors
Cao, XR [1]
Fang, HT [1]
Institution
[1] Hong Kong Univ Sci & Technol, Dept Elect & Elect Engn, Kowloon, Hong Kong, Peoples R China
Keywords
discrete event dynamic systems; potentials; Poisson equations; W-factors; Q-learning; perturbation analysis; Markov decision processes;
DOI
Not available
CLC Classification Number
TP [Automation and Computer Technology]
Discipline Classification Code
0812
Abstract
Recent research indicates that perturbation analysis (PA), Markov decision processes (MDP), and reinforcement learning (RL) are three closely related areas in discrete event dynamic system optimization. In particular, it was shown that policy iteration in fact chooses, for the next iteration, the policy with the steepest performance gradient (provided by PA). This sensitivity point of view of MDP leads to some new research topics. In this note, we propose to implement policy iteration based on performance gradients. This approach is particularly useful when the actions at different states are correlated and hence standard policy iteration cannot be applied. We illustrate the main ideas with an M/G/1/N queue example and identify some topics for further research.
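To make the steepest-gradient reading of policy iteration concrete, here is a minimal NumPy sketch, not the authors' algorithm (the note develops the idea for the M/G/1/N queue). It assumes a finite ergodic chain, computes potentials g from the Poisson equation (I - P + e pi^T) g = f, and, over a hypothetical finite set of feasible policies (which may encode correlated actions across states), moves to the one with the largest performance derivative pi [(P' - P) g + (f' - f)]. The function names and the candidate-set representation are illustrative assumptions.

```python
import numpy as np

def stationary(P):
    """Stationary distribution pi of an ergodic transition matrix P
    (solves pi P = pi with sum(pi) = 1 as a least-squares system)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    return np.linalg.lstsq(A, b, rcond=None)[0]

def potentials(P, f):
    """Performance potentials g from the Poisson equation
    (I - P + e pi^T) g = f; with this normalization, pi @ g equals
    the average reward eta = pi @ f."""
    n = P.shape[0]
    pi = stationary(P)
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)
    return pi, g

def gradient_policy_iteration(candidates, P, f, max_iters=20):
    """Gradient-based policy iteration over a finite feasible set.
    `candidates` is a hypothetical list of (P', f') pairs, one per
    feasible policy; actions at different states may be correlated,
    so each candidate is a whole policy rather than a per-state
    choice.  Each step moves to the candidate with the steepest
    performance derivative pi @ ((P' - P) @ g + (f' - f))."""
    for _ in range(max_iters):
        pi, g = potentials(P, f)
        derivs = [pi @ ((Pc - P) @ g + (fc - f)) for Pc, fc in candidates]
        best = int(np.argmax(derivs))
        if derivs[best] <= 1e-12:   # no ascent direction: stop
            break
        P, f = candidates[best]
    pi = stationary(P)
    return P, f, pi @ f             # final policy and its average reward
```

When actions at different states are independent, maximizing this derivative decomposes state by state and the update reduces to the standard policy-improvement step, which is the sensitivity interpretation of policy iteration described in the abstract.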
Pages: 3367-3371
Page count: 5
Related Papers
50 records in total
  • [1] An analysis of gradient-based policy iteration
    Dankert, J
    Yang, L
    Si, J
    PROCEEDINGS OF THE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), VOLS 1-5, 2005, : 2977 - 2982
  • [2] A performance gradient perspective on gradient-based policy iteration and a modified value iteration
    Yang, Lei
    Dankert, James
    Si, Jennie
    INTERNATIONAL JOURNAL OF INTELLIGENT COMPUTING AND CYBERNETICS, 2008, 1 (04) : 509 - 520
  • [3] Gradient-based iteration for a class of matrix equations
    Zhang, Huamin
    26TH CHINESE CONTROL AND DECISION CONFERENCE (2014 CCDC), 2014, : 1201 - 1205
  • [4] Sparse Gradient-Based Direct Policy Search
    Sokolovska, Nataliya
    NEURAL INFORMATION PROCESSING, ICONIP 2012, PT IV, 2012, 7666 : 212 - 221
  • [5] Adapting Static and Contextual Representations for Policy Gradient-Based Summarization
    Lin, Ching-Sheng
    Jwo, Jung-Sing
    Lee, Cheng-Hsiung
    SENSORS, 2023, 23 (09)
  • [6] Traffic Light Control with Policy Gradient-Based Reinforcement Learning
    Tas, Mehmet Bilge Han
    Ozkan, Kemal
    Saricicek, Inci
    Yazici, Ahmet
    32ND IEEE SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU 2024, 2024
  • [7] Selective multiple power iteration: from tensor PCA to gradient-based exploration of landscapes
    Ouerfelli, Mohamed
    Tamaazousti, Mohamed
    Rivasseau, Vincent
    EUROPEAN PHYSICAL JOURNAL-SPECIAL TOPICS, 2023, 232 (23-24): 3645 - 3660
  • [8] Two accelerated gradient-based iteration methods for solving the Sylvester matrix equation AX
    Wang, Huiling
    Wu, Nian-Ci
    Nie, Yufeng
    AIMS MATHEMATICS, 2024, 9 (12): 34734 - 34752
  • [9] Optimal preconditioning and iteration complexity bounds for gradient-based optimization in model predictive control
    Giselsson, Pontus
    2013 AMERICAN CONTROL CONFERENCE (ACC), 2013, : 358 - 364