Gradient-based policy iteration: An example

Cited by: 0
Authors
Cao, XR [1]
Fang, HT [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Elect & Elect Engn, Kowloon, Hong Kong, Peoples R China
Keywords
discrete event dynamic systems; potentials; Poisson equations; W-factors; Q-learning; perturbation analysis; Markov decision processes
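DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract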
Recent research indicates that perturbation analysis (PA), Markov decision processes (MDP), and reinforcement learning (RL) are three closely related areas in discrete event dynamic system optimization. In particular, it was shown that policy iteration in fact chooses, for the next iteration, the policy that has the steepest performance gradient (provided by PA). This sensitivity point of view of MDP leads to some new research topics. In this note, we propose to implement policy iteration based on performance gradients. This approach is particularly useful when the actions at different states are correlated and hence standard policy iteration cannot be applied. We illustrate the main ideas with an example of an M/G/1/N queue and identify some further research topics.
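
The link between policy iteration and performance gradients stated in the abstract can be illustrated with a small sketch. It is not the paper's M/G/1/N example: the three-state, two-action ergodic MDP below, the arrays `P_all` and `f_all`, and all function names are made up for illustration. The sketch assumes the standard average-reward setting, where the potentials g solve the Poisson equation (I - P + e pi) g = f and the derivative of the average reward along the direction from (P, f) to (P', f') is pi [(P' - P) g + (f' - f)].

```python
import numpy as np

# Hypothetical MDP data (not from the paper): P_all[i, a, j] is the
# probability of moving from state i to state j under action a, and
# f_all[i, a] is the reward collected in state i under action a.
P_all = np.array([
    [[0.5, 0.3, 0.2], [0.1, 0.6, 0.3]],
    [[0.2, 0.5, 0.3], [0.4, 0.1, 0.5]],
    [[0.3, 0.3, 0.4], [0.6, 0.2, 0.2]],
])
f_all = np.array([[1.0, 2.0], [0.5, 1.5], [3.0, 0.0]])


def stationary_distribution(P):
    """Solve pi P = pi together with sum(pi) = 1 (ergodic chain assumed)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi


def potentials(P, f, pi):
    """Solve the Poisson equation (I - P + e pi) g = f for the potentials g."""
    n = P.shape[0]
    return np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), f)


def evaluate(policy, P_all, f_all):
    """Return (P, f, pi, g, eta) for a deterministic policy (one action per state)."""
    states = np.arange(len(policy))
    P = P_all[states, policy]
    f = f_all[states, policy]
    pi = stationary_distribution(P)
    g = potentials(P, f, pi)
    return P, f, pi, g, float(pi @ f)


def directional_derivative(pi, g, P, f, P_new, f_new):
    """Derivative of the average reward when moving from (P, f) toward (P_new, f_new)."""
    return float(pi @ ((P_new - P) @ g + (f_new - f)))


def policy_iteration(P_all, f_all, policy):
    """Policy iteration: in each state pick the action maximizing f + P g.

    Because every component of pi is positive, this state-by-state choice
    also maximizes the directional derivative above.
    """
    while True:
        _, _, _, g, eta = evaluate(policy, P_all, f_all)
        scores = f_all + P_all @ g          # shape: (states, actions)
        new_policy = policy.copy()
        for i in range(len(policy)):
            best = int(np.argmax(scores[i]))
            # Switch only on a strict improvement to avoid cycling on ties.
            if scores[i, best] > scores[i, policy[i]] + 1e-12:
                new_policy[i] = best
        if np.array_equal(new_policy, policy):
            return policy, eta
        policy = new_policy


policy_star, eta_star = policy_iteration(P_all, f_all, np.array([0, 0, 0]))
print("policy:", policy_star, "average reward:", eta_star)
```

The state-by-state maximization is only possible when actions can be chosen independently in each state. When actions at different states are correlated, as the abstract notes, that step is unavailable, and a quantity like `directional_derivative` would have to drive the update instead.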
Pages: 3367-3371
Number of pages: 5