Gradient-based policy iteration: An example

Times cited: 0
Authors
Cao, XR [1]
Fang, HT [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Elect & Elect Engn, Kowloon, Hong Kong, Peoples R China
Keywords
discrete event dynamic systems; potentials; Poisson equations; W-factors; Q-learning; perturbation analysis; Markov decision processes
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent research indicates that perturbation analysis (PA), Markov decision processes (MDP), and reinforcement learning (RL) are three closely related areas in discrete event dynamic system optimization. In particular, it was shown that policy iteration in fact chooses the policy that has the steepest performance gradient (provided by PA) for the next iteration. This sensitivity point of view of MDP leads to some new research topics. In this note, we propose to implement policy iteration based on performance gradients. This approach is particularly useful when the actions at different states are correlated and hence standard policy iteration cannot be applied. We illustrate the main ideas with an example of an M/G/1/N queue and identify some further research topics.
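The link between policy iteration and performance gradients mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the paper. It assumes the usual average-reward setting for an ergodic finite chain: the potentials g solve the Poisson equation in the normalized form (I - P + e*pi) g = f, and the derivative of the average reward along the direction from (P, f) to (P', f') is pi [(P' - P) g + (f' - f)]. The three-state MDP (`P_A`, `F_A`) and all function names are hypothetical, and the example covers only the standard case in which actions at different states are chosen independently, not the M/G/1/N queue with correlated actions.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                k = M[r][c] / M[c][c]
                M[r] = [x - k * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def stationary(P):
    """Stationary distribution pi of an ergodic chain: pi P = pi, sum(pi) = 1."""
    n = len(P)
    A = [[P[i][j] - (1.0 if i == j else 0.0) for i in range(n)] for j in range(n)]
    A[-1] = [1.0] * n  # replace one balance equation by the normalization
    return solve(A, [0.0] * (n - 1) + [1.0])


def potentials(P, f):
    """Return (pi, g), where g solves (I - P + e pi) g = f."""
    n = len(P)
    pi = stationary(P)
    A = [[(1.0 if i == j else 0.0) - P[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    return pi, solve(A, f)


def avg_reward(P, f):
    return sum(p * r for p, r in zip(stationary(P), f))


def gradient(P, f, P2, f2):
    """Derivative of the average reward at (P, f) in the direction of (P2, f2)."""
    pi, g = potentials(P, f)
    n = len(P)
    return sum(pi[i] * (sum((P2[i][j] - P[i][j]) * g[j] for j in range(n))
                        + f2[i] - f[i]) for i in range(n))


# Hypothetical MDP: P_A[i][a] is the transition row and F_A[i][a] the reward
# for action a in state i. Every policy yields an ergodic chain.
P_A = [
    [[0.5, 0.5, 0.0], [0.1, 0.2, 0.7]],
    [[0.3, 0.4, 0.3], [0.6, 0.2, 0.2]],
    [[0.0, 0.5, 0.5], [0.4, 0.1, 0.5]],
]
F_A = [[1.0, 0.5], [2.0, 2.5], [4.0, 3.0]]


def chain(policy):
    P = [P_A[i][a] for i, a in enumerate(policy)]
    f = [F_A[i][a] for i, a in enumerate(policy)]
    return P, f


def improve(policy):
    """Policy improvement: per state, maximize f(i, a) + sum_j p(j | i, a) g(j)."""
    P, f = chain(policy)
    _, g = potentials(P, f)
    n = len(P)
    return [max(range(len(P_A[i])),
                key=lambda a: F_A[i][a] + sum(P_A[i][a][j] * g[j] for j in range(n)))
            for i in range(n)]


def run_policy_iteration(policy, max_iter=20):
    """Iterate until the policy is stable; record rewards and gradients."""
    rewards, grads = [avg_reward(*chain(policy))], []
    for _ in range(max_iter):
        new = improve(policy)
        if new == policy:
            break
        grads.append(gradient(*chain(policy), *chain(new)))
        policy = new
        rewards.append(avg_reward(*chain(policy)))
    return policy, rewards, grads


final_policy, rewards, grads = run_policy_iteration([0, 0, 0])
print(final_policy, rewards, grads)
```

Because the improvement step maximizes each state's term separately and every stationary probability is positive, the recorded gradient toward the new policy is nonnegative and the average reward does not decrease from one iteration to the next. When actions at different states are tied together, this per-state maximization is no longer available, which is the situation the note addresses.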
Pages: 3367-3371
Number of pages: 5